Prevent full table scan for query with multiple where clauses
- by Dave Jarvis
A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option.
Here is the explain:
Here is the query:
SELECT
avg(d.amount) AS amount,
y.year
FROM
station s,
station_district sd,
year_ref y,
month_ref m,
daily d
LEFT JOIN city c ON c.id = 10663
WHERE
-- Find all the stations within a specific unit radius ...
--
6371.009 *
SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
(COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND
-- Ignore stations outside the given elevations
--
s.elevation BETWEEN 0 AND 2000 AND
sd.id = s.station_district_id AND
-- Gather all known years for that station ...
--
y.station_district_id = sd.id AND
-- The data before 1900 is shaky; insufficient after 2009.
--
y.year BETWEEN 1980 AND 2000 AND
-- Filtered by all known months ...
--
m.year_ref_id = y.id AND
m.month = 12 AND
-- Whittled down by category ...
--
m.category_id = '001' AND
-- Into the valid daily climate data.
--
m.id = d.month_ref_id AND
d.daily_flag_id <> 'M'
GROUP BY
y.year
It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows.
How do I force PostgreSQL to start at the CITY table?
Thank you!