Prevent full table scan for query with multiple where clauses

Posted by Dave Jarvis on Stack Overflow See other posts from Stack Overflow or by Dave Jarvis
Published on 2010-05-15T00:41:24Z Indexed on 2010/05/15 0:44 UTC
Read the original article Hit count: 561

A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option.

Here is the explain:

Here is the query:

SELECT
  avg(d.amount) AS amount,                                
  y.year                                                  
FROM                                                      
  station s,                                              
  station_district sd,                                    
  year_ref y,                                             
  month_ref m,                                            
  daily d                                                 
LEFT JOIN city c ON c.id = 10663
WHERE                                                     
  -- Find all the stations within a specific unit radius ...
  --                                                        
  6371.009 *                                                
  SQRT(                                                     
    POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
    (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
     POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND

  -- Ignore stations outside the given elevations
  --                                             
  s.elevation BETWEEN 0 AND 2000 AND             

  sd.id = s.station_district_id AND

  -- Gather all known years for that station ...
  --                                            
  y.station_district_id = sd.id AND             

  -- The data before 1900 is shaky; insufficient after 2009.
  --                                                        
  y.year BETWEEN 1980 AND 2000 AND                          

  -- Filtered by all known months ...
  --                                 
  m.year_ref_id = y.id AND

  m.month = 12 AND

  -- Whittled down by category ...
  --                              
  m.category_id = '001' AND       

  -- Into the valid daily climate data.
  --                                   
  m.id = d.month_ref_id AND            
  d.daily_flag_id <> 'M'               
GROUP BY                               
  y.year                               

It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows.

How do I force PostgreSQL to start at the CITY table?

Thank you!

© Stack Overflow or respective owner

Related posts about postgresql

Related posts about query-optimization