Remove redundant SQL code

Posted by Dave Jarvis on Stack Overflow See other posts from Stack Overflow or by Dave Jarvis
Published on 2010-05-10T00:43:35Z Indexed on 2010/05/10 0:48 UTC
Read the original article Hit count: 274

Filed under:
|
|

Code

The following code calculates the slope and intercept for a linear regression against a slathering of data. It then applies the equation y = mx + b against the same result set to calculate the value of the regression line for each row.

Can the two separate sub-selects be joined so that the data and its slope/intercept are calculated without executing the data gathering part of the query twice?

SELECT                                                                          
  AVG(D.AMOUNT) as AMOUNT,                                                      
  Y.YEAR * ymxb.SLOPE + ymxb.INTERCEPT as REGRESSION_LINE,                      
  Y.YEAR as YEAR,                                                               
  MAKEDATE(Y.YEAR,1) as AMOUNT_DATE                                             
FROM                                                                            
  CITY C,                                                                       
  STATION S,                                                                    
  YEAR_REF Y,                                                                   
  MONTH_REF M,                                                                  
  DAILY D,                                                                      
  (SELECT                                                                       
    ((avg(t.AMOUNT * t.YEAR)) - avg(t.AMOUNT) * avg(t.YEAR)) /                  
    (stddev( t.AMOUNT ) * stddev( t.YEAR )) as CORRELATION,                     

    ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) /
    (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE,   

    ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) -
    (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / 
    (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT
  FROM (                                                                   
    SELECT
      AVG(D.AMOUNT) as AMOUNT,
      Y.YEAR as YEAR,
      MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
    FROM
      CITY C,
      STATION S,
      YEAR_REF Y,
      MONTH_REF M,
      DAILY D
    WHERE
      $X{ IN, C.ID, CityCode } AND
      SQRT(
        POW( C.LATITUDE - S.LATITUDE, 2 ) +
        POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < $P{Radius} AND
      S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND
      Y.YEAR BETWEEN 1900 AND 2009 AND
      M.YEAR_REF_ID = Y.ID AND
      M.CATEGORY_ID = $P{CategoryCode} AND
      M.ID = D.MONTH_REF_ID AND
      D.DAILY_FLAG_ID <> 'M'
    GROUP BY
      Y.YEAR
  ) t
) ymxb
WHERE
  $X{ IN, C.ID, CityCode } AND
  SQRT(
    POW( C.LATITUDE - S.LATITUDE, 2 ) +
    POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < $P{Radius} AND
  S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND
  Y.YEAR BETWEEN 1900 AND 2009 AND
  M.YEAR_REF_ID = Y.ID AND
  M.CATEGORY_ID = $P{CategoryCode} AND
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'
GROUP BY
  Y.YEAR

Question

How do I execute the duplicate bits only once per query, instead of twice? The duplicate bit is the WHERE clause:

  $X{ IN, C.ID, CityCode } AND
  SQRT(
    POW( C.LATITUDE - S.LATITUDE, 2 ) +
    POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < $P{Radius} AND
  S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND
  Y.YEAR BETWEEN 1900 AND 2009 AND
  M.YEAR_REF_ID = Y.ID AND
  M.CATEGORY_ID = $P{CategoryCode} AND
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'

Related

http://stackoverflow.com/questions/1595659/how-to-eliminate-duplicate-calculation-in-sql

Thank you!

© Stack Overflow or respective owner

Related posts about mysql

Related posts about sql