Multiple outliers for two-variable linear regression

Posted by Dave Jarvis on Stack Overflow
Published on 2010-05-09T22:15:14Z

Problem

Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious:

Graph

Question

Given:

  • T - Set of all temperatures
  • Y - Set of all years
  • ST - Sum of temperatures
  • SY - Sum of years
  • N - Number of elements
  • T(n) - Temperature of the nth element in the temperature set

How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.)
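
One possible reading of "outlier" here, sketched in Python rather than MySQL so the logic is easy to check: fit the least-squares line of temperature against year, then flag T(n) when its residual lies more than k residual standard deviations from that line. The threshold k = 2, the function name regression_outliers, and the two extra sums (sum of squared years, sum of year times temperature) are assumptions made for illustration and are not part of the question; ST, SY, and N correspond to the quantities listed above.

```python
import math


def regression_outliers(years, temps, k=2.0):
    """Flag each T(n) whose residual from the least-squares line of
    temps on years exceeds k residual standard deviations.

    Hypothetical helper: the name and the default k = 2 are assumptions.
    """
    n = len(temps)                                  # N
    if n != len(years) or n < 3:
        raise ValueError("need at least three (year, temperature) pairs")

    sy = sum(years)                                 # SY
    st = sum(temps)                                 # ST
    syy = sum(y * y for y in years)                 # assumed extra sum
    syt = sum(y * t for y, t in zip(years, temps))  # assumed extra sum

    denom = n * syy - sy * sy
    if denom == 0:
        raise ValueError("all years are identical; slope is undefined")

    # Ordinary least-squares slope and intercept from the sums.
    slope = (n * syt - sy * st) / denom
    intercept = (st - slope * sy) / n

    residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]

    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if sigma == 0:
        return [False] * n  # every point lies exactly on the line

    return [abs(r) > k * sigma for r in residuals]


# Made-up data: a gentle warming trend with one wild reading in 2005.
years = list(range(2000, 2010))
temps = [10 + 0.1 * (y - 2000) for y in years]
temps[5] = 25.0
flags = regression_outliers(years, temps)
```

Everything except the final comparison reduces to sums over the data set, so the same calculation could in principle be expressed with aggregate queries. A single pass like this can miss some points when several outliers are present, because the extreme values distort the fitted line and inflate the residual spread.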

Related Sites

I am slowly working through these sites to get a better understanding of the problem:

Many thanks!

© Stack Overflow or respective owner
