Multiple outliers for two-variable linear regression
- by Dave Jarvis
Problem
Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious:
Question
Given:
T - Set of all temperatures
Y - Set of all years
ST - Sum of temperatures
SY - Sum of years
N - Number of elements
T(n) - Temperature of the nth element in the temperature set
How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.)
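To make the quantities above concrete, here is a minimal sketch in Python (not MySQL) of one common approach: fit an ordinary least squares line of temperature against year, then flag any T(n) whose residual is more than k residual standard deviations from the line. The function name find_outliers, the threshold k = 2, and the sample data are my own assumptions for illustration, not an established procedure. Note that with several extreme outliers the standard deviation itself gets inflated, so this simple rule can miss some of them.

```python
from math import sqrt

def find_outliers(years, temps, k=2.0):
    """Flag points whose OLS residual exceeds k residual standard deviations.

    Hypothetical helper: the name and the default k=2.0 are assumptions.
    Returns a list of booleans, one per (year, temperature) pair.
    """
    n = len(years)                      # N  - number of elements
    if n != len(temps) or n < 3:
        raise ValueError("need at least 3 paired observations")

    sy = sum(years)                     # SY - sum of years
    st = sum(temps)                     # ST - sum of temperatures
    syy = sum(y * y for y in years)     # sum of squared years
    syt = sum(y * t for y, t in zip(years, temps))  # sum of year * temperature

    denom = n * syy - sy * sy
    if denom == 0:
        raise ValueError("all years are identical; slope is undefined")

    # Ordinary least squares fit: temp = intercept + slope * year
    slope = (n * syt - sy * st) / denom
    intercept = (st - slope * sy) / n

    # Residual of each point from the fitted line
    residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]

    # Residual standard deviation (n - 2 degrees of freedom)
    sigma = sqrt(sum(r * r for r in residuals) / (n - 2))
    if sigma == 0:
        return [False] * n              # perfect fit: nothing to flag

    return [abs(r) > k * sigma for r in residuals]

# Made-up sample data: a steady trend with one spike in 2005
years = list(range(2000, 2010))
temps = [10 + 0.1 * (y - 2000) for y in years]
temps[5] += 5
flags = find_outliers(years, temps)
```

Every quantity here is a plain sum over the rows, which is why I am hoping the same calculation can be done efficiently with aggregate queries inside a stored procedure.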
Related Sites
I am slowly working through these sites to get a better understanding of the problem:
Multiple Outliers Detection Procedures in Linear Regression
M-estimator
Measure of Surprise for Outlier Detection
Ordinary Least Squares Linear Regression
Many thanks!