Multiple outliers for two variable linear regression
Posted by Dave Jarvis on Stack Overflow
Published on 2010-05-09T22:15:14Z
Problem
Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious:
Question
Given:
- T - Set of all temperatures
- Y - Set of all years
- ST - Sum of temperatures
- SY - Sum of years
- N - Number of elements
- T(n) - Temperature of the nth element in the temperature set
How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.)
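To make the question concrete, here is a minimal sketch of one simple approach, written in Python rather than as a MySQL routine so the arithmetic is easy to follow: fit an ordinary least squares line through the (year, temperature) pairs using running sums, then flag T(n) when its residual is more than k residual standard deviations from the line. The function name `find_outliers`, the threshold `k`, and the extra sums `syy` and `syt` are assumptions for illustration; they are not part of the question. This is not one of the robust procedures from the linked sites, and a single pass like this can miss some outliers when several of them inflate the standard deviation.

```python
from math import sqrt


def find_outliers(years, temps, k=2.0):
    """Return indices n where T(n) lies more than k residual standard
    deviations from the least-squares line fitted to (year, temperature).

    Hypothetical helper: the name and the default k=2.0 are illustrative.
    """
    n = len(years)
    if n != len(temps) or n < 3:
        raise ValueError("need at least three (year, temperature) pairs")

    # Sums named after the question (SY, ST) plus two more the fit needs.
    sy = sum(years)
    st = sum(temps)
    syy = sum(y * y for y in years)
    syt = sum(y * t for y, t in zip(years, temps))

    denom = n * syy - sy * sy
    if denom == 0:
        raise ValueError("all years are identical; the line is undefined")

    # Ordinary least squares slope and intercept from the sums.
    slope = (n * syt - sy * st) / denom
    intercept = (st - slope * sy) / n

    # Residual of each point and the residual standard deviation
    # (n - 2 degrees of freedom for a two-parameter fit).
    residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]
    sigma = sqrt(sum(r * r for r in residuals) / (n - 2))
    if sigma == 0:
        return []  # every point sits exactly on the line

    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]
```

Because the fit depends only on N and four sums, the same calculation could plausibly be expressed with aggregate queries (SUM, COUNT) inside a stored procedure, followed by a second pass that compares each row's residual to the threshold.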
Related Sites
I am slowly working through these sites to get a better understanding of the problem:
- Multiple Outliers Detection Procedures in Linear Regression
- M-estimator
- Measure of Surprise for Outlier Detection
- Ordinary Least Squares Linear Regression
Many thanks!
© Stack Overflow or respective owner