Linear regression confidence intervals in SQL
- by Matt Howells
I'm using some fairly straightforward SQL to calculate the regression coefficients (intercept and slope) of a set of (x, y) data points by least squares. This gives me a nice best-fit line through the data. However, we would also like to be able to see the 95% and 5% confidence intervals for the line of best fit (the curves below).
What these mean is that the true line has a 95% probability of being below the upper curve and a 95% probability of being above the lower curve. How can I calculate these curves? I have already read Wikipedia and done some googling, but I haven't found mathematical equations I can understand well enough to calculate this.
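The nearest thing I have come across is the textbook confidence interval for the mean of y at a given x0, although I am not sure I am reading it correctly or how to turn it into SQL. As far as I can tell it is

\hat{y}(x_0) \;\pm\; t_{n-2}\, s \,\sqrt{ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} },
\qquad
s = \sqrt{ \frac{\sum_i \left( y_i - \hat{y}(x_i) \right)^2}{n - 2} }

where \hat{y}(x_0) is the fitted value intercept + slope * x_0, \bar{x} is the mean of the x values, n is the number of points, and t_{n-2} is the Student-t critical value with n-2 degrees of freedom (one-sided 95%, if the curves work the way I described them above).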
Edit: here is the essence of what I have right now.
--sample data
create table #lr (x real not null, y real not null)
insert into #lr values (0,1)
insert into #lr values (4,9)
insert into #lr values (2,5)
insert into #lr values (3,7)
declare @slope real
declare @intercept real
--calculate slope and intercept by least squares:
--  slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
--  intercept = avg(y) - slope * avg(x)
select
    @slope = ((count(*) * sum(x*y)) - (sum(x) * sum(y))) /
             ((count(*) * sum(power(x,2))) - power(sum(x),2)),
    @intercept = avg(y) - ((count(*) * sum(x*y)) - (sum(x) * sum(y))) /
                 ((count(*) * sum(power(x,2))) - power(sum(x),2)) * avg(x)
from #lr
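Edit 2: in case it makes clearer what I am aiming for, here is a rough sketch of how I imagine the calculation might continue from the variables above, assuming the formula quoted earlier is the right one. The t value is hard-coded for n-2 = 2 degrees of freedom (one-sided 95%) because I don't know of a built-in Student-t function in T-SQL; I have no idea whether this is actually correct.
--sketch: confidence band at each observed x,
--assuming yhat(x0) +/- t * s * sqrt(1/n + (x0 - xbar)^2 / Sxx)
declare @n int
declare @xbar real
declare @sxx real
declare @s real
declare @t real
set @t = 2.920 --one-sided 95% t value for 2 degrees of freedom, copied from a table
select @n = count(*), @xbar = avg(x) from #lr
--Sxx = sum of squared deviations of x from its mean
select @sxx = sum(power(x - @xbar, 2)) from #lr
--s = standard error of the residuals, with n-2 degrees of freedom
select @s = sqrt(sum(power(y - (@intercept + @slope * x), 2)) / (@n - 2)) from #lr
--fitted value plus lower and upper curves at each x
select
    x,
    @intercept + @slope * x as y_fit,
    @intercept + @slope * x - @t * @s * sqrt(1.0/@n + power(x - @xbar, 2) / @sxx) as lower_curve,
    @intercept + @slope * x + @t * @s * sqrt(1.0/@n + power(x - @xbar, 2) / @sxx) as upper_curve
from #lr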
Thank you in advance.