Efficient Multiple Linear Regression in C# / .Net

Posted by mrnye on Stack Overflow See other posts from Stack Overflow or by mrnye
Published on 2010-05-26T05:50:12Z Indexed on 2010/05/27 4:51 UTC
Read the original article Hit count: 665

Filed under:
|
|

Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the 1000's (with 3 or 4 different inputs). After reading this article on multiple linear regression I tried implementing it with a matrix equation:

Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});

Matrix x = new Matrix(
     new double[,]{{1, 36, 66},
                 {1, 37, 68},
                 {1, 47, 64},
                 {1, 32, 53},
                 {1, 1, 101}});

Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;

for (int i = 0; i < b.Rows; i++)
{
  Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}

However it does not scale well to the scale of 1000's of equations due to the matrix inversion operation. I can call the R language and use that, however I was hoping there would be a pure .Net solution which will scale to these large sets.

Any suggestions?

EDIT #1:

I have settled using R for the time being. By using statconn (downloaded here) I have found it to be both fast & relatively easy to use this method. I.e. here is a small code snippet, it really isn't much code at all to use the R statconn library (note: this is not all the code!).

_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
  object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
  parameters[i + 1] = (double)parameter;
}

© Stack Overflow or respective owner

Related posts about c#

Related posts about .NET