Does anyone know of an efficient way to do multiple linear regression in C# when the number of simultaneous equations may be in the thousands (with 3 or 4 different inputs)? After reading this article on multiple linear regression, I tried implementing it as a matrix equation:
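```csharp
using System.Diagnostics; // Trace

// Matrix comes from a third-party matrix library (not shown); its elements are read back with ToDouble().
// y: observed responses, one row per equation.
Matrix y = new Matrix(
    new double[,] { { 745 },
                    { 895 },
                    { 442 },
                    { 440 },
                    { 1598 } });

// x: design matrix; the leading column of 1s provides the intercept term.
Matrix x = new Matrix(
    new double[,] { { 1, 36, 66 },
                    { 1, 37, 68 },
                    { 1, 47, 64 },
                    { 1, 32, 53 },
                    { 1, 1, 101 } });

// Normal-equation solution: b = (XᵀX)⁻¹ Xᵀ y
Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;

for (int i = 0; i < b.Rows; i++)
{
    Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}
```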
However, it does not scale well to thousands of equations because of the matrix inversion. I could call the R language instead, but I was hoping there would be a pure .NET solution that scales to data sets this large.
Any suggestions?
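One point worth checking in the matrix version: with 3 or 4 inputs plus the intercept, XᵀX is only 4×4 or 5×5 no matter how many equations there are, so inverting it is cheap. What grows with the number of equations n is forming XᵀX and Xᵀy, which costs O(n·p²) for p coefficients. A minimal plain-C# sketch, assuming no matrix library at all, accumulates those two sums one observation at a time and solves the small p×p system by Gaussian elimination instead of inverting it. The class name `NormalEquationsOls` is hypothetical, not part of any library.

```csharp
using System;

// Hypothetical helper: ordinary least squares via the normal equations
// (XᵀX) b = Xᵀy. X is never stored; each observation is folded into the
// p×p matrix XᵀX and the p-vector Xᵀy, so extra memory is O(p²) and time O(n·p²).
public static class NormalEquationsOls
{
    // rows[i] holds the inputs of observation i WITHOUT the leading 1 (all rows
    // the same length); the intercept column is added here.
    // Returns [intercept, slope1, slope2, ...].
    public static double[] Fit(double[][] rows, double[] y)
    {
        if (rows.Length == 0 || rows.Length != y.Length)
            throw new ArgumentException("rows and y must be non-empty and the same length");

        int p = rows[0].Length + 1;            // +1 for the intercept
        var xtx = new double[p, p];
        var xty = new double[p];
        var xi = new double[p];

        for (int i = 0; i < rows.Length; i++)
        {
            xi[0] = 1.0;                        // intercept column
            Array.Copy(rows[i], 0, xi, 1, p - 1);
            for (int j = 0; j < p; j++)
            {
                xty[j] += xi[j] * y[i];
                for (int k = 0; k < p; k++)
                    xtx[j, k] += xi[j] * xi[k];
            }
        }
        return Solve(xtx, xty);
    }

    // Solves a·b = c (a is p×p) by Gaussian elimination with partial pivoting.
    // a and c are overwritten.
    private static double[] Solve(double[,] a, double[] c)
    {
        int p = c.Length;
        for (int col = 0; col < p; col++)
        {
            // Pick the row with the largest pivot to limit round-off.
            int pivot = col;
            for (int r = col + 1; r < p; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            if (Math.Abs(a[pivot, col]) < 1e-12)      // crude absolute threshold
                throw new InvalidOperationException("XᵀX is (nearly) singular; are inputs collinear?");

            for (int k = 0; k < p; k++)
                (a[col, k], a[pivot, k]) = (a[pivot, k], a[col, k]);
            (c[col], c[pivot]) = (c[pivot], c[col]);

            // Eliminate entries below the pivot.
            for (int r = col + 1; r < p; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int k = col; k < p; k++) a[r, k] -= f * a[col, k];
                c[r] -= f * c[col];
            }
        }

        // Back-substitution on the upper-triangular system.
        var b = new double[p];
        for (int r = p - 1; r >= 0; r--)
        {
            double s = c[r];
            for (int k = r + 1; k < p; k++) s -= a[r, k] * b[k];
            b[r] = s / a[r, r];
        }
        return b;
    }
}
```

Passing the question's five rows without the leading 1 (for example `{36, 66}`) together with the y values returns the intercept followed by the two slopes, mathematically the same b as the matrix version. One trade-off of this route: forming XᵀX squares the condition number of X, so nearly collinear inputs lose precision; a QR-based solver avoids that.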
EDIT #1:
I have settled on using R for the time being. Using statconn (downloaded here), I have found this approach both fast and relatively easy to use. Here is a small code snippet; it really doesn't take much code to use the R statconn library (note: this is not all the code!).
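```csharp
// Fit the model in R; equation holds the lm() formula, built elsewhere.
_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));

// Intercept term.
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;

// One slope per input column; the predictors are named x0, x1, ... in R.
for (int i = 0; i < xColCount; i++)
{
    object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
    parameters[i + 1] = (double)parameter;
}
```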
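R's `lm()` does not invert XᵀX; it solves the least-squares problem with a QR decomposition of X. If a pure .NET route is revisited later, the same idea can be sketched in plain C# with Householder reflections. This is a minimal sketch under stated assumptions, not statconn or R code: the class name `QrOls` is hypothetical, the design matrix `x` must already contain the column of 1s for the intercept (as in the original matrix example), and rank deficiency is only detected when a column norm is exactly zero. The cost is O(n·p²) and no n×n matrix is ever formed.

```csharp
using System;

// Hypothetical helper: least squares by Householder QR. Reflections reduce X to
// upper-triangular R (Qᵀ X = R) and are applied to y as they go, giving Qᵀy;
// then R b = (Qᵀy)[0..p) is solved by back-substitution. XᵀX is never formed.
public static class QrOls
{
    // x: n×p design matrix, INCLUDING the column of 1s if an intercept is wanted.
    // x and y are copied, not modified. Returns b in the column order of x.
    public static double[] Fit(double[,] x, double[] y)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        if (n != y.Length) throw new ArgumentException("x and y must have the same number of rows");
        if (n < p) throw new ArgumentException("need at least as many rows as columns");

        var a = (double[,])x.Clone();
        var qty = (double[])y.Clone();
        var v = new double[n];

        for (int k = 0; k < p; k++)
        {
            // Norm of column k from the diagonal down.
            double norm = 0;
            for (int i = k; i < n; i++) norm += a[i, k] * a[i, k];
            norm = Math.Sqrt(norm);
            if (norm == 0) throw new InvalidOperationException("rank-deficient design matrix");

            // Householder vector v = a[k.., k] - alpha·e1, with alpha's sign
            // chosen opposite to a[k, k] to avoid cancellation (so vᵀv > 0).
            double alpha = a[k, k] > 0 ? -norm : norm;
            double vv = 0;
            for (int i = k; i < n; i++)
            {
                v[i] = a[i, k] - (i == k ? alpha : 0.0);
                vv += v[i] * v[i];
            }

            // Apply H = I - 2·v·vᵀ/(vᵀv) to the remaining columns and to y.
            for (int j = k; j < p; j++)
            {
                double dot = 0;
                for (int i = k; i < n; i++) dot += v[i] * a[i, j];
                double f = 2 * dot / vv;
                for (int i = k; i < n; i++) a[i, j] -= f * v[i];
            }
            double dy = 0;
            for (int i = k; i < n; i++) dy += v[i] * qty[i];
            double fy = 2 * dy / vv;
            for (int i = k; i < n; i++) qty[i] -= fy * v[i];
        }

        // Back-substitute R b = first p entries of Qᵀy (R sits in a's upper triangle).
        var b = new double[p];
        for (int r = p - 1; r >= 0; r--)
        {
            double s = qty[r];
            for (int c = r + 1; c < p; c++) s -= a[r, c] * b[c];
            b[r] = s / a[r, r];
        }
        return b;
    }
}
```

Given the same `x` (with its column of 1s) and `y` values as plain arrays, `QrOls.Fit` should agree with `b` from the matrix version up to rounding; the benefit shows up when inputs are nearly collinear, because it never squares the condition number the way XᵀX does.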