Hi,
I have about 100,000 rows of generic data. The columns/properties are user-definable and use the usual data types (string, int, double, date). There will be about 50 columns/properties.
I have two needs:
1. To be able to calculate new columns/properties using an expression,
e.g. Column3 = Column1 * Column2.
Ultimately I would like to be able to pull in external data via a callback, e.g. Column3 = Column1 * GetTemperature().
The expressions are relatively simple: basic maths operators plus Sum, Count and IF are the only functions needed.
2. To be able to filter/group the data and perform aggregations,
e.g. Sum(Data.Column1) Where(Data.Column2 == "blah")
As far as I can see I have two options:
1. Using a DataTable.
= Point 1 above is achieved by using DataColumn.Expression
= Point 2 above is achieved by using DataTable.DefaultView.RowFilter (or DataTable.Compute) plus C# code
(see the DataTable sketch after this list)
2. Using a List of generic objects, each with a Dictionary<string, object> to store the values.
= Point 1 could be achieved by something like NCalc
= Point 2 is achieved using LINQ
(see the NCalc/LINQ sketch after the pros & cons below)
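Here is roughly what I mean by option 1; a minimal sketch, where "Tag" is a made-up string column so the filter and the numeric expression don't clash with each other:

    using System;
    using System.Data;

    var table = new DataTable();
    table.Columns.Add("Column1", typeof(double));
    table.Columns.Add("Column2", typeof(double));
    table.Columns.Add("Tag", typeof(string));   // made-up string column for filtering

    // Point 1: calculated column driven by an expression string
    table.Columns.Add("Column3", typeof(double), "Column1 * Column2");

    table.Rows.Add(2.0, 3.0, "blah");
    table.Rows.Add(4.0, 5.0, "other");

    // Point 2: Compute does a filtered aggregation in one call;
    // RowFilter gives a filtered view for row-by-row work
    object sum = table.Compute("Sum(Column1)", "Tag = 'blah'");
    table.DefaultView.RowFilter = "Tag = 'blah'";
    Console.WriteLine(sum);   // 2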
DataTable:
Pros: DataColumn.Expression is built in
Cons: RowFilter plus C# code is not as "nice" as LINQ;
DataColumn.Expression does not appear to support callbacks
= a workaround could be to fetch the external value and bake it into the expression text when creating the calculated column (sketch below)
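Continuing the DataTable sketch above, the workaround could look like the following; GetTemperature() is a hypothetical callback, and the obvious drawback is that the value is frozen at column-creation time:

    using System.Globalization;

    // hypothetical callback supplying the external value
    double GetTemperature() => 21.5;

    // fetch once and bake the value into the expression text;
    // the calculated column will NOT update if the external value changes later
    double temp = GetTemperature();
    table.Columns.Add("Column4", typeof(double),
        "Column1 * " + temp.ToString(CultureInfo.InvariantCulture));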
GenericList:
Pros: LINQ syntax; NCalc supports callbacks
Cons: having to implement/wire up NCalc or a generic calc engine myself
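And a minimal sketch of option 2, assuming NCalc's Expression class with its Parameters dictionary and EvaluateFunction event (GetTemperature is again a made-up callback):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using NCalc;

    var rows = new List<Dictionary<string, object>>
    {
        new() { ["Column1"] = 2.0, ["Column2"] = "blah" },
        new() { ["Column1"] = 4.0, ["Column2"] = "other" },
    };

    // Point 1: per-row evaluation, with the callback wired in via EvaluateFunction
    foreach (var row in rows)
    {
        var expr = new Expression("Column1 * GetTemperature()");
        expr.Parameters["Column1"] = row["Column1"];
        expr.EvaluateFunction += (name, args) =>
        {
            if (name == "GetTemperature")
                args.Result = 21.5;   // the real callback would fetch external data here
        };
        row["Column3"] = expr.Evaluate();
    }

    // Point 2: filter + aggregate with LINQ
    double sum = rows.Where(r => (string)r["Column2"] == "blah")
                     .Sum(r => (double)r["Column1"]);
    Console.WriteLine(sum);   // 2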
Based on the above I would think the GenericList approach wins, but something I have not factored in is performance, which for some reason I suspect would be better with a DataTable.
Does anyone have a gut feeling / experience with LINQ vs. DataTable performance?
How about NCalc?
As I said, there are about 100,000 rows of data with 50 columns, of which maybe 20 are calculated.
About 50 rules will be run against the data, so there will be roughly 5 million row/object scans in total.
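In case it helps, this is the kind of quick-and-dirty measurement I had in mind; a minimal sketch that times one filtered aggregation both ways over 100,000 rows (column name and filter are illustrative):

    using System;
    using System.Data;
    using System.Diagnostics;
    using System.Linq;

    const int RowCount = 100_000;

    var table = new DataTable();
    table.Columns.Add("Column1", typeof(double));
    var rand = new Random(42);
    for (int i = 0; i < RowCount; i++)
        table.Rows.Add(rand.NextDouble());

    // equivalent flat list for the LINQ side
    var list = table.AsEnumerable()
                    .Select(r => r.Field<double>("Column1"))
                    .ToList();

    var sw = Stopwatch.StartNew();
    object a = table.Compute("Sum(Column1)", "Column1 > 0.5");
    sw.Stop();
    Console.WriteLine($"DataTable.Compute: {sw.ElapsedMilliseconds} ms, sum = {a}");

    sw.Restart();
    double b = list.Where(x => x > 0.5).Sum();
    sw.Stop();
    Console.WriteLine($"LINQ:              {sw.ElapsedMilliseconds} ms, sum = {b}");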
Would really appreciate any insights. Thx.
PS: Of course using a database with SQL, views etc. would be the easiest solution, but for various reasons that can't be done here.