c#: Design advice. Using DataTable or List<MyObject> for a generic rule checker

Posted by Andrew White on Stack Overflow See other posts from Stack Overflow or by Andrew White
Published on 2010-06-07T10:29:19Z Indexed on 2010/06/07 10:32 UTC
Read the original article Hit count: 306

Filed under:
|
|
|
|

Hi,

I have about 100,000 lines of generic data. Columns/Properties of this data are user definable and are of the usual data types (string, int, double, date). There will be about 50 columns/properties.

I have 2 needs:

  • To be able to calculate new columns/properties using an expression
    e.g. Column3 = Column1 * Column2.
    Ultimately I would like to be able to use external data using a callback,
    e.g. Column3 = Column1 * GetTemperature
    The expression is relatively simple, maths operations, sum, count & IF are the only necessary functions.
  • To be able to filter/group the data and perform aggregations
    e.g. Sum(Data.Column1) Where(Data.Column2 == "blah")

    As far as I can see I have two options:
    1. Using a DataTable.
    => Point 1 above is achieved by using DataColumn.Expression
    => Point 2 above is acheived by using DataTable.DefaultView.RowFilter & C# code

    2. Using a List of generic Objects each with a Dictionary< string, object > to store the values.
    => Point 1 could be achieved by something like NCalc
    => Point 2 is achieved using LINQ

    DataTable:
    Pros: DataColumn.Expression is inbuilt
    Cons: RowFilter & coding c# is not as "nice" as LINQ, 
          DataColumn.Expression does not support callbacks(?) 
          => workaround could be to get & replace external value when creating 
             the calculated column
    
    GenericList:
    Pros: LINQ syntax, NCalc supports callbacks
    Cons: Implementing NCalc/generic calc engine
    

    Based on the above I would think a GenericList approach would win, but something I have not factored in is the performance which for some reason I think would be better with a datatable.
    Does anyone have a gut feeling / experience with LINQ vs. DataTable performance?
    How about NCalc?
    As I said there are about 100,000 rows of data, with 50 columns, of which maybe 20 are calculated. In total about 50 rules will be run against the data, so in total there will be 5 million row/object scans.

    Would really appreciate any insights. Thx.
    ps. Of course using a database + SQL & Views etc. would be the easiest solution, but for various reasons can't be implemented.

  • © Stack Overflow or respective owner

    Related posts about c#

    Related posts about LINQ