Generic dataset handling library

Posted by Pep. on Stack Overflow
Published on 2010-03-21T10:50:01Z

Hello,

I want to build a generic Perl module for handling and analysing character-separated biomedical datasets. It could almost certainly be used on any kind of dataset that contains a mixture of categorical (A, B, C, ...), continuous (1.2, 3, 881, ...) and identifier (XXX1, XXX2, ...) variables. The plan is to have people initialize the module and then use some arguments to point to the data file(s), the place where the analysis reports should be written, and the structure of the data.
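One way to read "initialize the module and then use some arguments" is a constructor that takes named arguments. The sketch below is a minimal, hypothetical illustration: the package name `Dataset::Analyser` and the argument names `data_files`, `report_dir`, `structure` and `sep` are my own assumptions, not an existing module. The constructor only validates and stores its arguments; reading the files and writing reports would come later.

```perl
use strict;
use warnings;

package Dataset::Analyser;    # hypothetical module name, for illustration only

use Carp qw(croak);

# Construct an analyser from named arguments. The argument names
# (data_files, report_dir, structure, sep) are assumptions for this sketch.
sub new {
    my ($class, %args) = @_;

    # data_files must be a non-empty array reference of paths
    my $files = $args{data_files};
    croak "data_files must be a non-empty array reference"
        unless ref $files eq 'ARRAY' && @$files;

    croak "report_dir is required" unless defined $args{report_dir};

    my $self = {
        data_files => [@$files],           # copy so later changes by the caller do not leak in
        report_dir => $args{report_dir},
        sep        => $args{sep} // ',',   # default field separator is a comma
        structure  => $args{structure},    # optional; undef means "work it out later"
    };
    return bless $self, $class;
}

# simple read-only accessors
sub data_files { @{ $_[0]{data_files} } }
sub report_dir { $_[0]{report_dir} }
sub sep        { $_[0]{sep} }
sub structure  { $_[0]{structure} }

package main;

my $analyser = Dataset::Analyser->new(
    data_files => ['trial_a.csv', 'trial_b.csv'],   # hypothetical file names
    report_dir => 'reports',
    sep        => "\t",
);
print join(', ', $analyser->data_files), "\n";
```

Keeping `structure` optional leaves room for any of the three approaches below to fill it in.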

By the structure of the data I mean which variable is in which column, and its name and type. This is where I need some enlightenment: I am baffled as to how to do this cleanly. Obviously, having people create a simple schema file, be it XML or some other format, would be the cleanest, but not everyone enjoys doing something like this.
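If a schema file is the route taken, it does not have to be XML. A minimal sketch, assuming a made-up plain-text format with one `name type` pair per line, listed in column order, and only the three types mentioned above. The `parse_schema` function is hypothetical and uses only core Perl; an in-memory filehandle stands in for a real schema file.

```perl
use strict;
use warnings;

# Parse a minimal plain-text schema: one column per line, "name type",
# in the same order as the columns in the data file. Blank lines and
# lines starting with '#' are ignored. The format is an assumption for
# this sketch, chosen as a lighter-weight alternative to XML.
my %VALID_TYPE = map { $_ => 1 } qw(categorical continuous identifier);

sub parse_schema {
    my ($fh) = @_;
    my @structure;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;           # skip comments and blank lines
        my ($name, $type) = split ' ', $line;     # awk-style split on whitespace
        die "schema line $.: expected 'name type'\n"
            unless defined $type;
        die "schema line $.: unknown type '$type'\n"
            unless $VALID_TYPE{$type};
        # position is the 0-based column index in the data file
        push @structure, { name => $name, type => $type, position => scalar @structure };
    }
    return \@structure;
}

my $schema_text = <<'END';
# name        type
patient_id    identifier
treatment     categorical
age           continuous
END

open my $fh, '<', \$schema_text or die "cannot open in-memory schema: $!";
my $structure = parse_schema($fh);
close $fh;

printf "%d: %s (%s)\n", @{$_}{qw(position name type)} for @$structure;
```

A one-line-per-column text file is arguably easier for non-programmers to write than XML, while still being explicit.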

The solutions I can think of are:

  • Create a configuration file in XML or a similar format, with a prespecified structure.
  • Pass the information during initialization of the module.
  • Use the first row of the data as headers and try to guess the types (ouch).
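The third option can be made less painful by treating the guesses as defaults that the user may override. Below is a rough sketch with a hypothetical `guess_structure` function. The heuristics (all-numeric means continuous, all-distinct means identifier, anything else categorical) are assumptions and will misfire, e.g. on numerically coded categories, which is why an override hash is accepted. Fields are split naively on the separator, so quoted fields containing the separator are not handled; for real CSV files the Text::CSV module from CPAN handles quoting.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Guess column types from a header row plus the data rows.
# Heuristics (assumptions, and easy to fool, hence the "ouch"):
#   every non-empty value is numeric   -> continuous
#   every non-empty value is distinct  -> identifier
#   otherwise                          -> categorical
# %$override lets the user correct any guess by column name.
# Fields are split naively on $sep; quoted fields are NOT handled.
sub guess_structure {
    my ($fh, $sep, $override) = @_;
    $override //= {};

    my $header = <$fh>;
    die "empty data file\n" unless defined $header;
    chomp $header;
    my @names = split /\Q$sep\E/, $header, -1;

    # collect the values of each column
    my @columns = map { [] } @names;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        my @fields = split /\Q$sep\E/, $line, -1;
        push @{ $columns[$_] }, $fields[$_] for 0 .. $#names;
    }

    my @structure;
    for my $i (0 .. $#names) {
        # ignore missing or empty cells when guessing
        my @values = grep { defined && $_ ne '' } @{ $columns[$i] };
        my %seen;
        $seen{$_}++ for @values;
        my $non_numeric = grep { !looks_like_number($_) } @values;   # count
        my $distinct    = keys %seen;                                # count

        my $type = !@values              ? 'categorical'
                 : $non_numeric == 0     ? 'continuous'
                 : $distinct == @values  ? 'identifier'
                 :                         'categorical';

        $type = $override->{ $names[$i] } if exists $override->{ $names[$i] };
        push @structure, { name => $names[$i], type => $type, position => $i };
    }
    return \@structure;
}

my $data = <<'END';
patient_id,treatment,stage,age
XXX1,A,1,34
XXX2,B,2,51.5
XXX3,A,1,47
END

open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my $structure = guess_structure($fh, ',', { stage => 'categorical' });
close $fh;

printf "%d: %s (%s)\n", @{$_}{qw(position name type)} for @$structure;
```

Here `stage` would be guessed as continuous because its codes are numbers; the override corrects it. The same override hash could come from a schema file or a constructor argument, so the three options need not be mutually exclusive.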

Surely there must be a "canonical" way of doing this that is also usable and efficient.

Thanks, p.

© Stack Overflow or respective owner
