Generic dataset handling library

Posted by Pep. on Stack Overflow
Published on 2010-03-21T10:50:01Z

Filed under: perl | data-structures | data

Hello,

I want to build a generic Perl module for handling and analysing biomedical character-separated datasets, one that can most certainly also be used on any kind of dataset containing a mixture of categorical (A, B, C, ...), continuous (1.2, 3, 881, ...) and identifier (XXX1, XXX2, ...) variables. The plan is to have people initialize the module and then use some arguments to point to the data file(s), the place where the analysis reports should be written, and the structure of the data.

By structure of the data I mean which variable is in which position, and its name and type. This is where I need some enlightenment: I am not sure how to do this in a clean way. Obviously, having people create a simple schema file, be it XML or some other format, would be the cleanest, but maybe not everyone enjoys doing something like this.
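To make "structure of the data" concrete, here is a minimal sketch of how it could be expressed as a plain Perl data structure: one hash per column, in file order, each with a name and a type. The package name `Dataset::Schema`, the `columns` argument and the three type names are hypothetical, made up for illustration; this is not an existing CPAN interface.

```perl
use strict;
use warnings;

# Hypothetical package: name and interface are assumptions for illustration.
package Dataset::Schema;

my %VALID_TYPE = map { $_ => 1 } qw(categorical continuous identifier);

# columns => [ { name => ..., type => ... }, ... ] listed in file order.
sub new {
    my ($class, %args) = @_;
    my $columns = $args{columns} or die "columns is required\n";
    my %seen;
    for my $i (0 .. $#{$columns}) {
        my $col  = $columns->[$i];
        my $name = $col->{name};
        my $type = $col->{type} // '(none)';
        die "column $i has no name\n" unless defined $name;
        die "duplicate column name '$name'\n" if $seen{$name}++;
        die "unknown type '$type' for column '$name'\n"
            unless $VALID_TYPE{$type};
    }
    return bless { columns => $columns }, $class;
}

# Zero-based position of a named column, or undef if it is not declared.
sub position_of {
    my ($self, $name) = @_;
    my $columns = $self->{columns};
    for my $i (0 .. $#{$columns}) {
        return $i if $columns->[$i]{name} eq $name;
    }
    return undef;
}

# Declared type of a named column, or undef if it is not declared.
sub type_of {
    my ($self, $name) = @_;
    my $i = $self->position_of($name);
    return defined $i ? $self->{columns}[$i]{type} : undef;
}

package main;

my $schema = Dataset::Schema->new(
    columns => [
        { name => 'patient_id', type => 'identifier' },
        { name => 'group',      type => 'categorical' },
        { name => 'weight',     type => 'continuous' },
    ],
);

# prints "weight is column 2 (continuous)"
printf "%s is column %d (%s)\n",
    'weight', $schema->position_of('weight'), $schema->type_of('weight');
```

Because the description is just an array of hashes, the same structure could be filled from constructor arguments or loaded from a configuration file, so the choice of input format stays separate from the analysis code.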

The solutions I can think of are:

1. Create a configuration file in XML or a similar format, with a prespecified layout.
2. Pass the information during initialization of the module.
3. Use the first row of the data as headers and try to guess the types (ouch).
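For the third option, a rough sketch of header-based type guessing might look like the following. The function names `guess_type` and `infer_schema`, the heuristics and the `max_levels` threshold are all assumptions chosen for illustration. The naive `split` also ignores quoted fields, which a real CSV parser such as Text::CSV would handle.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: guess one column's type from its sampled values.
# The rules and the max_levels default are assumptions, not a standard.
sub guess_type {
    my ($values, %opt) = @_;
    my $max_levels = $opt{max_levels} // 10;

    # Ignore missing and empty cells.
    my @present = grep { defined($_) && length($_) } @$values;
    return 'unknown' unless @present;

    # Every remaining value parses as a number: treat as continuous.
    return 'continuous' unless grep { !looks_like_number($_) } @present;

    my %distinct = map { $_ => 1 } @present;

    # No repeated value at all: probably an identifier.
    return 'identifier' if keys(%distinct) == @present && @present > 1;

    # Few distinct levels: categorical; many: fall back to identifier.
    return keys(%distinct) <= $max_levels ? 'categorical' : 'identifier';
}

# Use the first line as the header and the rest as a sample of rows.
sub infer_schema {
    my ($lines, $sep) = @_;
    my ($header, @rows) = @$lines;
    my @names = split /\Q$sep\E/, $header, -1;
    my @table = map { [ split /\Q$sep\E/, $_, -1 ] } @rows;

    my @columns;
    for my $i (0 .. $#names) {
        my @values = map { $_->[$i] } @table;
        push @columns, { name => $names[$i], type => guess_type(\@values) };
    }
    return \@columns;
}

my @lines = (
    "id,group,weight",
    "XXX1,A,1.2",
    "XXX2,B,3",
    "XXX3,A,881",
);

my $columns = infer_schema(\@lines, ',');

# prints:
#   id: identifier
#   group: categorical
#   weight: continuous
print "$_->{name}: $_->{type}\n" for @$columns;
```

The weak spots are easy to see: a categorical variable coded as 0/1 is classed as continuous, and a small sample in which every category happens to appear once looks like an identifier. Guessing therefore seems more useful as a default that an explicit description can override than as the only mechanism.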

Surely there must be a "canonical" way of doing this that is also usable and efficient.

Thanks, p.

© Stack Overflow or respective owner
