Parsing multiple files at a time in Perl
Posted by sfactor on Stack Overflow
Published on 2010-12-31T14:23:06Z
I have a large data set (around 90 GB) to work with. There is one tab-delimited data file for each hour of each day, and I need to perform operations on the entire data set, for example, computing the share of each OS, which is given in one of the columns. I tried merging all the files into one huge file and running a simple count on it, but it was too large for the server's memory.
So I guess I need to perform the operation one file at a time and then add up the results at the end. I am new to Perl and especially naive about performance issues. How do I do such operations in a case like this?
As an example, two columns of the file are:
ID OS
1 Windows
2 Linux
3 Windows
4 Windows
Let's do something simple: counting the share of each OS in the data set. Each .txt file has millions of these lines, and there are many such files. What would be the most efficient way to operate on all of the files?
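To make the "one file at a time, then add up" idea concrete, here is a minimal sketch of what I have in mind. It reads each file line by line and keeps only a hash of running counts, so memory use should depend on the number of distinct OS values rather than on file size. The sub name `count_column`, the script name `os_share.pl`, the column index 1 for OS, and the check for an optional `ID` header line are all assumptions made for illustration, not details of the real files.

```perl
use strict;
use warnings;

# Count the values found in one tab-separated column across many files.
# Lines are read one at a time; only the running counts are kept in memory.
# $col is a 0-based column index (assumption: OS is the second column).
sub count_column {
    my ($col, @files) = @_;
    my %count;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my @fields = split /\t/, $line, -1;
            # Assumption: a file may start with a header line such as "ID<TAB>OS".
            next if $. == 1 && defined $fields[0] && $fields[0] eq 'ID';
            # Skip lines where the column is missing or empty.
            next unless defined $fields[$col] && length $fields[$col];
            $count{ $fields[$col] }++;
        }
        close $fh;    # also resets $. for the next file
    }
    return \%count;
}

# Hypothetical usage: perl os_share.pl data/*.txt
if (@ARGV) {
    my $count = count_column(1, @ARGV);
    my $total = 0;
    $total += $_ for values %$count;
    # Print each OS with its count and percentage share, largest first.
    for my $os (sort { $count->{$b} <=> $count->{$a} } keys %$count) {
        printf "%s\t%d\t%.2f%%\n", $os, $count->{$os},
            100 * $count->{$os} / $total;
    }
}
```

Since every file is read from start to finish exactly once, I would expect the run time to be dominated by disk reads rather than by the counting itself, but I have not measured this on the real data.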
© Stack Overflow or respective owner