Very basic question about Hadoop and compressed input files

Posted by Luis Sisamon on Stack Overflow, 2010-01-16.
I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes. However, if the file is compressed, it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines).
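
For illustration only, here is a minimal sketch (not from the original post) of how one might check whether Hadoop considers an input file's codec splittable. It assumes the SplittableCompressionCodec interface is available, which is only true in newer Hadoop releases; older versions simply treat gzip and similar codecs as unsplittable.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class SplitCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            Path input = new Path(args[0]);               // e.g. hdfs:///data/log.gz
            CompressionCodec codec = factory.getCodec(input);
            if (codec == null) {
                System.out.println("Uncompressed: Hadoop splits it into blocks as usual.");
            } else if (codec instanceof SplittableCompressionCodec) {
                System.out.println(codec.getClass().getSimpleName() + " is splittable.");
            } else {
                // e.g. GzipCodec: the whole file becomes a single split / single map task
                System.out.println(codec.getClass().getSimpleName() + " is NOT splittable.");
            }
        }
    }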

My question is, assuming the above is correct: is it possible to split a large file manually into fixed-size chunks (or daily chunks), compress each of them, and then pass the list of compressed input files to a MapReduce job?
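
As a rough sketch of that approach (again, not from the original post, and using hypothetical paths and a made-up line-counting job for illustration): each compressed chunk is still read whole by a single mapper, but because every file becomes its own input split, many chunks can be processed in parallel. This assumes the newer org.apache.hadoop.mapreduce API and the default TextInputFormat, which decompresses gzip input transparently.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChunkedJob {

        // Emits a count of 1 per input line under a constant key.
        public static class LineCountMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final Text KEY = new Text("lines");
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(KEY, ONE);
            }
        }

        // Sums the per-line counts into one total.
        public static class SumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable c : counts) total += c.get();
                ctx.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "count-lines-in-compressed-chunks");
            job.setJarByClass(ChunkedJob.class);
            job.setMapperClass(LineCountMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            // Each gzip chunk is unsplittable, but listing many of them still
            // yields one map task per file, so the chunks run in parallel.
            FileInputFormat.addInputPaths(job, "/data/2010-01-14.gz,/data/2010-01-15.gz");
            // A glob covers a whole directory of daily chunks:
            // FileInputFormat.addInputPath(job, new Path("/data/daily/*.gz"));
            FileOutputFormat.setOutputPath(job, new Path("/output/line-counts"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }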
