Very basic question about Hadoop and compressed input files
Posted by Luis Sisamon on Stack Overflow
Published on 2010-01-16T20:42:17Z
hadoop | compression
I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes. However, if the file is compressed, then it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines).
My question is: assuming the above is correct, is it possible to split a large file manually into fixed-size chunks (or daily chunks), compress them, and then pass a list of compressed input files to a MapReduce job?
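To make the idea concrete, here is a minimal sketch of what "passing a list of compressed input files" could look like, using the standard org.apache.hadoop.mapreduce API. The file paths, job name, and pass-through mapper are placeholders, not anything from the question: the point is only that each gzipped daily chunk is added as its own input path, so Hadoop assigns one map task per file and the cluster still runs in parallel across files even though an individual .gz file is not splittable.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedChunksJob {

    // Identity-style mapper just to keep the sketch self-contained;
    // the real mapper would hold the actual processing logic.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "process-compressed-chunks");
        job.setJarByClass(CompressedChunksJob.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Hypothetical daily chunks, each compressed separately. The default
        // TextInputFormat recognises the .gz extension and decompresses each
        // file inside its single (non-splittable) map task.
        FileInputFormat.addInputPath(job, new Path("/logs/2010-01-14.gz"));
        FileInputFormat.addInputPath(job, new Path("/logs/2010-01-15.gz"));
        FileInputFormat.addInputPath(job, new Path("/logs/2010-01-16.gz"));

        FileOutputFormat.setOutputPath(job, new Path("/logs/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Instead of listing files one by one, you could also point FileInputFormat at a directory (or a glob such as /logs/*.gz) containing the pre-split compressed chunks; the degree of parallelism is then bounded by the number of files rather than by HDFS block splits.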