Hadoop: Rolling Small Files

Posted by Arenstar on Server Fault
Published on 2010-11-16T03:03:56Z Indexed on 2011/01/05 13:56 UTC


I am running Hadoop on a project and need a suggestion.

By default, Hadoop uses a block size of around 64 MB.
The usual advice is also to avoid storing many small files.

I currently have a lot of very small files being written to HDFS because of how Flume is designed.

The problem is that Hadoop <= 0.20 cannot append to files, so I end up with too many files for my MapReduce jobs to run efficiently.
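To make the cost concrete, here is a rough sketch of how the number of map tasks grows with the number of files. It assumes the default behaviour where input is split on block boundaries and a split never spans two files; the function name and the example sizes are made up for illustration, not taken from my cluster.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size mentioned above


def estimate_map_tasks(file_sizes, block_size=BLOCK_SIZE):
    """Rough estimate: one input split (map task) per block, at least one per file."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


# Hypothetical workload: 10,000 files of 10 KB each (about 100 MB in total).
small_files = [10 * 1024] * 10_000
# The same bytes rolled into a single file.
rolled = [sum(small_files)]

print(estimate_map_tasks(small_files))  # 10000
print(estimate_map_tasks(rolled))       # 2
```

Under these assumptions the same data needs thousands of tasks as small files but only a couple once it is rolled up, and each task carries its own startup overhead.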

There must be a proper way to simply roll/merge roughly 100 files into one,
so that Hadoop is effectively reading 1 large file instead of many small ones.
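Roughly what I have in mind, sketched against a local directory as a stand-in for HDFS. This is only an illustration: `roll_files`, the `rolled-NNNNN` naming and the batch size of 100 are my own placeholders, and it assumes the small files are plain record-per-line data where simple concatenation is safe. On a real cluster the reads and writes would have to go through an HDFS client instead of the local filesystem.

```python
import shutil
from pathlib import Path


def roll_files(src_dir, dst_dir, batch_size=100):
    """Concatenate the files in src_dir into rolled files of batch_size inputs each."""
    # Sort so that the order of records in the output is deterministic.
    sources = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    rolled = []
    for start in range(0, len(sources), batch_size):
        target = dst / f"rolled-{start // batch_size:05d}"
        with open(target, "wb") as out:
            for src in sources[start:start + batch_size]:
                with open(src, "rb") as inp:
                    shutil.copyfileobj(inp, out)  # stream bytes, no full read into memory
        rolled.append(target)
    return rolled
```

Calling `roll_files("incoming", "rolled", batch_size=100)` on a directory of 250 small files would produce three output files, the last one holding the remaining 50 inputs. The source files are left in place, so cleaning them up after a successful roll would be a separate step.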

Any suggestions?

© Server Fault or respective owner
