Hadoop Rolling Small files
Posted by Arenstar on Server Fault
Published on 2010-11-16T03:03:56Z
I am running Hadoop on a project and need a suggestion.
By default, Hadoop uses a block size of around 64 MB.
The general advice is also to avoid storing many small files.
Because of how Flume is used in our application design, very small files are currently being written into HDFS.
The problem is that Hadoop <= 0.20 cannot append to files, so I end up with too many files for my map-reduce jobs to run efficiently.
There must be a proper way to simply roll/merge roughly 100 files into one,
so that Hadoop is effectively reading one large file instead of many small ones.
Any suggestions?
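
To make the kind of roll step I have in mind concrete, here is a minimal sketch in Python. It runs against a local directory as a stand-in for HDFS, which is an assumption made only to keep it runnable; the function name roll_files, the "rolled-NNNNN" output naming, and the batch size of 100 are all hypothetical. It also assumes the small files are newline-terminated records, so that plain concatenation is safe.

```python
import os
import shutil


def roll_files(src_dir, dst_dir, batch_size=100):
    """Concatenate the files in src_dir into larger files in dst_dir.

    Each output file holds the contents of up to batch_size input files,
    taken in sorted name order. Returns the list of output paths.
    """
    os.makedirs(dst_dir, exist_ok=True)

    # Only regular files are rolled; subdirectories are ignored.
    names = sorted(
        n for n in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, n))
    )

    outputs = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        # Hypothetical naming scheme: rolled-00000, rolled-00001, ...
        out_path = os.path.join(dst_dir, "rolled-%05d" % (start // batch_size))
        with open(out_path, "wb") as out:
            for name in batch:
                with open(os.path.join(src_dir, name), "rb") as src:
                    # Stream the bytes so large inputs are not held in memory.
                    shutil.copyfileobj(src, out)
        outputs.append(out_path)
    return outputs
```

Calling roll_files("incoming", "rolled") on a directory of 250 small files would leave three files in "rolled": two built from 100 inputs each and one from the remaining 50. The source files are left in place; deleting them after a successful roll would be a separate step.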
© Server Fault or respective owner