How do I concatenate a lot of files into one inside Hadoop, with no mapping or reduction?

Posted by Leonard on Stack Overflow
Published on 2010-04-08T22:12:09Z Indexed on 2010/04/08 22:53 UTC

I'm trying to combine multiple files from multiple input directories into a single file, for various odd reasons I won't go into. My first attempt was to write a 'null' mapper and reducer that just copied input to output, but that failed. My latest attempt is:

vcm_hadoop lester jar /vcm/home/apps/hadoop/contrib/streaming/hadoop-*-streaming.jar -input /cruncher/201004/08/17/00 -output /lcuffcat9 -mapper /bin/cat -reducer NONE

but I still end up with multiple output files. Does anybody know how I can coax everything into a single output file?
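
For context, `-reducer NONE` makes the streaming job map-only, so each map task writes its own part file, which would explain the multiple outputs. One possible workaround that skips MapReduce altogether is to stream the inputs through the filesystem shell and write them back as a single file. The sketch below is an assumption, not something taken from the post: `hdfs_concat` is a hypothetical helper name, `hadoop` stands in for whatever launcher is used locally (the post uses `vcm_hadoop lester`), and it relies on `fs -cat` accepting a glob and `fs -put -` reading from standard input. Everything passes through the client machine, so this probably only suits modest data volumes.

```shell
#!/bin/sh
# Sketch: concatenate all files under one or more HDFS directories into a
# single HDFS file without running a MapReduce job.
# HADOOP is the launcher command. "hadoop" is an assumed default; it can be
# overridden, e.g. HADOOP="vcm_hadoop lester" for the wrapper in the question.
HADOOP="${HADOOP:-hadoop}"

# hdfs_concat DEST SRC_DIR...   (hypothetical helper name)
hdfs_concat() {
    dest="$1"
    shift
    # Emit the contents of every file in each source directory, in order.
    # The glob is quoted so that Hadoop expands it on HDFS, not the local shell.
    for dir in "$@"; do
        $HADOOP fs -cat "$dir/*"
    done |
    # "-put -" reads the combined stream from stdin and writes it to DEST.
    $HADOOP fs -put - "$dest"
}
```

With the input path from the question and a made-up destination file, a call might look like `hdfs_concat /lcuffcat9/combined /cruncher/201004/08/17/00`; further input directories can be appended as extra arguments. If a local copy is enough, `hadoop fs -getmerge <src> <localdst>` is another option worth checking.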

© Stack Overflow or respective owner
