How do I control the output file names and content of a Hadoop Streaming job?
Posted by Eran Kampf on Stack Overflow
Published on 2009-05-20T13:18:43Z
Is there a way to control the output file names of a Hadoop Streaming job? Specifically, I would like the names and contents of my job's output files to be organized by the key the reducer outputs: each file would contain the values for only one key, and its name would be that key.
Update: I just found the answer. Using a Java class that derives from MultipleOutputFormat as the job's output format allows control of the output file names: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
I haven't seen any samples for this out there. Can anyone point me to a Hadoop Streaming sample that makes use of a custom output format Java class?
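For reference, here is a minimal sketch of what such a class might look like. It is not a tested solution. It assumes the old `org.apache.hadoop.mapred` API, that the streaming reducer emits tab-separated key/value lines (so both arrive as `Text`), and that keys are safe to use as file names. The class name `KeyBasedOutputFormat` is made up for illustration.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical class name; any subclass of MultipleOutputFormat should work similarly.
public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    // Route each record to a file named after its key instead of the
    // default leaf name (e.g. "part-00000") passed in as `name`.
    // Assumption: the key contains no characters that are invalid in a path;
    // a "/" in the key would be treated as a subdirectory separator.
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        return key.toString();
    }

    // Returning null as the actual key makes the text record writer emit
    // only the value, so each file holds just the values for its key.
    @Override
    protected Text generateActualKey(Text key, Text value) {
        return null;
    }
}
```

Presumably the class would be compiled against the Hadoop jar, packaged, made available on the job's classpath, and then named in the streaming command through the `-outputformat` option, e.g. `-outputformat KeyBasedOutputFormat`. How the jar gets shipped to the cluster depends on the Hadoop version, so treat that part as an assumption to verify.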
© Stack Overflow or respective owner