Control the MultipleOutputFormat files sub-path
Posted by iCode on Stack Overflow
Published on 2012-06-28T03:44:21Z
Indexed on
2012/06/29
3:16 UTC
hadoop
I need to control the sub-path of the different files managed by MultipleOutputFormat, based on the reducer key.
In other words, I want to set the sub-path of each file from the key given to the reducer.
I can change the file name by overriding the generateFileNameForKeyValue method of MultipleOutputFormat, but how can I also change the sub-path of these files?
With only generateFileNameForKeyValue overridden, I get:
mySetJobConfigOutputPath/fileNameBasedKey1.dat
                        /fileNameBasedKey2.dat
                        /fileNameBasedKey3.dat
                        ...
but I want the files organized like this:
mySetJobConfigOutputPath/path0ConfiguredInsideReducerBasedOnKey/fileNameBasedKey1.dat
                        /path1ConfiguredInsideReducerBasedOnKey/fileNameBasedKey2.dat
                        /fileNameBasedKey3.dat
                        /path2ConfiguredInsideReducerBasedOnKey/fileNameBasedKey8.dat
As shown, both the sub-path and the file name are derived from the key inside the reducer.
I know how to configure the file name, but can I also configure the sub-path of each file under the mySetJobConfigOutputPath folder?
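One possible direction, sketched here under an assumption rather than stated as the library's documented behavior: if the string returned by generateFileNameForKeyValue is resolved relative to the job output directory, then returning a name that contains "/" should place the file in a subdirectory. The helper below only builds such a relative name. The class KeyBasedOutputPath and the method relativeName are hypothetical names introduced for illustration, and the mapping from key to sub-path is left to you.

```java
public class KeyBasedOutputPath {

    // Hypothetical helper: joins an optional sub-path and a file name into a
    // single relative name. A null or empty sub-path keeps the file directly
    // under the job output directory.
    public static String relativeName(String subPath, String fileName) {
        if (subPath == null || subPath.isEmpty()) {
            return fileName;
        }
        return subPath + "/" + fileName;
    }

    // Assumed usage inside a MultipleOutputFormat subclass (old
    // org.apache.hadoop.mapred.lib API); subPathFor and fileNameFor are
    // placeholders for your own key logic:
    //
    //   @Override
    //   protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    //       return KeyBasedOutputPath.relativeName(subPathFor(key), fileNameFor(key));
    //   }

    public static void main(String[] args) {
        // File placed under a key-derived subdirectory.
        System.out.println(relativeName("path0ConfiguredInsideReducerBasedOnKey",
                                        "fileNameBasedKey1.dat"));
        // File left at the top level of the output directory.
        System.out.println(relativeName("", "fileNameBasedKey3.dat"));
    }
}
```

Whether the slash really produces a subdirectory depends on the Hadoop version and on the base record writer in use, so it is worth checking on a small job first.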
© Stack Overflow or respective owner