Multiple lines of text to a single map
Posted by steven on Stack Overflow
Published on 2010-04-26T04:13:21Z
I've been trying to use Hadoop to send N lines of input to a single map call. I don't need the lines to be split beforehand.
I've tried NLineInputFormat, but it sends the N lines of text to each mapper one line at a time [giving up after the Nth line].
I have tried setting the option below, but it only takes N lines of input and still sends them one line at a time to each map call:
job.setInt("mapred.line.input.format.linespermap", 10);
I found a mailing list post recommending that I override LineRecordReader::next, but that is not simple, because the internal data members are all private.
I've just checked the source for NLineInputFormat and it hard-codes LineReader, so overriding will not help.
Also, I'm using Hadoop 0.18 for compatibility with Amazon EC2 MapReduce.
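To make the goal concrete, here is a minimal plain-Java sketch of the grouping behaviour I'm after: pull up to N lines from a line-oriented source and hand them over as one record. It does not use any Hadoop classes; `NLineGrouper` and `groupAll` are hypothetical names made up for illustration. I assume the same loop could sit inside a custom RecordReader that holds a line reader as a field and delegates to it (instead of subclassing it), which would avoid the private members, but I have not verified that wiring against the 0.18 API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Hadoop: groups up to N consecutive
 * lines from a line-oriented source into a single record.
 */
final class NLineGrouper {
    private final BufferedReader reader;
    private final int linesPerRecord;

    NLineGrouper(BufferedReader reader, int linesPerRecord) {
        if (linesPerRecord <= 0) {
            throw new IllegalArgumentException("linesPerRecord must be positive");
        }
        this.reader = reader;
        this.linesPerRecord = linesPerRecord;
    }

    /**
     * Returns the next group of up to N lines joined with '\n',
     * or null once the source is exhausted.
     */
    String next() throws IOException {
        StringBuilder record = new StringBuilder();
        int count = 0;
        String line;
        // Check the count first so no line is read and then dropped.
        while (count < linesPerRecord && (line = reader.readLine()) != null) {
            if (count > 0) {
                record.append('\n');
            }
            record.append(line);
            count++;
        }
        return count == 0 ? null : record.toString();
    }

    /** Convenience wrapper for in-memory text: returns every group in order. */
    static List<String> groupAll(String text, int linesPerRecord) {
        List<String> groups = new ArrayList<String>();
        try {
            NLineGrouper grouper =
                new NLineGrouper(new BufferedReader(new StringReader(text)), linesPerRecord);
            String group;
            while ((group = grouper.next()) != null) {
                groups.add(group);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new IllegalStateException(e);
        }
        return groups;
    }
}
```

With five input lines and N = 2, this yields three records: two holding two lines each and a final one holding the leftover line. That is the shape of input I would like each map call to receive.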
© Stack Overflow or respective owner