Using Hadoop, are my reducers guaranteed to get all the records with the same key?
Posted by samg on Stack Overflow
Published on 2010-04-13T21:16:17Z
I'm running a Hadoop job (using Hive, actually) that is supposed to de-duplicate ("uniq") lines across a large number of text files. More specifically, in the reduce step it chooses the most recently timestamped record for each key.
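The per-key selection described above can be sketched as a plain function. This is a minimal illustration, not Hive or the Hadoop API: it assumes each record has already been parsed into a hypothetical (key, timestamp, line) tuple, and `latest_record` is a hypothetical name.

```python
from typing import Iterable, Tuple

Record = Tuple[str, int, str]  # hypothetical layout: (key, timestamp, line)

def latest_record(records: Iterable[Record]) -> Record:
    """Reduce step for one key: keep the record with the newest timestamp."""
    # Compare on the timestamp field only; on a tie, max() keeps the first one seen.
    return max(records, key=lambda r: r[1])

records = [("a", 1, "old"), ("a", 5, "new"), ("a", 3, "mid")]
print(latest_record(records))  # ('a', 5, 'new')
```

The function compares timestamps explicitly rather than trusting arrival order, because by default Hadoop does not guarantee the order in which a key's values reach the reducer.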
Does Hadoop guarantee that every record with the same key output by the map step will go to a single reducer, even when many reducers are running across a cluster?
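For context, Hadoop's default `HashPartitioner` picks the reducer for each map output key as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so the choice depends only on the key. The sketch below simulates that routing in Python. `zlib.crc32` stands in for Java's `hashCode()` (an assumption made so results are deterministic across runs), and `partition_for` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

def partition_for(key: str, num_reducers: int) -> int:
    # Same shape as HashPartitioner: non-negative hash modulo the reducer count.
    # crc32 is a stand-in for Java's hashCode(); the mask mirrors & Integer.MAX_VALUE.
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def shuffle(pairs: List[Tuple[str, str]], num_reducers: int) -> Dict[int, List[Tuple[str, str]]]:
    # Route every (key, value) pair to the reducer chosen from its key alone.
    buckets: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return dict(buckets)

# Map output from several mappers, spread over 4 reducers.
pairs = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("b", "5")]
buckets = shuffle(pairs, 4)
for reducer, items in sorted(buckets.items()):
    print(reducer, items)
```

Because the bucket is a pure function of the key, no key can land in two reducers. A custom partitioner that also looked at the value would lose that property.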
I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.
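Within a single reducer, the framework merges the map outputs it received and sorts them by key before calling reduce once per distinct key, so a key's records arrive as one group. The sketch below imitates that sort-then-group step with `sorted` and `itertools.groupby`. It is a simplified in-memory model, not Hadoop's merge implementation. `reduce_inputs` and the (key, timestamp, line) record layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple

Record = Tuple[str, int, str]  # hypothetical layout: (key, timestamp, line)

def reduce_inputs(partition: List[Record]) -> Iterator[Tuple[str, List[Record]]]:
    """Yield (key, all records for that key) once per distinct key in one reducer's input."""
    # Sort by key first: groupby only merges *adjacent* equal keys, so without
    # the sort a key's records could come out as several separate groups.
    ordered = sorted(partition, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, list(group)

# Records for two keys arriving interleaved from different mappers.
partition = [("b", 2, "b-old"), ("a", 7, "a-new"), ("b", 9, "b-new"), ("a", 4, "a-old")]
for key, records in reduce_inputs(partition):
    print(key, records)
# a [('a', 7, 'a-new'), ('a', 4, 'a-old')]
# b [('b', 2, 'b-old'), ('b', 9, 'b-new')]
```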