Using Hadoop, are my reducers guaranteed to get all the records with the same key?

Posted by samg on Stack Overflow
Published on 2010-04-13T21:16:17Z Indexed on 2010/04/13 23:03 UTC

I'm running a Hadoop job (using Hive, actually) that is supposed to deduplicate lines across a large number of text files. More specifically, the reduce step picks the most recently timestamped record for each key.
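To make the intended reduce logic concrete, here is a minimal sketch in plain Python rather than Hive or the Hadoop API. The function name `latest_per_key` and the `(key, timestamp, line)` record layout are assumptions for illustration, not the actual job. Note that it only gives the right answer if all records for a given key are seen by the same call.

```python
def latest_per_key(records):
    """Keep the most recently timestamped record for each key.

    `records` is an iterable of (key, timestamp, line) tuples
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping key -> (timestamp, line).
    """
    latest = {}
    for key, timestamp, line in records:
        # Replace the stored record only if this one is strictly newer.
        if key not in latest or timestamp > latest[key][0]:
            latest[key] = (timestamp, line)
    return latest


sample = [
    ("a", 1, "old a"),
    ("b", 2, "only b"),
    ("a", 3, "new a"),
    ("a", 2, "middle a"),
]
result = latest_per_key(sample)
```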

Does Hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many reducers running across a cluster?

I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.
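For reference, the concern can be checked against how hash partitioning works in principle. Hadoop's default `HashPartitioner` picks a reducer from the key's hash modulo the number of reduce tasks, so the target depends only on the key. The sketch below imitates that scheme in plain Python. `zlib.crc32` is a stand-in for Java's `hashCode()`, and the names `partition` and `shuffle` are assumptions for this illustration, not Hadoop APIs.

```python
import zlib


def partition(key, num_reducers):
    # Stand-in for Java's key.hashCode(); crc32 is deterministic for a given key.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take the remainder.
    return (key_hash & 0x7FFFFFFF) % num_reducers


def shuffle(map_output, num_reducers):
    """Route each (key, value) pair to the bucket chosen by partition()."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


map_output = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
buckets = shuffle(map_output, 4)
```

Because the bucket is a pure function of the key, records sharing a key cannot land in different buckets in this sketch, no matter how the map output is ordered or split.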

© Stack Overflow or respective owner