How is intermediate data organized in MapReduce?
Posted by Pedro Cattori on Programmers
Published on 2014-02-06T04:12:16Z
design | functional-programming
From what I understand, each mapper writes its output to an intermediate file, and the intermediate data (the contents of each intermediate file) is then sorted by key.
The master then assigns a key to a reducer. The reducer reads from the intermediate file containing that key and calls reduce on the data it has read.
But in detail, how is the intermediate data organized? Can the data corresponding to a single key be spread across multiple intermediate files? What happens when there is too much data for one key to fit in a single file?
In short, how do intermediate partitions differ from intermediate files and how are these differences dealt with in the implementation?
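To make the question concrete, here is a minimal sketch of one possible organization, assuming the hash-partitioning scheme described in the original MapReduce paper (partition = hash(key) mod R, where R is the number of reduce tasks). Everything named here (R, partition, run_mapper, run_reducer, the word-count functions) is hypothetical and only for illustration; in-memory lists stand in for intermediate files, and crc32 stands in for whatever hash a real implementation uses.

```python
import zlib
from heapq import merge
from itertools import groupby
from operator import itemgetter

R = 3  # assumed number of reduce tasks, and therefore of partitions


def partition(key, num_partitions=R):
    # Deterministic hash, so every mapper sends a given key to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run_mapper(records, map_fn):
    """Return this mapper's R partitions, each sorted by key.

    Each inner list stands in for one intermediate file on the mapper's disk.
    """
    partitions = [[] for _ in range(R)]
    for record in records:
        for key, value in map_fn(record):
            partitions[partition(key)].append((key, value))
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def run_reducer(r, all_mapper_outputs, reduce_fn):
    """Reducer r merges partition r from every mapper, then reduces per key."""
    runs = [output[r] for output in all_mapper_outputs]
    merged = merge(*runs, key=itemgetter(0))  # merge of already-sorted runs
    return {
        key: reduce_fn(key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    }


def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


# Two input splits, one per mapper.
splits = [["a b a"], ["b c a"]]
mapper_outputs = [run_mapper(split, word_count_map) for split in splits]

result = {}
for r in range(R):
    result.update(run_reducer(r, mapper_outputs, word_count_reduce))
```

Under these assumptions, a reducer is assigned a partition rather than a single key. The values for one key can sit in several intermediate files (one per mapper that emitted the key), but always under the same partition number, so a single reducer sees all of them. The sketch holds everything in memory; the paper notes that an external sort is used when a reducer's intermediate data is too large to fit in memory.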