Network bandwidth bottleneck for sorting of mapreduce intermediate keys?
Posted by Zubair on Stack Overflow
Published on 2010-03-11T08:42:42Z
Indexed on 2010/03/13 5:55 UTC
mapreduce
I have been learning the MapReduce algorithm and how it can potentially scale to millions of machines, but I don't understand how the sorting of the intermediate keys after the map phase can scale. With a million machines, there are potentially 1,000,000 x 1,000,000 machine-to-machine connections, each carrying small key/value pairs of the intermediate results. Isn't this a bottleneck?
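To make the count concrete, here is a minimal Python sketch of the communication pattern I have in mind. It assumes intermediate keys are routed to reducers by hash partitioning, which is my assumption about how the shuffle works and not something I have confirmed for any particular implementation. The names `partition`, `shuffle_pairs`, and `max_connections` are hypothetical helpers made up for this illustration.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key (assumed scheme: stable hash modulo R)."""
    # crc32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_pairs(map_outputs, num_reducers):
    """Group intermediate pairs by (mapper, reducer) link.

    map_outputs: one list of (key, value) pairs per mapper.
    Returns {(mapper_id, reducer_id): [(key, value), ...]}.
    """
    transfers = defaultdict(list)
    for mapper_id, pairs in enumerate(map_outputs):
        for key, value in pairs:
            reducer_id = partition(key, num_reducers)
            transfers[(mapper_id, reducer_id)].append((key, value))
    return dict(transfers)


def max_connections(num_mappers: int, num_reducers: int) -> int:
    """Upper bound on distinct mapper-to-reducer links."""
    return num_mappers * num_reducers


if __name__ == "__main__":
    # Two mappers emitting word counts, two reducers.
    outputs = [
        [("apple", 1), ("pear", 1), ("apple", 1)],
        [("pear", 1), ("plum", 1)],
    ]
    links = shuffle_pairs(outputs, num_reducers=2)
    print(len(links), "links used out of", max_connections(2, 2))
    print(max_connections(1_000_000, 1_000_000), "possible links at scale")
```

Every mapper may hold keys destined for every reducer, so the number of links grows as mappers times reducers, and that product is the figure I am worried about.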
© Stack Overflow or respective owner