Network bandwidth bottleneck for sorting of mapreduce intermediate keys?
- by Zubair
I have been learning the MapReduce algorithm and how it can potentially scale to millions of machines, but I don't understand how sorting the intermediate keys after the map phase can scale, since there will be 1,000,000 × 1,000,000 potential machine-to-machine connections carrying small key/value pairs of intermediate results. Isn't this a bottleneck?
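To make the scale in the question concrete, here is a minimal Python sketch of the all-to-all exchange being asked about. It assumes, as in common descriptions of MapReduce, that a hash partitioner assigns each intermediate key to one of R reduce tasks. The names `partition`, `shuffle` and `max_transfers` are hypothetical and do not come from any real framework. Each mapper groups its pairs into at most one batch per reducer, so the number of distinct mapper-to-reducer transfers is bounded by M × R rather than by the number of individual key/value pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical hash partitioner: sum of character codes mod R.
    # Deterministic, unlike Python's salted built-in hash() for str.
    return sum(ord(c) for c in key) % num_reducers


def shuffle(map_outputs: List[List[Pair]], num_reducers: int) -> Dict[Tuple[int, int], List[Pair]]:
    """Group each mapper's (key, value) pairs into one batch per destination reducer.

    Returns a dict mapping (mapper_id, reducer_id) -> list of pairs to send.
    """
    batches: Dict[Tuple[int, int], List[Pair]] = defaultdict(list)
    for mapper_id, pairs in enumerate(map_outputs):
        for key, value in pairs:
            batches[(mapper_id, partition(key, num_reducers))].append((key, value))
    return dict(batches)


def max_transfers(num_mappers: int, num_reducers: int) -> int:
    # Upper bound on distinct mapper -> reducer transfers in an all-to-all shuffle.
    return num_mappers * num_reducers


if __name__ == "__main__":
    # Three mappers, two reducers, word-count style output.
    outputs = [
        [("a", 1), ("b", 1)],
        [("a", 1), ("c", 1)],
        [("b", 1), ("b", 1)],
    ]
    batches = shuffle(outputs, num_reducers=2)
    print(len(batches), "batches for", sum(len(p) for p in outputs), "pairs")
    print(max_transfers(1_000_000, 1_000_000))  # the question's scenario
```

With 1,000,000 mappers and 1,000,000 reducers the bound is 10^12 transfers, which is the figure the question is worried about; the sketch only shows how the count arises, not how a real system schedules or compresses those transfers.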