Parallel computation of the median of a large array
- by recipriversexclusion
I got asked this question once and still haven't been able to figure it out:
You have an array of N integers, where N is large, say, a billion. You want to calculate the median value of this array. Assume you have m+1 machines (m workers, one master) to distribute the job to. How would you go about doing this?
Since the median is a nonlinear operator, you can't just find the median in each machine and then take the median of those values.