top-k selection/merge
- by tcurdt
I have n sorted lists. These lists are quite long (300000+ tuples). Selecting the top 10 of the individual lists is of course trivial - they are right at the head of the lists. Where it gets more interesting is when I want the top 10 of all the sorted lists.
The question is whether there is an algorithm to calculate the combined top 10 having the correct order while cutting off the long tail of the lists. The goal is to reduce the required space. And if there is: How does one find the limit where is is safe to cut?
Note: The actual counts are not important. Only the order is.