Top n items in a List ( including duplicates )

Posted by Krishnan on Stack Overflow See other posts from Stack Overflow or by Krishnan
Published on 2011-11-25T22:16:59Z Indexed on 2011/11/26 1:50 UTC
Read the original article Hit count: 167

Filed under:
|
|

Trying to find an efficient way to obtain the top N items in a very large list, possibly containing duplicates.

I first tried sorting & slicing, which works. But this seems unnnecessary. You shouldn't need to sort a very large list if you just want the top 20 members. So I wrote a recursive routine which builds the top-n list. This also works, but is very much slower than the non-recursive one!

Question: Which is my second routine (elite2) so much slower than elite, and how do I make it faster ? My code is attached below. Thanks.

import scala.collection.SeqView
import scala.math.min
object X {

    def  elite(s: SeqView[Int, List[Int]], k:Int):List[Int] = {
        s.sorted.reverse.force.slice(0,min(k,s.size))
    }

    def elite2(s: SeqView[Int, List[Int]], k:Int, s2:List[Int]=Nil):List[Int] = {
        if( k == 0 || s.size == 0) s2.reverse
        else {
            val m = s.max
            val parts = s.force.partition(_==m)
            val whole = if( parts._1.size > 1) parts._1.tail:::parts._2 else parts._2
            elite2( whole.view, k-1, m::s2 )
        }
    }

    def main(args:Array[String]) = {
        val N = 1000000/3
        val x = List(N to 1 by -1).flatten.map(x=>List(x,x,x)).flatten.view
        println(elite2(x,20))
        println(elite(x,20))
    }
}

© Stack Overflow or respective owner

Related posts about scala

Related posts about sorting