how to implement a really efficient bitvector sorting in python

Posted by xiao on Stack Overflow See other posts from Stack Overflow or by xiao
Published on 2010-06-07T17:22:45Z Indexed on 2010/06/07 17:42 UTC
Read the original article Hit count: 288

Filed under:
|
|
|

Hello guys!

Actually this is an interesting topic from programming pearls, sorting 10 digits telephone numbers in a limited memory with an efficient algorithm. You can find the whole story here

What I am interested in is just how fast the implementation could be in python. I have done a naive implementation with the module bitvector. The code is as following:

from BitVector import BitVector
import timeit
import random
import time
import sys

def sort(input_li):
        return sorted(input_li)

def vec_sort(input_li):
        bv = BitVector( size = len(input_li) )
        for i in input_li:
                bv[i] = 1

        res_li = []
        for i in range(len(bv)):
                if bv[i]:
                        res_li.append(i)

        return res_li

if __name__ == "__main__":
        test_data = range(int(sys.argv[1]))
        print 'test_data size is:', sys.argv[1]
        random.shuffle(test_data)

        start = time.time()
        sort(test_data)
        elapsed = (time.time() - start)
        print "sort function takes " + str(elapsed)
        start = time.time()
        vec_sort(test_data)
        elapsed = (time.time() - start)
        print "sort function takes " + str(elapsed)
        start = time.time()
        vec_sort(test_data)
        elapsed = (time.time() - start)
        print "vec_sort function takes " + str(elapsed)

I have tested from array size 100 to 10,000,000 in my macbook(2GHz Intel Core 2 Duo 2GB SDRAM), the result is as following:


  • test_data size is: 1000
  • sort function takes 0.000274896621704
  • vec_sort function takes 0.00383687019348

  • test_data size is: 10000

  • sort function takes 0.00380706787109
  • vec_sort function takes 0.0371489524841

  • test_data size is: 100000

  • sort function takes 0.0520560741425
  • vec_sort function takes 0.374383926392

  • test_data size is: 1000000

  • sort function takes 0.867373943329
  • vec_sort function takes 3.80475401878

  • test_data size is: 10000000

  • sort function takes 12.9204008579
  • vec_sort function takes 38.8053860664

What disappoints me is that even when the test_data size is 100,000,000, the sort function is still faster than vec_sort. Is there any way to accelerate the vec_sort function?

© Stack Overflow or respective owner

Related posts about python

Related posts about algorithm