KMeans clustering for more than 5 million vectors

Posted by Wajih on Stack Overflow
Published on 2010-08-04T08:54:12Z Indexed on 2011/01/09 12:53 UTC

I have hit a real problem. I need to run k-means clustering on 5 million vectors, each with about 32 columns. I tried Mahout, but it requires Linux and I am on Windows. I am not allowed to use a Linux OS or any sort of emulator.

Can anyone suggest a k-means clustering implementation that scales up to 5M vectors and converges quickly?

I have tested a few, but they won't scale: they are slow and take forever to complete.
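
To make the workload concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) written so that each iteration streams over the vectors and keeps only k running sums in memory. It is an illustration only, not one of the implementations tested above. The names `assign`, `lloyd_step` and `kmeans` are hypothetical, and pure Python like this would be far too slow for 5M vectors. It only shows where the time goes: every iteration computes n * k distances over d columns.

```python
import random


def assign(vector, centroids):
    """Return the index of the centroid nearest to `vector` (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(vector, centroid))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def lloyd_step(vectors, centroids):
    """One pass over `vectors` (any iterable); returns the updated centroids.

    Only k running sums and counts are held, so extra memory is O(k * d)
    regardless of how many vectors are streamed through.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for v in vectors:
        j = assign(v, centroids)
        counts[j] += 1
        for t in range(d):
            sums[j][t] += v[t]
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(vectors, k, iterations=10, seed=0):
    """Run Lloyd's algorithm on a list of vectors; stops early once centroids are stable."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        updated = lloyd_step(vectors, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

With n = 5,000,000, d = 32 and, say, k = 100 clusters (k is an assumption, the number of clusters is not stated here), one iteration is about 5,000,000 * 100 * 32 = 1.6e10 multiply-adds, which is why an implementation has to be vectorized or parallel to finish in reasonable time.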

Thanks

© Stack Overflow or respective owner
