KMeans clustering for more than 5 million vectors
Posted by Wajih on Stack Overflow
Published on 2010-08-04T08:54:12Z
Indexed on 2011/01/09 12:53 UTC
Tags: algorithm, clustering
I have hit a real problem. I need to run k-means clustering on 5 million vectors, each with about 32 columns. I tried Mahout, but it requires Linux and I am on Windows. I am not allowed to use a Linux OS or any sort of emulator.
Can anyone suggest a k-means clustering implementation that scales to 5M vectors and converges quickly?
I have tested a few, but they won't scale: they are slow and take forever to complete.
Thanks
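To make "scalable" concrete, here is a rough sketch of one approach that might fit: mini-batch k-means, which updates the centroids from small random samples instead of scanning all vectors on every iteration. This is my own illustration, not something I have benchmarked at 5M x 32. The function names (`mini_batch_kmeans`, `nearest`) and the parameters (`batch_size`, `n_iters`, `init`) are made up for the example. It is pure Python so it runs on Windows without extra installs, but a real run at this size would need a vectorised or native implementation.

```python
import random


def nearest(p, centroids):
    """Return the index of the centroid closest to p (squared Euclidean distance)."""
    best_j, best_dist = 0, float("inf")
    for j, c in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(p, c))
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j


def mini_batch_kmeans(points, k, batch_size=100, n_iters=200, init=None, seed=0):
    """Mini-batch k-means sketch; `init` (optional) is a hypothetical hook for fixed starting centroids."""
    rng = random.Random(seed)
    if init is None:
        # Assumption: start from k distinct points drawn at random from the data.
        centroids = [list(p) for p in rng.sample(points, k)]
    else:
        centroids = [list(c) for c in init]
    counts = [0] * k  # number of points each centroid has absorbed so far

    for _ in range(n_iters):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, using the centroids as they stand now.
        assignments = [nearest(p, centroids) for p in batch]
        # Then move each centroid toward its points with a shrinking step size.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            c = centroids[j]
            for d in range(len(c)):
                c[d] += eta * (p[d] - c[d])
    return centroids
```

Each iteration costs about `batch_size * k * dims` distance terms regardless of how many vectors there are in total, which is why this family of methods tends to cope better with millions of rows than a full pass per iteration. The price is that the result is an approximation of what standard k-means would give.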
© Stack Overflow or respective owner