KMeans clustering for more than 5 million vectors
- by Wajih
I have hit a real problem. I need to do some Kmeans clustering for 5 million vectors, each containing about 32 cols.
I tried out Mahout which requires linux and I am on windows, I am restrained from using a Linux OS and any sort of simulator.
Can anyone suggest a KMeans clustering algorithm that is scalable upto 5M vectors and can converge quickly?
I have tested a few but they wont scale. Which means they are slow and take forever to complete.
Thanks