Search Results

Search found 1 results on 1 pages for 'mrjob'.

Page 1/1 | 1 

  • kmeans based on mapreduce by python

    - by user3616059
    I am going to write a mapper and reducer for the kmeans algorithm, I think the best course of action to do is putting the distance calculator in mapper and sending to reducer with the cluster id as key and coordinates of row as value. In reducer, updating the centroids would be performed. I am writing this by python. As you know, I have to use Hadoop streaming to transfer data between STDIN and STOUT. according to my knowledge, when we print (key + "\t"+value), it will be sent to reducer. Reducer will receive data and it calculates the new centroids but when we print new centroids, I think it does not send them to mapper to calculate new clusters and it just send it to STDOUT and as you know, kmeans is a iterative program. So, my questions is whether Hadoop streaming suffers of doing iterative programs and we should employ MRJOB for iterative programs?

    Read the article

1