problem with hierarchical clustering in Python
        Posted by user248237 on Stack Overflow
        
        Published on 2010-05-30T22:52:41Z
        Indexed on 2010/05/30 23:02 UTC
        
        
I am doing hierarchical clustering of a 2-dimensional matrix using the correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"):
from hcluster import *
Y = pdist(data, 'correlation')
cluster_type = 'average'
Z = linkage(Y, cluster_type)
dendrogram(Z)
The error I get is:
ValueError: Linkage 'Z' contains negative distances. 
What causes this error? The matrix "data" that I use is simply:
[[  156.651968  2345.168618]
 [  158.089968  2032.840106]
 [  207.996413  2786.779081]
 [  151.885804  2286.70533 ]
 [  154.33665   1967.74431 ]
 [  150.060182  1931.991169]
 [  133.800787  1978.539644]
 [  112.743217  1478.903191]
 [  125.388905  1422.3247  ]]
I don't see how pdist could ever produce negative numbers when taking 1 - Pearson correlation. Any ideas on this?
Thank you.
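One possible explanation, offered as a sketch rather than a confirmed diagnosis: each row of "data" holds only two values, and the Pearson correlation between two 2-element vectors is always +1 or -1. The exact distance is therefore 0 or 2, and floating-point rounding may leave a computed "1 - r" a hair below zero. The snippet below recomputes the distances with plain NumPy to show how close to zero they sit. The helper correlation_distance and the clamping step are illustrative assumptions, not part of hcluster.

```python
import numpy as np

# The nine rows from the question; note each row has only two values.
data = np.array([
    [156.651968, 2345.168618],
    [158.089968, 2032.840106],
    [207.996413, 2786.779081],
    [151.885804, 2286.70533],
    [154.33665, 1967.74431],
    [150.060182, 1931.991169],
    [133.800787, 1978.539644],
    [112.743217, 1478.903191],
    [125.388905, 1422.3247],
])


def correlation_distance(u, v):
    """Hypothetical helper: 1 - Pearson correlation of two 1-D vectors."""
    uc = u - u.mean()
    vc = v - v.mean()
    return 1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))


# Condensed pairwise distances over all row pairs (i < j).
n = len(data)
dists = np.array([
    correlation_distance(data[i], data[j])
    for i in range(n)
    for j in range(i + 1, n)
])

# Every pair is perfectly correlated, so the values sit within rounding
# error of zero; some may come out very slightly negative.
print(dists.min(), dists.max())

# Assumed workaround: clamp tiny negative values to zero before clustering.
clipped = np.clip(dists, 0.0, None)
```

If rounding is indeed the cause, applying the same clamp to Y from the question (for example np.clip(Y, 0, None)) before calling linkage should avoid the ValueError. It may also be worth checking whether the intent was to correlate rows that have only two features each, since every distance then collapses to 0 or 2.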
© Stack Overflow or respective owner