Strange chi-square result using scikit-learn with a feature matrix

Posted by user963386 on Stack Overflow
Published on 2012-10-20T09:41:04Z

I am using scikit-learn to calculate the basic chi-square statistics (sklearn.feature_selection.chi2(X, y)):

    from sklearn.feature_selection import chi2

    def chi_square(feat, target):
        """Return the per-feature chi-square statistics and p-values of feat against target."""
        ch, pval = chi2(feat, target)
        return ch, pval


    chisq, p = chi_square(feat_mat, target_sc)
    print(chisq)
    print("**********************")
    print(p)
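As far as I understand the scikit-learn implementation, chi2 does not build a contingency table from X; it treats each feature column as counts. For each class it sums the feature values (observed), compares that with the class frequency times the column total (expected), and adds up (observed - expected)**2 / expected over the classes. The sketch below is a hypothetical NumPy re-implementation of that formula (chi2_scores is an illustrative name, not a library function), written only to show where the sign of the score comes from:

```python
import numpy as np


def chi2_scores(X, y):
    """Hypothetical re-implementation: per-feature sum over classes of (O - E)**2 / E.

    Each feature column is treated as if its values were counts.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    classes = np.unique(y)
    # One-hot class indicators, shape (n_samples, n_classes).
    Y = (y[:, None] == classes[None, :]).astype(np.float64)
    # Observed: per-class sum of each feature, shape (n_classes, n_features).
    observed = Y.T @ X
    # Expected: class frequency times the column total over all samples.
    class_prob = Y.mean(axis=0).reshape(-1, 1)
    column_total = X.sum(axis=0).reshape(1, -1)
    expected = class_prob @ column_total
    # A negative column total makes every expected value in that column negative,
    # so every (O - E)**2 / E term, and therefore the score, is <= 0.
    # errstate only silences the warnings a zero column total would trigger.
    with np.errstate(divide="ignore", invalid="ignore"):
        return ((observed - expected) ** 2 / expected).sum(axis=0)


scores = chi2_scores([[1.0, -2.0],
                      [3.0, -1.0],
                      [0.0,  1.0],
                      [2.0, -3.0]],
                     [0, 0, 1, 1])
print(scores)  # roughly [0.667, -0.2]
```

With two equally sized classes, the first column (total 6) scores 2/3, while the second column (total -5) gets negative expected values and scores -0.2. Under this formula a negative statistic can only appear when a column's total is negative.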

I have 1500 samples, 45 features and 4 classes. The input is a 1500x45 feature matrix and a target array with 1500 entries. The feature matrix is not sparse. When I run the program and print the 45-component array "chisq", I can see that component 13 (counting from 0) has a negative value and its p-value is exactly 1. How is this possible? What does it mean, or what big mistake am I making?
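One thing worth checking: the scikit-learn documentation for chi2 says X must contain only non-negative features, such as booleans or frequencies, and recent versions raise a ValueError for negative input, whereas the version used here returned a score without complaint. If the score is computed as (observed - expected)**2 / expected with expected proportional to the column total, a negative statistic points to a feature column whose values sum to a negative number. A minimal NumPy check, assuming feat_mat is a dense array (negative_feature_report is a hypothetical helper and the 3x3 matrix is made-up toy data):

```python
import numpy as np


def negative_feature_report(X):
    """Return (columns containing any negative value, columns whose total is negative)."""
    X = np.asarray(X, dtype=np.float64)
    has_negative = np.flatnonzero((X < 0).any(axis=0))
    negative_sum = np.flatnonzero(X.sum(axis=0) < 0)
    return has_negative, negative_sum


# Hypothetical toy data standing in for feat_mat: only column 1 has negative entries.
X = np.array([[0.5, -4.0, 1.0],
              [1.5,  2.0, 0.0],
              [2.0, -1.0, 3.0]])
any_negative, negative_total = negative_feature_report(X)
print(any_negative, negative_total)  # prints [1] [1]
```

Running the helper on feat_mat itself should list index 13 among the negative-total columns if that is the cause. If shifting such a feature is acceptable for the analysis, sklearn.preprocessing.MinMaxScaler can rescale each column to [0, 1] before calling chi2; whether the resulting test is meaningful for continuous features is a separate question.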

Here are the printouts of chisq and p:

[  9.17099260e-01   3.77439701e+00   5.35004211e+01   2.17843312e+03
   4.27047184e+04   2.23204883e+01   6.49985540e-01   2.02132664e-01
   1.57324454e-03   2.16322638e-01   1.85592258e+00   5.70455805e+00
   1.34911126e-02  -1.71834753e+01   1.05112366e+00   3.07383691e-01
   5.55694752e-02   7.52801686e-01   9.74807972e-01   9.30619466e-02
   4.52669897e-02   1.08348058e-01   9.88146259e-03   2.26292358e-01
   5.08579194e-02   4.46232554e-02   1.22740419e-02   6.84545170e-02
   6.71339545e-03   1.33252061e-02   1.69296016e-02   3.81318236e-02
   4.74945604e-02   1.59313146e-01   9.73037448e-03   9.95771327e-03
   6.93777954e-02   3.87738690e-02   1.53693158e-01   9.24603716e-04
   1.22473138e-01   2.73347277e-01   1.69060817e-02   1.10868365e-02
   8.62029628e+00]

**********************

[  8.21299526e-01   2.86878266e-01   1.43400668e-11   0.00000000e+00
   0.00000000e+00   5.59436980e-05   8.84899894e-01   9.77244281e-01
   9.99983411e-01   9.74912223e-01   6.02841813e-01   1.26903019e-01
   9.99584918e-01   1.00000000e+00   7.88884155e-01   9.58633878e-01
   9.96573548e-01   8.60719653e-01   8.07347364e-01   9.92656816e-01
   9.97473024e-01   9.90817144e-01   9.99739526e-01   9.73237195e-01
   9.96995722e-01   9.97526259e-01   9.99639669e-01   9.95333185e-01
   9.99853998e-01   9.99592531e-01   9.99417113e-01   9.98042114e-01
   9.97286030e-01   9.83873717e-01   9.99745466e-01   9.99736512e-01
   9.95239765e-01   9.97992843e-01   9.84693908e-01   9.99992525e-01
   9.89010468e-01   9.64960636e-01   9.99418323e-01   9.99690553e-01
   3.47893682e-02]
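On the p-values: with 4 classes the statistic is compared against a chi-square distribution with 4 - 1 = 3 degrees of freedom, and that distribution has no probability mass below zero, so its survival function is 1 for any statistic at or below 0. That is why the negative entry at index 13 comes back with p = 1. The pure-Python sketch below uses the closed-form survival function for exactly 3 degrees of freedom (chi2_sf_df3 is a hypothetical name; scipy.stats.chi2.sf is the general tool) and reproduces a few of the printed values:

```python
import math


def chi2_sf_df3(x):
    """P(X >= x) for a chi-square variable with 3 degrees of freedom.

    Closed form valid only for df = 3; values at or below 0 return 1.0
    because the distribution has no probability mass there.
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Statistics copied from the chisq printout above (indices 0, 3 and 13).
for stat in (9.17099260e-01, 2.17843312e+03, -1.71834753e+01):
    print(stat, chi2_sf_df3(stat))
# The results match the printed p-values: about 0.8213, 0.0 and 1.0.
```

The match also confirms that the test uses n_classes - 1 degrees of freedom, so p = 1 is simply the survival function evaluated at a statistic that should never have been negative.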

© Stack Overflow or respective owner
