Discrete and Continuous Classifier on Sparse Data
Posted
by Chris S
on Stack Overflow
See other posts from Stack Overflow
or by Chris S
Published on 2010-03-23T14:00:41Z
Indexed on
2010/03/23
14:03 UTC
Read the original article
Hit count: 398
I'm trying to classify an example, which contains discrete and continuous features. Also, the example represents sparse data, so even though the system may have been trained on 100 features, the example may only have 12.
What would be the best classifier algorithm to use to accomplish this? I've been looking at Bayes, Maxent, Decision Tree, and KNN, but I'm not sure any fit the bill exactly. The biggest sticking point I've found is that most implementations don't support sparse data sets and both discrete and continuous features. Can anyone recommend an algorithm and implementation (preferably in Python) that fits these criteria?
Libraries I've looked at so far include:
- Orange (Mostly academic. Implementations not terribly efficient or practical.)
- NLTK (Also academic, although has a good Maxent implementation, but doesn't handle continuous features.)
- Weka (Still researching this. Seems to support a broad range of algorithms, but has poor documentation, so it's unclear what each implementation supports.)
© Stack Overflow or respective owner