Discrete and Continuous Classifier on Sparse Data

Posted by Chris S on Stack Overflow
Published on 2010-03-23T14:00:41Z Indexed on 2010/03/23 14:03 UTC

Filed under: discrete | continuous | classifier | python | sparse-data

I'm trying to classify examples that contain both discrete and continuous features. The data is also sparse: even though the system may have been trained on 100 features, a given example may only have 12 of them.
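To make the setup concrete, here is a minimal sketch of how one such sparse example could be represented. The feature names (`f0` to `f99`) and values are hypothetical, since the post does not list any; the only assumption is that discrete values are strings and continuous values are numbers.

```python
# Hypothetical feature names; the original post does not name its features.
TRAINED_FEATURES = [f"f{i}" for i in range(100)]

# A sparse example stores only the features that are actually present.
example = {
    "f3": "red",     # discrete (categorical) value
    "f17": 0.42,     # continuous value
    "f58": "large",  # discrete
    "f91": 12.5,     # continuous
}

def split_by_type(example):
    """Separate discrete (str) values from continuous (int/float) values."""
    discrete = {k: v for k, v in example.items() if isinstance(v, str)}
    continuous = {
        k: float(v)
        for k, v in example.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    return discrete, continuous

discrete, continuous = split_by_type(example)

# Features the model was trained on but this example does not carry.
missing = [name for name in TRAINED_FEATURES if name not in example]
```

A dict keyed by feature name keeps absent features implicit, so a classifier has to decide explicitly what "absent" means instead of silently reading it as zero.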

What would be the best classifier algorithm for this? I've been looking at Bayes, Maxent, Decision Tree, and KNN, but I'm not sure any of them fits the bill exactly. The biggest sticking point is that most implementations don't support both sparse data sets and a mix of discrete and continuous features. Can anyone recommend an algorithm and an implementation (preferably in Python) that meets these criteria?

Libraries I've looked at so far include:

Orange (Mostly academic. Implementations not terribly efficient or practical.)
NLTK (Also academic. It has a good Maxent implementation, but it doesn't handle continuous features.)
Weka (Still researching this. Seems to support a broad range of algorithms, but has poor documentation, so it's unclear what each implementation supports.)
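
As an illustration of the two requirements described above (mixed feature types and sparse examples), here is a rough sketch of a naive Bayes variant in plain Python. It is not taken from any of the libraries listed and is not a recommendation from the post. It assumes discrete values are strings, continuous values are numbers modeled with a per-class Gaussian, and absent features are simply skipped. The class name `MixedNaiveBayes`, the toy data, and the variance floor are all hypothetical choices.

```python
import math
from collections import defaultdict

class MixedNaiveBayes:
    """Naive Bayes over sparse dict examples (str = discrete, number = continuous)."""

    def fit(self, examples, labels):
        self.class_counts = defaultdict(int)
        # counts[label][feature][value] for discrete features
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # values[label][feature] -> observed numbers for continuous features
        self.values = defaultdict(lambda: defaultdict(list))
        self.vocab = defaultdict(set)  # feature -> discrete values seen in training
        for x, y in zip(examples, labels):
            self.class_counts[y] += 1
            for name, v in x.items():
                if isinstance(v, str):
                    self.counts[y][name][v] += 1
                    self.vocab[name].add(v)
                else:
                    self.values[y][name].append(float(v))
        return self

    def _usable(self, name, v):
        # Skip features never seen in training; for continuous features,
        # require at least one observation in every class (a simplification).
        if isinstance(v, str):
            return name in self.vocab
        return all(self.values[c][name] for c in self.class_counts)

    def _log_likelihood(self, label, name, v):
        if isinstance(v, str):
            seen = self.counts[label][name]
            total = sum(seen.values())
            # Laplace (add-one) smoothing over the values seen in training.
            return math.log((seen.get(v, 0) + 1) / (total + len(self.vocab[name])))
        obs = self.values[label][name]
        mean = sum(obs) / len(obs)
        # Population variance plus a small floor (arbitrary) to avoid division by zero.
        var = sum((o - mean) ** 2 for o in obs) / len(obs) + 1e-6
        return -0.5 * math.log(2 * math.pi * var) - (float(v) - mean) ** 2 / (2 * var)

    def predict(self, x):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for name, v in x.items():    # only features present in x contribute
                if self._usable(name, v):
                    score += self._log_likelihood(label, name, v)
            scores[label] = score
        return max(scores, key=scores.get)

# Toy training data with hypothetical features; each example carries a subset.
train_x = [
    {"color": "red", "weight": 1.0},
    {"color": "red", "weight": 1.2, "size": "small"},
    {"color": "blue", "weight": 5.0},
    {"weight": 5.5, "size": "large"},
]
train_y = ["a", "a", "b", "b"]
model = MixedNaiveBayes().fit(train_x, train_y)
```

Calling `model.predict({"weight": 1.1})` scores each class using only the `weight` feature, so a test example with 12 of 100 features would contribute just 12 likelihood terms. Whether skipping absent features is appropriate depends on whether absence is itself informative in the data, which the post does not say.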

© Stack Overflow or respective owner
