Machine learning algorithm for data classification.

Posted by twk on Stack Overflow
Published on 2010-06-03T15:49:33Z

Filed under: machine-learning, classification

Hi all,

I'm looking for some guidance on which techniques/algorithms I should research to solve the following problem. I currently have an algorithm that clusters similar-sounding MP3s using acoustic fingerprinting. Each cluster holds the metadata (song/artist/album) of every file in it. For each cluster, I'd like to pick the "best" song/artist/album metadata that matches an existing row in my database or, if there is no suitable match, decide to insert a new row.
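One hedged way to frame the "match an existing row or insert a new one" decision is a similarity threshold. The sketch below assumes each candidate and each database row is reduced to an (artist, title) pair and scores them with the standard library's `difflib.SequenceMatcher`. The `match_or_insert` helper and its 0.85 threshold are hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (artist, title): a hypothetical, simplified schema


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio(); 1.0 means identical after normalizing."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_or_insert(candidate: Row, rows: List[Row], threshold: float = 0.85) -> Optional[Row]:
    """Return the best-matching existing row, or None to signal "insert a new row".

    The 0.85 threshold is a hypothetical starting point, not a recommended value.
    """
    best_row: Optional[Row] = None
    best_score = 0.0
    for row in rows:
        # Average the per-field similarities; artist and title count equally here.
        score = (similarity(candidate[0], row[0]) + similarity(candidate[1], row[1])) / 2
        if score > best_score:
            best_row, best_score = row, score
    return best_row if best_score >= threshold else None


rows = [("The Beatles", "Let It Be"), ("Radiohead", "Creep")]
print(match_or_insert(("Radiohed", "Creep"), rows))         # ('Radiohead', 'Creep')
print(match_or_insert(("Unknown Band", "Some Song"), rows))  # None
```

A linear scan is only workable for small tables; against a large database you would first narrow the candidates (for example by an indexed normalized-artist lookup) and score only those.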

For a cluster, there is generally some correct metadata, but individual files have many types of problems:

- Artists/songs are completely misnamed, or just slightly misspelled.
- The artist, song, or album is missing, but the rest of the information is there.
- The song is actually a live recording, but only some of the files in the cluster are labeled as such.
- There may be very little metadata; in some cases just the file name, which might be "artist - song.mp3", "artist - album - song.mp3", or another variation.
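For the filename-only case, a rule-based parser is a reasonable first step before any learning, since its output can feed the same vote as the tag data. This sketch assumes fields are separated by " - " and that a two-part name is artist - song and a three-part name is artist - album - song, as listed above; any other layout is left unparsed. The `LIVE_PATTERN` regex is a hypothetical heuristic for spotting live recordings, not an exhaustive rule.

```python
import os
import re
from typing import Dict, Optional

# Hypothetical heuristic: "(Live", "[live" or "live at/in/from" marks a live recording.
LIVE_PATTERN = re.compile(r"[(\[]\s*live\b|\blive\s+(?:at|in|from)\b", re.IGNORECASE)


def parse_filename(path: str) -> Dict[str, Optional[str]]:
    """Split 'artist - song.mp3' or 'artist - album - song.mp3' into fields."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    parts = [p.strip() for p in stem.split(" - ") if p.strip()]
    fields: Dict[str, Optional[str]] = {"artist": None, "album": None, "title": None}
    if len(parts) == 2:
        fields["artist"], fields["title"] = parts
    elif len(parts) == 3:
        fields["artist"], fields["album"], fields["title"] = parts
    # Any other layout is ambiguous, so every field stays None and the file
    # simply contributes no filename-derived values.
    return fields


def is_live(text: str) -> bool:
    """True if the text looks like it labels a live recording."""
    return LIVE_PATTERN.search(text) is not None


print(parse_filename("Radiohead - Pablo Honey - Creep.mp3"))
# {'artist': 'Radiohead', 'album': 'Pablo Honey', 'title': 'Creep'}
print(is_live("Creep (Live at Glastonbury)"))  # True
```

The word boundaries keep titles such as "Alive" from being flagged; `is_live` can then be treated as a separate per-file field that is voted on, rather than trusting whichever file happens to carry the label.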

A simple voting algorithm works fairly well, but I'd like to have something I can train on a large set of data that might pick up more nuances than what I've got right now. Any links to papers or similar projects would be greatly appreciated.
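The simple voting baseline and one trainable extension of it can be sketched side by side. Here each candidate value is tagged with the (hypothetical) source it came from, such as an ID3 tag or the parsed filename. `learn_source_weights` estimates from labeled clusters how often each source is right, and `weighted_vote` uses those estimates in place of one vote per file. This is plain source-reliability weighting, a deliberately simple stand-in for the richer trained models the question asks about, not a method taken from any particular paper; all names and the toy data are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# (source, value): e.g. ("id3", "Creep") or ("filename", None) when missing.
# The source names are hypothetical labels for wherever each value came from.
Candidate = Tuple[str, Optional[str]]


def normalize(value: str) -> str:
    return " ".join(value.lower().split())


def majority_vote(candidates: List[Candidate]) -> Optional[str]:
    """Baseline: most common normalized value, ignoring missing ones."""
    counts = Counter(normalize(v) for _src, v in candidates if v)
    return counts.most_common(1)[0][0] if counts else None


def learn_source_weights(labeled: List[Tuple[List[Candidate], str]]) -> Dict[str, float]:
    """Estimate P(value is correct) per source from clusters with a known answer.

    Laplace smoothing ((hits + 1) / (total + 2)) keeps rarely seen sources near 0.5.
    """
    hits: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for candidates, truth in labeled:
        for src, v in candidates:
            if v:
                total[src] += 1
                if normalize(v) == normalize(truth):
                    hits[src] += 1
    return {src: (hits[src] + 1) / (total[src] + 2) for src in total}


def weighted_vote(candidates: List[Candidate], weights: Dict[str, float],
                  default_weight: float = 0.5) -> Optional[str]:
    """Like majority_vote, but each vote counts by its source's learned reliability."""
    scores: Dict[str, float] = defaultdict(float)
    for src, v in candidates:
        if v:
            scores[normalize(v)] += weights.get(src, default_weight)
    return max(scores, key=lambda value: scores[value]) if scores else None


training = [
    ([("id3", "Track 01"), ("filename", "Creep")], "Creep"),
    ([("id3", "Unknown"), ("filename", "Karma Police")], "Karma Police"),
    ([("id3", "Airbag"), ("filename", "Airbag")], "Airbag"),
]
weights = learn_source_weights(training)  # id3 -> 0.4, filename -> 0.8
cluster = [("id3", "Track 01")] * 3 + [("filename", "Lucky")] * 2
print(majority_vote(cluster))           # track 01
print(weighted_vote(cluster, weights))  # lucky
```

Using the smoothed accuracy itself as the weight is a heuristic. Under the textbook assumption of independent voters choosing between two options, the optimal weight is log(p / (1 - p)) instead, which lets a source that is usually wrong count against the value it reports.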

Thanks!

© Stack Overflow or respective owner
