Measuring the performance of a classification algorithm

Posted by Silver Dragon on Stack Overflow
Published on 2009-01-02T11:09:53Z Indexed on 2010/05/22 22:20 UTC

I have a classification problem on my hands, which I'd like to address with a machine learning algorithm (Bayesian or Markovian, probably; the question is independent of the classifier used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier while taking the problem of overfitting into account.

That is, given training samples N[1..100], if I run the training algorithm on every one of the samples and then use these very same samples to measure fitness, the measurement may fall victim to overfitting: the classifier will know the exact answers for the training instances without having much predictive power, rendering the fitness results useless.
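To make the concern concrete, here is a minimal Python sketch. It assumes a hypothetical worst-case classifier, `MemorizingClassifier`, which is nothing more than a lookup table over its training samples; the class, the `accuracy` helper, and the toy parity data are all illustrative and not taken from any library. Scored on the samples it was trained on, it looks perfect; scored on unseen samples, it does no better than its default guess.

```python
class MemorizingClassifier:
    """Hypothetical worst case: remembers training answers, generalizes nothing."""

    def __init__(self, default_label):
        self.default_label = default_label
        self.table = {}

    def fit(self, samples, labels):
        # "Training" is just storing every (sample, label) pair.
        for x, y in zip(samples, labels):
            self.table[x] = y

    def predict(self, x):
        # Known samples get their stored label; anything else gets the default.
        return self.table.get(x, self.default_label)


def accuracy(clf, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in zip(samples, labels) if clf.predict(x) == y)
    return correct / len(samples)


# Toy data (an assumption for illustration): the label is the parity of the sample.
train_samples = list(range(100))
train_labels = [x % 2 for x in train_samples]
unseen_samples = list(range(100, 200))
unseen_labels = [x % 2 for x in unseen_samples]

clf = MemorizingClassifier(default_label=0)
clf.fit(train_samples, train_labels)

train_accuracy = accuracy(clf, train_samples, train_labels)
unseen_accuracy = accuracy(clf, unseen_samples, unseen_labels)
print(train_accuracy, unseen_accuracy)
```

The gap between the two numbers is the point: the score on the training samples says nothing about how the classifier behaves on data it has not seen.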

An obvious solution would be separating the hand-tagged samples into training and test samples, and I'd like to learn about methods for selecting statistically significant samples for training.
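As a rough sketch of what such a separation can look like, the Python below shows two common schemes that the question itself does not name: a random hold-out split and k-fold cross-validation, where every sample is used for testing exactly once. All names here (`train_test_split`, `k_fold_indices`, `cross_validate`, `MajorityClassifier`) are hypothetical helpers written for this example, and the only interface assumed of a classifier is `fit(samples, labels)` and `predict(sample)`.

```python
import random
from collections import Counter


def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of the labelled samples for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is a test index exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, folds[i]


def cross_validate(make_classifier, samples, labels, k=5, seed=0):
    """Train a fresh classifier per fold and return its accuracy on the held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        clf = make_classifier()
        clf.fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx if clf.predict(samples[i]) == labels[i])
        scores.append(correct / len(test_idx))
    return scores


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, samples, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, x):
        return self.label


# Toy data (an assumption for illustration): 70 samples of class 0, 30 of class 1.
samples = list(range(100))
labels = [0] * 70 + [1] * 30

train_part, test_part = train_test_split(samples, labels)
scores = cross_validate(MajorityClassifier, samples, labels, k=5)
mean_score = sum(scores) / len(scores)
print(len(train_part[0]), len(test_part[0]), scores, mean_score)
```

Replacing `MajorityClassifier` with the real classifier gives one accuracy per fold, all measured on samples that were withheld from training. The spread of the per-fold scores also gives a rough sense of how much the estimate depends on which samples happened to land in the test set.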

White papers, book pointers, and PDFs are much appreciated!

© Stack Overflow or respective owner
