Java text classification problem

Posted by yox on Stack Overflow
Published on 2010-05-12T18:16:29Z Indexed on 2010/05/12 19:14 UTC

Hello,

I have a set of Book objects. The class Book is defined as follows:

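import java.util.ArrayList;

class Book {

    String title;               // title of the book
    ArrayList<String> taglist;  // tags describing the book (assumed here to be plain strings)

}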

Here title is the title of the book, for example: Javascript for dummies.

And taglist is a list of tags; for our example: Javascript, jquery, "web dev", ...

As I said, I have a set of books about different subjects: IT, BIOLOGY, HISTORY, ... Each book has a title and a set of tags describing it.

I have to automatically classify those books into separate sets by topic, for example:

IT BOOKS :

  • Java for dummies
  • Javascript for dummies
  • Learn Flash in 30 days
  • C++ programming

HISTORY BOOKS :

  • World wars
  • America in 1960
  • Martin Luther King's life

BIOLOGY BOOKS :

  • ....

Do you know of a classification algorithm or method that applies to this kind of problem?

One solution would be to use an external API to determine the category of the text, but the problem here is that the books are in different languages: French, Spanish, English, ...
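To make the goal more concrete, here is a rough sketch of one possible approach, not a definitive solution: assign each book to the topic whose hand-picked "seed" tags overlap most with the book's own tags, measured with Jaccard similarity (intersection size divided by union size). Everything in it is an assumption for illustration: the class name TagClassifier, the seed tag sets, the "UNKNOWN" fallback, and the idea that tags are plain strings. For books in several languages the seed sets would presumably need tags in each language, and a classifier trained on labelled examples (naive Bayes is a common choice for text) may work better than fixed seed lists.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: pick the topic whose seed tags best overlap a book's tags.
class TagClassifier {

    // Lower-case and trim tags so "Javascript" and "javascript " compare equal.
    static Set<String> normalize(Collection<String> tags) {
        Set<String> out = new HashSet<>();
        for (String t : tags) {
            out.add(t.trim().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Returns the best-scoring topic name, or "UNKNOWN" if no seed tag matches at all.
    static String classify(Collection<String> bookTags, Map<String, Set<String>> seedTagsByTopic) {
        Set<String> tags = normalize(bookTags);
        String bestTopic = "UNKNOWN";
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> entry : seedTagsByTopic.entrySet()) {
            double score = jaccard(tags, normalize(entry.getValue()));
            if (score > bestScore) {
                bestScore = score;
                bestTopic = entry.getKey();
            }
        }
        return bestTopic;
    }
}
```

With a seed map such as IT = {javascript, java, web dev} and HISTORY = {history, war}, calling TagClassifier.classify on the tags Javascript, jquery, "web dev" would select IT, since two of the four distinct tags are shared with the IT seeds and none with the HISTORY seeds.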

© Stack Overflow or respective owner
