Java text classification problem
- by yox
Hello,
I have a set of Book objects. The Book class is defined as follows:
import java.util.ArrayList;

class Book {
    String title;
    ArrayList<String> taglist; // assuming tags are plain strings
}
Here title is the title of the book, for example: Javascript for dummies.
taglist is the list of tags for the book; for this example: Javascript, jquery, "web dev", ...
As I said, I have a set of books on different subjects: IT, BIOLOGY, HISTORY, ...
Each book has a title and a set of tags describing it.
I have to automatically classify those books into separate sets by topic, for example:
IT BOOKS :
Java for dummies
Javascript for dummies
Learn flash in 30 days
C++ programming
HISTORY BOOKS :
World wars
America in 1960
Martin Luther King's life
BIOLOGY BOOKS :
....
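To make the goal concrete, here is a rough sketch of the output I am after. It is not a real solution: the tag-to-topic table is written by hand, and the names TopicSketch, TAG_TO_TOPIC, classify and groupByTopic, as well as the tags "war" and "20th century", are made up for this illustration. What I am looking for is a method that would build this kind of mapping automatically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TopicSketch {

    // Same shape as the Book class above, with tags assumed to be plain strings.
    static class Book {
        final String title;
        final List<String> taglist;

        Book(String title, String... tags) {
            this.title = title;
            this.taglist = new ArrayList<>(Arrays.asList(tags));
        }
    }

    // Hypothetical hand-written table: lower-cased tag -> topic.
    static final Map<String, String> TAG_TO_TOPIC = new HashMap<>();
    static {
        TAG_TO_TOPIC.put("javascript", "IT");
        TAG_TO_TOPIC.put("jquery", "IT");
        TAG_TO_TOPIC.put("web dev", "IT");
        TAG_TO_TOPIC.put("war", "HISTORY");           // made-up tag
        TAG_TO_TOPIC.put("20th century", "HISTORY");  // made-up tag
    }

    // Each known tag votes for its topic; the topic with the most votes wins.
    // Ties go to the alphabetically first topic; no known tag gives "UNKNOWN".
    static String classify(Book book) {
        Map<String, Integer> votes = new TreeMap<>();
        for (String tag : book.taglist) {
            String topic = TAG_TO_TOPIC.get(tag.toLowerCase(Locale.ROOT).trim());
            if (topic != null) {
                votes.merge(topic, 1, Integer::sum);
            }
        }
        String best = "UNKNOWN";
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }

    // Groups book titles by the topic returned from classify.
    static Map<String, List<String>> groupByTopic(List<Book> books) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Book b : books) {
            groups.computeIfAbsent(classify(b), k -> new ArrayList<>()).add(b.title);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
            new Book("Javascript for dummies", "Javascript", "jquery", "web dev"),
            new Book("World wars", "war", "20th century"));
        // prints {HISTORY=[World wars], IT=[Javascript for dummies]}
        System.out.println(groupByTopic(books));
    }
}
```

I match on tags rather than on title words only to keep the sketch short. A hand-written table like this obviously does not scale, and it would need entries for every language the tags are written in.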
Do you know of a classification algorithm or method that applies to this kind of problem?
One solution would be to use an external API to determine the category of the text, but the problem is that the books are in different languages: French, Spanish, English, ...