Java text classification problem
Posted by yox on Stack Overflow
Published on 2010-05-12T18:16:29Z
Indexed on 2010/05/12 19:14 UTC
Hello,
I have a set of Book objects. The class Book is defined as follows:
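import java.util.ArrayList;

class Book {
    String title;               // title of the book
    ArrayList<String> taglist;  // tags describing the book (plain strings here)
}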
Here title is the title of the book, for example: Javascript for dummies,
and taglist is the list of tags for that book, in our example: Javascript, jquery, "web dev", ...
As I said, I have a set of books covering different topics: IT, biology, history, ... Each book has a title and a set of tags describing it.
I have to automatically classify those books into separate sets by topic, for example:
IT BOOKS :
- Java for dummies
- Javascript for dummies
- Learn Flash in 30 days
- C++ programming
HISTORY BOOKS :
- World wars
- America in 1960
- Martin Luther King's life
BIOLOGY BOOKS :
- ....
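To make the expected result concrete, here is a minimal sketch of the grouping I am after. It is not a real classifier: the tag-to-topic table TAG_TOPICS is hand-written and hypothetical, as are the names TopicGroupingSketch and groupByTopic, and the tags given to the history book are made up for the example. Tags are assumed to be plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: a hand-written tag lookup, not a trained classifier.
class TopicGroupingSketch {

    // Same shape as the Book class above; tags are assumed to be strings.
    static class Book {
        final String title;
        final List<String> taglist;

        Book(String title, List<String> taglist) {
            this.title = title;
            this.taglist = taglist;
        }
    }

    // Hypothetical tag -> topic table (an assumption, not from any library).
    static final Map<String, String> TAG_TOPICS = Map.of(
            "javascript", "IT",
            "jquery", "IT",
            "web dev", "IT",
            "war", "HISTORY",
            "civil rights", "HISTORY");

    // Puts each title under the topic that most of the book's tags map to.
    // Books with no known tag go to "UNKNOWN"; on a tie the topic seen first wins.
    static Map<String, List<String>> groupByTopic(List<Book> books) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Book book : books) {
            // Count how many tags vote for each topic.
            Map<String, Integer> votes = new LinkedHashMap<>();
            for (String tag : book.taglist) {
                String topic = TAG_TOPICS.get(tag.toLowerCase(Locale.ROOT));
                if (topic != null) {
                    votes.merge(topic, 1, Integer::sum);
                }
            }
            // Pick the topic with the most votes.
            String best = "UNKNOWN";
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : votes.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            groups.computeIfAbsent(best, k -> new ArrayList<>()).add(book.title);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
                new Book("Javascript for dummies", List.of("Javascript", "jquery", "web dev")),
                // Tags for this book are invented for the example.
                new Book("World wars", List.of("war")));
        System.out.println(groupByTopic(books));
    }
}
```

A fixed table like this would need an entry for every tag in every language, so it only shows the input and output I want, not a method I could actually use.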
Do you know of a classification algorithm or method that applies to this kind of problem?
One solution would be to use an external API to determine the category of the text, but the problem is that the books are in different languages: French, Spanish, English, ...
© Stack Overflow or respective owner