Data mining on a MySQL database
- by sliptix
Hello,
I am just getting started with text mining.
I have two database tables with thousands of rows:
a table for "skills" and a table for "skill categories".
Every "skill" belongs to a skill category.
A "skill" is, physically, a varchar(200) field in the database containing free text that describes the skill.
Here are some skills extracted from the skills table:
"PHP (good level), Java (intermediaite), C++"
"PHP5"
"project management and quality management"
"begining Javascript"
"water engineering"
"dfsdf zerze rzer"
"cibling customers"
What I want to do is extract knowledge from those fields, i.e. keep only the actual skill names and ignore the rest of the text (level qualifiers, filler words, gibberish).
For the above example I want to get only an array with:
"PHP"
"Java"
"C++"
"PHP5"
"project management"
"quality management"
"Javascript"
"water engineering"
"cibling customers"
What should I do to extract the skills from this much data?
Do you know of specific algorithms for this, e.g. k-means?
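To make the goal concrete, here is a naive rule-based sketch in Python of the transformation described above. It is only an illustration under assumptions: the function name `extract_skills` and the `NOISE_WORDS` list are hypothetical, and the rules (drop text in parentheses, split on commas and the word "and", remove level words) are guesses based on the sample rows. It cannot recognise gibberish such as "dfsdf zerze rzer", which is exactly the part I do not know how to solve.

```python
import re

# Hypothetical list of "level" words to discard (assumption: a real list
# would be much longer and would include common misspellings).
NOISE_WORDS = {
    "good", "level", "intermediate", "intermediaite",
    "beginning", "begining", "beginner",
}


def extract_skills(raw):
    """Split one free-text skill field into candidate skill names."""
    # Drop parenthesised remarks such as "(good level)".
    text = re.sub(r"\([^)]*\)", " ", raw)
    skills = []
    # Split on commas and on the standalone word "and".
    for part in re.split(r",|\band\b", text, flags=re.IGNORECASE):
        # Remove level qualifiers, keep the remaining words in order.
        words = [w for w in part.split() if w.lower() not in NOISE_WORDS]
        if words:
            skills.append(" ".join(words))
    return skills


if __name__ == "__main__":
    rows = [
        "PHP (good level), Java (intermediaite), C++",
        "project management and quality management",
        "begining Javascript",
    ]
    for row in rows:
        print(extract_skills(row))
```

Rules like these handle the clean rows, but every row that is not filtered by a rule is kept as-is, so nonsense text still comes through as a "skill".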
Thanks in advance.