I am looking for a method to build a hierarchy of words.
Background: I am an amateur natural language processing enthusiast, and one of the problems I am currently interested in is determining the semantic hierarchy within a group of words.
For example, suppose I have a set in which one word is a "super" representation of the others, e.g.
[cat, dog, monkey, animal, bird, ... ]
I am interested in any technique that would let me extract the word 'animal', which is the most meaningful and accurate representation of the other words in this set.
Note: they are NOT the same in meaning. cat != dog != monkey != animal
BUT cat is a subset of animal and dog is a subset of animal.
I know that by now a lot of you will be telling me to use WordNet. I will try it, but I am actually working in a very domain-specific area where WordNet doesn't apply because:
1) Most of the words are not found in WordNet.
2) All the words are in another language; translation is possible, but only to limited effect.
Another example would be:
[ noise reduction, focal length, flash, functionality, ... ]
Here, 'functionality' covers everything else in this set.
I have also tried crawling Wikipedia pages and applying techniques such as tf-idf, but the Wikipedia pages don't really help much either.
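To make this concrete, here is a minimal sketch of the kind of tf-idf weighting I mean. The toy English sentences are stand-ins rather than my actual data, and the function name `tf_idf` is just for illustration. In this toy corpus, a term like 'animal' that appears in every document ends up with weight zero, which may be part of why tf-idf on its own does not surface the umbrella term.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Uses relative term frequency and an unsmoothed idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, one "page" per specific word.
docs = [
    "the cat is an animal".split(),
    "the dog is an animal".split(),
    "the monkey is an animal".split(),
]
scores = tf_idf(docs)

# Terms sorted by weight for the first document, highest first.
ranked = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])
```

The specific word ('cat') ranks highest for its own page, while the shared word ('animal') is treated like a stop word.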
Can someone point me toward a direction my research should take? (I am open to anything.)