Mapping words to numbers with respect to definition
Posted by thornate on Stack Overflow
Published on 2010-03-22T01:48:01Z
As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc.
Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc.
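For reference, the baseline table described above could be sketched roughly as follows. This is only an illustration under my own assumptions: `normalize`, `Vocabulary`, and `encode` are hypothetical names, and the IDs are handed out in order of first appearance instead of coming from a hash. Either way, the numbers carry no information about meaning, which is exactly the limitation in question.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: lower-cases a token and drops non-alphabetic
// characters, so "Every" and "every," map to the same word.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalpha(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Hypothetical word table: each new word gets the next free integer.
class Vocabulary {
public:
    // Returns the existing ID for a word, or assigns a new one.
    std::size_t id_for(const std::string& word) {
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        const std::size_t id = ids_.size();
        ids_.emplace(word, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
};

// Splits text on whitespace and converts each word to its ID.
std::vector<std::size_t> encode(const std::string& text, Vocabulary& vocab) {
    std::istringstream in(text);
    std::vector<std::size_t> result;
    std::string token;
    while (in >> token) {
        const std::string word = normalize(token);
        if (!word.empty()) result.push_back(vocab.id_for(word));
    }
    return result;
}
```

Calling `encode("Every good boy deserves fruit", vocab)` on an empty `Vocabulary` yields one ID per word. Nothing here places 'good' near 'great'; that would need an external source of word meanings.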
As another option, instead of a single integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification identifies one of a wide set of general topics, and specific_word is much the same as the number described in the previous paragraph.
Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.
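To make the vector idea concrete, here is a rough sketch of how such a record might look in C++. Everything in it is an assumption for illustration: the type names, the enum values, the `toy_lexicon` table, and the topic ID 12 are all invented. Only the specific_word values for 'good' and 'great' are taken from the example above. It does not answer where the values would come from; it only shows the shape of the data.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical encodings for the first two vector components.
enum class LexicalCategory : std::uint8_t { Unknown, Noun, Verb, Adjective, Adverb };
enum class Tense : std::uint8_t { None, Past, Present, Future };

// One record per word: (lexical_category, tense, classification, specific_word).
struct WordVector {
    LexicalCategory category;
    Tense tense;
    std::uint16_t classification;  // ID of a general topic (invented values)
    std::uint32_t specific_word;   // per-word number within that topic
};

using Lexicon = std::unordered_map<std::string, WordVector>;

// Hand-filled toy data; a real table would need a lexical resource behind it.
Lexicon toy_lexicon() {
    Lexicon lex;
    lex["good"]  = WordVector{LexicalCategory::Adjective, Tense::None, 12, 6827};
    lex["great"] = WordVector{LexicalCategory::Adjective, Tense::None, 12, 6835};
    return lex;
}

// Looks a word up, falling back to an all-unknown record.
WordVector lookup(const Lexicon& lex, const std::string& word) {
    auto it = lex.find(word);
    if (it != lex.end()) return it->second;
    return WordVector{LexicalCategory::Unknown, Tense::None, 0, 0};
}

// Two words are comparable only if category and topic agree; the
// specific_word difference is then a crude closeness measure.
bool same_topic(const WordVector& a, const WordVector& b) {
    return a.category == b.category && a.classification == b.classification;
}

std::uint32_t word_distance(const WordVector& a, const WordVector& b) {
    return a.specific_word > b.specific_word ? a.specific_word - b.specific_word
                                             : b.specific_word - a.specific_word;
}
```

With this toy table, `lookup(lex, "good")` and `lookup(lex, "great")` share a category and topic, and `word_distance` on them is small. Comparing only within a matching category and topic is one possible design choice, since a raw difference between numbers from unrelated topics would mean little.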