How to sift idioms and set phrases apart from other common phrases using NLP techniques?
Posted by hippietrail on Stack Overflow
Published on 2010-12-28T12:46:36Z
Indexed on 2010/12/29 13:54 UTC
What techniques exist that can tell the difference between plain common phrases such as "to the" and "and the", and set phrases and idioms that have their own lexical meanings, such as "pick up", "fall in love", "red herring", and "dead end"?
Are there techniques that succeed even without a dictionary, for instance statistical methods such as HMMs trained on large corpora?
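To make the "no dictionary" idea concrete, here is a minimal sketch of one statistical measure I am aware of, pointwise mutual information (PMI) over adjacent word pairs. It is not an HMM, and the function name `pmi_scores`, the `min_count` cutoff, and the toy sentence are all my own assumptions for illustration. PMI is known to overrate rare pairs, which is why the sketch drops pairs seen fewer than `min_count` times.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (in bits).

    Hypothetical helper for illustration; min_count is an arbitrary cutoff
    that filters out pairs too rare to estimate reliably.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with what
        # independence of the two words would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus (invented); a real experiment would need a large corpus.
text = ("the red herring fooled the detective and the red herring fooled "
        "the reader and the dog saw the cat and the cat saw the bird")
tokens = text.split()
scores = pmi_scores(tokens)

# Print the surviving pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy text "red herring" scores higher than "and the", because "and" and "the" are individually frequent enough that their co-occurrence is less surprising. Whether that separation holds on real data is exactly what I am unsure about.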
Or are there heuristics, such as ignoring or down-weighting "promiscuous" words that can co-occur with just about any word, as opposed to words that occur either alone or in a specific, limited set of idiomatic phrases?
If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words such as "up" in "beat up", "eat up", "sit up", "think up"?
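One way I can imagine quantifying "promiscuity" is the entropy of the words that appear next to a given word: a word seen beside many different neighbours gets a high value, and a word seen beside only a few gets a low one. The sketch below is an assumption on my part, not an established method; `neighbor_entropy` and the toy sentence are invented for illustration. Used alone, such a score would also penalise "up" in "beat up", so it would presumably have to be combined with a pair-level association measure rather than used as a hard filter.

```python
import math
from collections import Counter, defaultdict

def neighbor_entropy(tokens):
    """Entropy (bits) of the distribution of words adjacent to each word.

    Hypothetical helper: high entropy suggests a "promiscuous" word that
    combines freely; low entropy suggests a word with few partners.
    """
    neighbors = defaultdict(Counter)
    for left, right in zip(tokens, tokens[1:]):
        # Count both directions so left and right contexts are pooled.
        neighbors[left][right] += 1
        neighbors[right][left] += 1
    entropy = {}
    for word, counts in neighbors.items():
        total = sum(counts.values())
        entropy[word] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy

# Toy corpus (invented); real counts would come from a large corpus.
sample = ("the red herring fooled the detective and the red herring fooled "
          "the reader and the dog saw the cat and the cat saw the bird")
entropy = neighbor_entropy(sample.split())

# List words from most to least promiscuous under this measure.
for word, h in sorted(entropy.items(), key=lambda kv: -kv[1]):
    print(word, round(h, 2))
```

Here "the" comes out far more promiscuous than "herring", which matches the intuition, but the measure says nothing by itself about which specific pairs are idiomatic.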
UPDATE
I've found an interesting paper online: Unsupervised Type and Token Identification of Idiomatic Expressions
© Stack Overflow or respective owner