How to sift idioms and set phrases apart from other common phrases using NLP techniques?

Posted by hippietrail on Stack Overflow
Published on 2010-12-28T12:46:36Z Indexed on 2010/12/29 13:54 UTC

What techniques exist that can tell the difference between plain common phrases such as "to the" and "and the", and set phrases and idioms that have their own lexical meanings, such as "pick up", "fall in love", "red herring", and "dead end"?

Are there techniques that succeed even without a dictionary, for instance statistical methods such as HMMs trained on large corpora?

Or are there heuristics, such as ignoring or down-weighting "promiscuous" words that can co-occur with just about any word, as opposed to words that occur either alone or only in a specific, limited set of idiomatic phrases?

If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words such as "up" in "beat up", "eat up", "sit up", "think up"?
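
To make the "promiscuous word" idea concrete, here is a minimal sketch of one standard dictionary-free measure, pointwise mutual information (PMI). It is not taken from the paper mentioned below, and the function name pmi_scores, the min_count cutoff, and the toy sentence are all assumptions made up for illustration. PMI compares how often two words occur side by side with how often they would if they were independent, so a word like "the" that pairs with almost anything is penalised by its own high frequency.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only. Pairs seen fewer than
    min_count times are skipped, because PMI is unreliable for rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # A high score means the pair co-occurs far more often than chance.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (made up); a real experiment would need a large corpus.
tokens = (
    "the red herring led to the dead end and the dog ran to the park "
    "and the red herring led to the dead end"
).split()

# Print pairs from most to least strongly associated.
for pair, score in sorted(pmi_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy text, "red herring" and "dead end" score higher than "to the" and "and the", because "the" appears next to many different words. This does not by itself settle the question about particles such as "up": a pair like "beat up" would only stand out if "beat" is followed by "up" more often than the overall frequency of "up" predicts.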

UPDATE

I've found an interesting paper online: Unsupervised Type and Token Identification of Idiomatic Expressions

© Stack Overflow or respective owner
