How to sift idioms and set phrases apart from other common phrases using NLP techniques?

Posted by hippietrail on Stack Overflow
Published on 2010-12-28T12:46:36Z Indexed on 2010/12/29 13:54 UTC

Filed under: nlp | phrase | hmm
What techniques exist that can tell the difference between plain common phrases such as "to the" and "and the", and set phrases and idioms that have their own lexical meanings, such as "pick up", "fall in love", "red herring", and "dead end"?

Are there techniques that are successful even without a dictionary, for instance statistical methods such as HMMs trained on large corpora?

Or are there heuristics, such as ignoring or down-weighting "promiscuous" words that can co-occur with just about any word, as opposed to words that occur either alone or in a specific, limited set of idiomatic phrases?

If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words such as "up" in "beat up", "eat up", "sit up", "think up"?
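To make the "promiscuous word" idea concrete, here is a minimal sketch using pointwise mutual information (PMI), a standard association measure. It is not taken from any particular answer or paper. PMI compares how often two words appear side by side with how often they would if they were independent, so a pair built from words that combine with almost anything ("to the") tends to score lower than a pair whose words mostly occur together ("red herring"). The function name `pmi_scores` and the `min_count` cutoff are illustrative assumptions, and PMI on its own measures collocation strength, not idiomaticity.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    tokens    -- list of word strings in corpus order
    min_count -- hypothetical cutoff: pairs seen fewer times are skipped,
                 because PMI is unreliable for rare events
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores
```

On a real corpus you would tokenize the text, call `pmi_scores(tokens)`, and sort the result by score. Because the score is relative to both words' frequencies, a verb-particle pair such as "beat up" can still rank well despite the promiscuous "up", as long as the verb appears with "up" more often than chance predicts. Telling true idioms apart from merely frequent collocations would need further evidence, for example how rigidly the phrase resists variation.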

UPDATE

I've found an interesting paper online: Unsupervised Type and Token Identification of Idiomatic Expressions

© Stack Overflow or respective owner
