Extracting noun+noun or (adj|noun)+noun from Text

Posted by ssuhan on Stack Overflow
Published on 2011-01-05T03:34:15Z Indexed on 2011/01/05 6:54 UTC

Is it possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP? That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you point me in the right direction? Many thanks.

Thanks for the responses. Here is the code:

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

acqTag <- tagPOS(acq)

acqTagSplit <- strsplit(acqTag, " ")

acqTagSplit

# Each token has the form word/TAG; keep the text after the last "/"
tag <- sub("^.*/", "", acqTagSplit[[1]])

# Positions i where tokens i and i+1 form noun+noun or (adj|noun)+noun
n <- length(tag)
index <- which(tag[-n] %in% c("JJ", "NN", "NNS") & tag[-1] %in% c("NN", "NNS"))

index


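Readers can use index to look up the matching positions in acqTagSplit and extract the noun+noun or (adj|noun)+noun pairs: each entry i marks a pair made of tokens i and i+1. (The code is not optimal, but it works. If you have any ideas for improving it, please let me know.)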
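
For illustration, here is a minimal sketch of that last step: turning the positions in index back into two-word candidates. The tagged string is hard-coded and hypothetical (it was not produced by running openNLP here), and the names tagged, tokens, words and candidates are made up for this example.

```r
# Hypothetical word/TAG string, hard-coded so the sketch runs without openNLP
tagged <- paste("The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ",
                "to/TO certain/JJ post/NN closing/NN adjustments/NNS")

tokens <- strsplit(tagged, " ")[[1]]
words  <- sub("/[^/]*$", "", tokens)   # text before the last "/"
tag    <- sub("^.*/", "", tokens)      # text after the last "/"

# Positions i where tokens i and i+1 form noun+noun or (adj|noun)+noun
n <- length(tag)
index <- which(tag[-n] %in% c("JJ", "NN", "NNS") & tag[-1] %in% c("NN", "NNS"))

# Glue each matching token to its right-hand neighbour
candidates <- paste(words[index], words[index + 1])
candidates
```

Note that overlapping pairs are all reported separately, so a longer run such as "certain post closing adjustments" shows up as three two-word candidates rather than one phrase.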

Furthermore, I still have a problem.

Justeson and Katz (1995) proposed another linguistic filter for extracting candidate noun phrases:

((Adj|Noun)+|((Adj|Noun)*(Noun-Prep)?)(Adj|Noun)*)Noun

I do not fully understand what it means. Could someone explain it, or translate it into R?
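
One possible reading, sketched under assumptions: I take the filter to be the form usually quoted from Justeson and Katz, ((A|N)+|((A|N)*(NP)?)(A|N)*)N, where + means one or more repetitions, * means zero or more, ? means optional, and NP is a noun followed by a preposition. In words: a candidate ends in a noun and is preceded either by a run of adjectives and nouns, or by such a run containing one noun-preposition pair (as in "degrees of freedom"). The sketch below maps each tag to one letter so that the filter becomes an ordinary regular expression. The tagged string is hypothetical, the mapping of JJ*/NN*/IN tags to A/N/P is my assumption, and single nouns are dropped on the assumption that only candidates of two or more words are wanted.

```r
# Hypothetical word/TAG string, hard-coded so the sketch runs without openNLP
tagged <- paste("degrees/NNS of/IN freedom/NN are/VBP a/DT",
                "central/JJ limit/NN theorem/NN")

tokens <- strsplit(tagged, " ")[[1]]
words  <- sub("/[^/]*$", "", tokens)   # text before the last "/"
tags   <- sub("^.*/", "", tokens)      # text after the last "/"

# Collapse each tag to one letter, so token i is character i of tagString
code <- rep("x", length(tags))        # x = anything else
code[grepl("^JJ", tags)] <- "A"       # adjectives (assumed mapping)
code[grepl("^NN", tags)] <- "N"       # nouns (assumed mapping)
code[tags == "IN"]       <- "P"       # prepositions (assumed mapping)
tagString <- paste(code, collapse = "")

# The filter written as a regular expression over the one-letter codes
jk <- "(?:[AN]+|[AN]*(?:NP)?[AN]*)N"
m  <- gregexpr(jk, tagString, perl = TRUE)[[1]]

starts <- as.integer(m)
len    <- attr(m, "match.length")
keep   <- which(starts > 0 & len >= 2)   # drop "no match" and single nouns

# Map each matched character span back to the corresponding words
phrases <- vapply(keep, function(k) {
  paste(words[starts[k]:(starts[k] + len[k] - 1)], collapse = " ")
}, character(1))
phrases
```

Because the regular expression is greedy, each run is reported once as its longest match, rather than as every overlapping pair.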
