Extracting noun+noun or (adj|noun)+noun from Text
Posted by ssuhan on Stack Overflow
Published on 2011-01-05T03:34:15Z
Indexed on 2011/01/05 6:54 UTC
Tags: r
I would like to know whether it is possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP. That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you point me in the right direction? Many thanks.
Thanks for the responses. Here is my code:
```r
library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

# tagPOS() returns a single string of "word/TAG" tokens separated by spaces
acqTag <- tagPOS(acq)
acqTagSplit <- strsplit(acqTag, " ")
acqTagSplit

# Keep only the POS tag (the part after "/") of each token
tag <- character(length(acqTagSplit[[1]]))
for (i in seq_along(acqTagSplit[[1]])) {
  tag[i] <- strsplit(acqTagSplit[[1]][i], "/")[[1]][2]
}

# Record position i whenever (JJ|NN|NNS) is followed by (NN|NNS)
index <- integer(0)
for (i in seq_len(length(tag) - 1)) {
  if (tag[i] %in% c("NN", "NNS", "JJ") && tag[i + 1] %in% c("NN", "NNS")) {
    index <- c(index, i)
  }
}
index
```
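Readers can use the positions in index to look up the matching pairs in acqTagSplit: each entry i marks tokens i and i+1 of acqTagSplit[[1]] as a noun+noun or (adj|noun)+noun sequence. (The code is not optimal, but it works. If you have any ideas for improving it, please let me know.)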
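A vectorised sketch of the same bigram filter, which also returns the word pairs rather than only their positions. It assumes the tokens are already split into the word/TAG form that tagPOS() produced; the hand-tagged example tokens and the helper name extract_pairs are hypothetical, and no openNLP call is made so the sketch runs on its own.

```r
# Hypothetical hand-tagged tokens in the word/TAG format shown above
tokens <- c("the/DT", "sale/NN", "price/NN", "of/IN",
            "small/JJ", "oil/NN", "companies/NNS", "rose/VBD")

# Hypothetical helper: find (JJ|NN|NNS) followed by (NN|NNS) without a loop
extract_pairs <- function(tokens,
                          first_tags = c("NN", "NNS", "JJ"),
                          second_tags = c("NN", "NNS")) {
  words <- sub("/[^/]*$", "", tokens)  # text before the last "/"
  tags  <- sub("^.*/", "", tokens)     # text after the last "/"
  n <- length(tags)
  if (n < 2) return(list(index = integer(0), phrases = character(0)))
  # Compare every tag with its right-hand neighbour in one vectorised step
  index <- which(tags[-n] %in% first_tags & tags[-1] %in% second_tags)
  list(index = index, phrases = paste(words[index], words[index + 1]))
}

res <- extract_pairs(tokens)
res$index    # 2 5 6
res$phrases  # "sale price" "small oil" "oil companies"
```

Comparing tags[-n] with tags[-1] lines up each tag with its successor, so one %in% test per vector replaces the chain of six explicit comparisons.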
I also have a further question.
Justeson and Katz (1995) proposed another linguistic filter to extract candidate noun phrases:
((Adj|Noun)+|((Adj|Noun)*(Noun-Prep)?)(Adj|Noun)*)Noun
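Written with one letter per tag, the filter is an ordinary regular expression over tag sequences. The sketch below assumes the encoding A = adjective, N = noun, P = preposition (so "Noun-Prep" becomes NP) and lists every two- and three-tag sequence the pattern accepts; the helper all_sequences is hypothetical.

```r
# Justeson & Katz filter, one letter per tag (assumed encoding):
#   A = adjective, N = noun, P = preposition
# Anchored with ^...$ so the whole tag sequence must match
jk_full <- "^([AN]+|([AN]*(NP)?)[AN]*)N$"

# Hypothetical helper: every sequence of `len` letters drawn from A, N, P
all_sequences <- function(len) {
  grid <- expand.grid(rep(list(c("A", "N", "P")), len),
                      stringsAsFactors = FALSE)
  do.call(paste0, grid)
}

candidates <- unlist(lapply(2:3, all_sequences))
accepted <- candidates[grepl(jk_full, candidates, perl = TRUE)]
print(accepted)
```

For these lengths the accepted sequences are exactly AN, NN, AAN, ANN, NAN, NNN and NPN: a run of adjectives and nouns that ends in a noun, optionally containing one noun followed by a preposition, as in "degrees of freedom" (N P N). Because every starred part may be empty, a single N also matches.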
I do not fully understand its meaning. Could someone explain it, or show how to express this pattern in R?
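One way to apply the pattern in R is to map each Penn Treebank tag to a single letter, join the letters into a string, and run the regular expression over that string, so each character position is one token. This is a sketch under stated assumptions: the tag mapping (JJ* to A, NN* to N, IN to P, everything else to O), the hand-tagged example tokens, the minimum length of two tokens and the helper name jk_candidates are all hypothetical choices, not part of openNLP.

```r
# Hypothetical hand-tagged tokens (Penn Treebank tags in word/TAG form);
# in practice these would come from a POS tagger such as openNLP
tokens <- c("The/DT", "central/JJ", "limit/NN", "theorem/NN", "applies/VBZ",
            "to/TO", "degrees/NNS", "of/IN", "freedom/NN", "./.")

# Hypothetical helper: candidate terms matched by the Justeson & Katz filter
jk_candidates <- function(tokens, min_len = 2) {
  words <- sub("/[^/]*$", "", tokens)  # text before the last "/"
  tags  <- sub("^.*/", "", tokens)     # text after the last "/"

  # Assumed mapping: JJ* -> A, NN* -> N, IN -> P, anything else -> O
  tag_codes <- ifelse(grepl("^JJ", tags), "A",
               ifelse(grepl("^NN", tags), "N",
               ifelse(tags == "IN", "P", "O")))
  code <- paste(tag_codes, collapse = "")  # one character per token

  # Same set of strings as ((A|N)+|((A|N)*(NP)?)(A|N)*)N; the longer
  # alternative comes first because a Perl regex stops at the first
  # alternative that succeeds
  jk_regex <- "(([AN]*(NP)?)[AN]*|[AN]+)N"
  m <- gregexpr(jk_regex, code, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))  # no match at all

  starts <- as.integer(m)
  lens <- attr(m, "match.length")
  keep <- which(lens >= min_len)  # drop single nouns unless min_len = 1
  vapply(keep, function(k) {
    paste(words[starts[k]:(starts[k] + lens[k] - 1)], collapse = " ")
  }, character(1))
}

jk_candidates(tokens)
# "central limit theorem" "degrees of freedom"
```

Mapping tags to single characters keeps regex positions aligned with token positions, so the start and length of each match translate directly into a slice of words. With [AN]+ listed first, Perl matching would stop at "ANN" inside "ANNPN" instead of returning the whole noun-preposition-noun phrase.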
© Stack Overflow or respective owner