Extracting noun+noun or (adj|noun)+noun from Text
- by ssuhan
I would like to ask whether it is possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP. That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you point me to how to do this?
Many thanks.
Thanks for the responses.
Here is the code:
library("openNLP")  # tagPOS() comes from older openNLP releases
acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."
# tagPOS() returns one string of space-separated "word/TAG" tokens
acqTag <- tagPOS(acq)
acqTagSplit <- strsplit(acqTag, " ")
acqTagSplit
# POS tag of each token: the text after its last "/"
tag <- sub("^.*/", "", acqTagSplit[[1]])
# Position i starts a match when tag i is JJ, NN or NNS
# and tag i+1 is NN or NNS
nouns <- c("NN", "NNS")
n <- length(tag)
index <- which(tag[-n] %in% c("JJ", nouns) & tag[-1] %in% nouns)
index
Each entry of index is the position of the first word of a match, so acqTagSplit[[1]][index] and acqTagSplit[[1]][index + 1] together give the noun+noun or (adj|noun)+noun pairs.
(The code is not optimal, but it works. If you have any ideas for improving it, please let me know.)
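The same adjacent-pair test can be sketched without openNLP by working directly on a string that is already in word/TAG form, which is the format the code above splits. This is a minimal sketch: adjNounPairs() is a hypothetical helper name, and the tags in the example string were assigned by hand for illustration rather than produced by tagPOS().

```r
# Hypothetical helper: every (JJ|NN|NNS) + (NN|NNS) word pair in a string
# of space-separated "word/TAG" tokens
adjNounPairs <- function(tagged, nouns = c("NN", "NNS")) {
  tokens <- strsplit(tagged, " ")[[1]]
  words  <- sub("/[^/]*$", "", tokens)  # text before the last "/"
  tags   <- sub("^.*/", "", tokens)     # text after the last "/"
  n <- length(tags)
  if (n < 2) return(character(0))
  # Position i starts a pair when tag i is JJ/NN/NNS and tag i+1 is NN/NNS
  index <- which(tags[-n] %in% c("JJ", nouns) & tags[-1] %in% nouns)
  paste(words[index], words[index + 1])
}

# Hand-tagged example (tags assigned for illustration only)
tagged <- "certain/JJ post/NN closing/NN adjustments/NNS ./."
adjNounPairs(tagged)
# "certain post" "post closing" "closing adjustments"
```

Taking the text after the last "/" as the tag (rather than the second field of strsplit) keeps the split correct even if a word itself contains a slash.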
I also have a further question.
Justeson and Katz (1995) proposed another linguistic filter for extracting candidate noun phrases:
((Adj|Noun)+|((Adj|Noun)*(Noun-Prep)?)(Adj|Noun)*)Noun
I do not fully understand what it means. Could someone explain it, or show how to express it in R?
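In Justeson and Katz's notation A is an adjective, N a noun and P a preposition, so Noun-Prep is a noun followed by a preposition (as in "bank of"). In the form in which the filter is commonly quoted, ((A|N)+|((A|N)*(N P)?)(A|N)*)N, a candidate term must end in a noun, and whatever comes before that final noun is a (possibly empty) run of adjectives and nouns containing at most one noun+preposition pair; examples are A N, N N, A A N, A N N and N P N. The second branch already covers the first, so the filter accepts the same tag sequences as (A|N)*(N P)?(A|N)*N. A minimal sketch of one way to apply it in R is to map each token's tag to a single letter and run the pattern as a regular expression over the resulting string. The tag-to-letter mapping (JJ* to A, NN* to N, IN to P), the helper name jkCandidates(), the two-word minimum (the pattern by itself also matches a lone noun) and the hand-assigned tags in the example are all assumptions for illustration, not part of openNLP.

```r
# Assumed mapping from Penn Treebank tags to one letter per token:
# A = adjective (JJ, JJR, JJS), N = noun (NN, NNS, NNP, NNPS),
# P = preposition (IN), O = anything else
tagToLetter <- function(tags) {
  ifelse(grepl("^JJ", tags), "A",
         ifelse(grepl("^NN", tags), "N",
                ifelse(tags == "IN", "P", "O")))
}

# ((A|N)+|((A|N)*(NP)?)(A|N)*)N in its equivalent shorter form
jkPattern <- "[AN]*(NP)?[AN]*N"

# Hypothetical helper: candidate terms of at least minWords words
# from a string of space-separated "word/TAG" tokens
jkCandidates <- function(tagged, minWords = 2) {
  tokens <- strsplit(tagged, " ")[[1]]
  words  <- sub("/[^/]*$", "", tokens)                  # text before the last "/"
  code   <- paste(tagToLetter(sub("^.*/", "", tokens)),  # one letter per token
                  collapse = "")
  m <- gregexpr(jkPattern, code, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))                  # no match at all
  starts <- as.integer(m)
  lens   <- as.integer(attr(m, "match.length"))
  keep   <- which(lens >= minWords)
  # Each letter is one token, so character positions are token positions
  vapply(keep, function(i) paste(words[starts[i]:(starts[i] + lens[i] - 1)],
                                 collapse = " "),
         character(1))
}

# Which short tag sequences does the filter accept as a whole?
grepl(paste0("^(", jkPattern, ")$"), c("AN", "NN", "AAN", "NPN", "NP", "A"),
      perl = TRUE)
# TRUE TRUE TRUE TRUE FALSE FALSE

# Hand-tagged example (tags assigned for illustration only)
tagged <- "the/DT central/JJ bank/NN of/IN England/NNP raised/VBD interest/NN rates/NNS ./."
jkCandidates(tagged)
# "central bank of England" "interest rates"
```

With perl = TRUE the greedy quantifiers are tried first, so in this example "central bank of England" comes back as one candidate rather than as "central bank" followed by "England". Note that gregexpr() returns non-overlapping matches, so shorter sub-terms such as "bank of England" are not listed separately; collecting every matching substring would need an extra step.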