opennlp - Developer IT

Why the HelloWorld of opennlp library works fine on Java but doesn't work with Jruby?

- by 0x90

I am getting this error: SyntaxError: hello.rb:13: syntax error, unexpected tIDENTIFIER public HelloWorld( InputStream data ) throws IOException { The HelloWorld.rb is: require "java" import java.io.FileInputStream; import java.io.InputStream; import java.io.IOException; import opennlp.tools.postag.POSModel; import opennlp.tools.postag.POSTaggerME; public class HelloWorld { private POSModel model; public HelloWorld( InputStream data ) throws IOException { setModel( new POSModel( data ) ); } public void run( String sentence ) { POSTaggerME tagger = new POSTaggerME( getModel() ); String[] words = sentence.split( "\\s+" ); String[] tags = tagger.tag( words ); double[] probs = tagger.probs(); for( int i = 0; i < tags.length; i++ ) { System.out.println( words[i] + " => " + tags[i] + " @ " + probs[i] ); } } private void setModel( POSModel model ) { this.model = model; } private POSModel getModel() { return this.model; } public static void main( String args[] ) throws IOException { if( args.length < 2 ) { System.out.println( "HelloWord <file> \"sentence to tag\"" ); return; } InputStream is = new FileInputStream( args[0] ); HelloWorld hw = new HelloWorld( is ); is.close(); hw.run( args[1] ); } } when running ruby HelloWorld.rb "I am trying to make it work" when I run the HelloWorld.java "I am trying to make it work" it works perfectly, of course the .java doesn't contain the require java statement. EDIT: I followed the following steps. The output for jruby -v : jruby 1.6.7.2 (ruby-1.8.7-p357) (2012-05-01 26e08ba) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_35) [darwin-x86_64-java]

Read the article

Extracting ""((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun"" from Text (Justeson & Katz, 1995)

- by ssuhan

I would like to query if it is possible to extract ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun proposed by Justeson and Katz (1995) in R package openNLP? That is, I would like to use this linguistic filtering to extract candidate noun phrases. I cannot well understand its meaning. Could you do me a favor to explain it or transform such representation into R language. Many thanks. Maybe we can start the sample code from: library("openNLP") acq <- "This paper describes a novel optical thread plug gauge (OTPG) for internal thread inspection using machine vision. The OTPG is composed of a rigid industrial endoscope, a charge-coupled device camera, and a two degree-of-freedom motion control unit. A sequence of partial wall images of an internal thread are retrieved and reconstructed into a 2D unwrapped image. Then, a digital image processing and classification procedure is used to normalize, segment, and determine the quality of the internal thread." acqTag <- tagPOS(acq) acqTagSplit = strsplit(acqTag," ")

Read the article

Is sticking to one language a good practice?

- by Ans

I'm developing a pipeline for processing text that will go into production. The question I keep asking myself is: should I stick to one language when looking for a tool to do a particular task (e.g. NLTK, PDFMiner, CLD, CRFsuite, etc.)? Or is it OK to mix and match looking for the best tool regardless of what language it's written in (e.g. OpenNLP, ParsCit, poppler, CFR++, etc.) and warp my code around them?

Read the article

Extracting noun+noun or (adj|noun)+noun from Text

- by ssuhan

I would like to query if it is possible to extract noun+noun or (adj|noun)+noun in R package openNLP?That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you direct me how to do? Many thanks. Thanks for the responses. here is the code: library("openNLP") acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter." acqTag <- tagPOS(acq) acqTagSplit = strsplit(acqTag," ") acqTagSplit qq = 0 tag = 0 for (i in 1:length(acqTagSplit[[1]])){ qq[i] <-strsplit(acqTagSplit[[1]][i],'/') tag[i] = qq[i][[1]][2] } index = 0 k = 0 for (i in 1:(length(acqTagSplit[[1]])-1)) { if ((tag[i] == "NN" && tag[i+1] == "NN") | (tag[i] == "NNS" && tag[i+1] == "NNS") | (tag[i] == "NNS" && tag[i+1] == "NN") | (tag[i] == "NN" && tag[i+1] == "NNS") | (tag[i] == "JJ" && tag[i+1] == "NN") | (tag[i] == "JJ" && tag[i+1] == "NNS")){ k = k +1 index[k] = i } } index Reader can refer index on acqTagSplit to do noun+noun or (adj|noun)+noun extractation. (The code is not optimum but work. If you have any idea, please let me know.) Furthermore, I still have a problem. Justeson and Katz (1995) proposed another linguistic filtering to extract candidate noun phrases: ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun I cannot well understand its meaning, could someone do me a favor to explain it or transform such representation into R language

Read the article

Is sticking to one language on a particular project a good practice?

- by Ans

I'm developing a pipeline for processing text that will go into production. The question I keep asking myself is: should I stick to one language for the project when I'm looking for a tool to do a particular task (e.g. NLTK, PDFMiner, CLD, CRFsuite, etc.)? Or is it OK to mix and match languages on the project? So I pick the best tool regardless of what language it's written in (e.g. OpenNLP, ParsCit, poppler, CFR++, etc.) and warp (wrap) my code around it? Note, I am not asking about should a developer stick to just one language for their career.

Read the article

Amazon Product API ResponseGroups and Default results

- by aboxy

A. In our application, most of the data we work with is stored as free text .i.e. there is no categorization done as of now. We are using openNLP libraries to make sense of the data(extract keywords/classify) and do a query to Amazon web services to pull the results of the query. We use searchindex=All and keywords=. Results are not always returned and we basically get 'AWS.ECommerceService.NoExactMatches' How to avoid that? 1) Is there a way to specify default results if no match found? e.g. Amazon carousel widget does that if the search query did not return results, it basically show some computer items. 2) Should I batch the request always and add another search criteria to every request? If my first criteria does not pull any results, we can be sure that our 2nd query will always pull results(possibly caching?) Here is one search criteria 'Open Circle Hoop Earrings Polished Stainless Steel Open Circle Hoop Earrings Polished Stainless Steel DiamondShark' This return no results via API. On Amazon site,I get alternative suggestions with some results which are pretty relevant. Is there a way to pull those results? B. We just need a thumbnail image and a title and description for our app. Which responseGroup is appropriate? We are using medium rt now but there is awful lot of information even with that responseGroup. Any help is appreciated. thanks

Read the article

Searching Natural Language Sentence Structure

- by Cerin

What's the best way to store and search a database of natural language sentence structure trees? Using OpenNLP's English Treebank Parser, I can get fairly reliable sentence structure parsings for arbitrary sentences. What I'd like to do is create a tool that can extract all the doc strings from my source code, generate these trees for all sentences in the doc strings, store these trees and their associated function name in a database, and then allow a user to search the database using natural language queries. So, given the sentence "This uploads files to a remote machine." for the function upload_files(), I'd have the tree: (TOP (S (NP (DT This)) (VP (VBZ uploads) (NP (NNS files)) (PP (TO to) (NP (DT a) (JJ remote) (NN machine)))) (. .))) If someone entered the query "How can I upload files?", equating to the tree: (TOP (SBARQ (WHADVP (WRB How)) (SQ (MD can) (NP (PRP I)) (VP (VB upload) (NP (NNS files)))) (. ?))) how would I store and query these trees in a SQL database? I've written a simple proof-of-concept script that can perform this search using a mix of regular expressions and network graph parsing, but I'm not sure how I'd implement this in a scalable way. And yes, I realize my example would be trivial to retrieve using a simple keyword search. The idea I'm trying to test is how I might take advantage of grammatical structure, so I can weed-out entries with similar keywords, but a different sentence structure. For example, with the above query, I wouldn't want to retrieve the entry associated with the sentence "Checks a remote machine to find a user that uploads files." which has similar keywords, but is obviously describing a completely different behavior.

Developer IT