Search Results

Search found 234 results on 10 pages for 'stanford nlp'.

Page 1/10 | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • Persisting NLP parsed data

    - by tjb1982
    I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

    Read the article

  • stanford pos tagger runs out of memory?

    - by goh
    my stanford tagger ran out of memory. Is it because the text has to be properly formatted? This is because i use it to tag html contents, with the tags stripped, but there may have quite a excessive amount of newlines. here is the error: BlockquoWARNING: Untokenizable: ? (char in decimal: 9829) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(Ex actBestSequenceFinder.java:175) at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(Exact BestSequenceFinder.java:98) at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSente nce.java:277) at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSent ence.java:258) at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence. java:110) at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger. java:825) at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.ja va:1319) at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.ja va:1225) at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.ja va:1183) at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:13 58)

    Read the article

  • using Dependency Parser in Stanford coreNLP

    - by Eddie Dovzhik
    I am using the Stanford coreNLP ( http://nlp.stanford.edu/software/corenlp.shtml ) in order to parse sentences and extract dependencies between the words. I have managed to create the dependencies graph like in the example in the supplied link, but I don't know how to work with it. I can print the entire graph using the toString() method, but the problem I have is that the methods that search for certain words in the graph, such as getChildList, require an IndexedWord object as a parameter. Now, it is clear why they do because the nodes of the graph are of IndexedWord type, but it's not clear to me how I create such an object in order to search for a specific node. For example: I want to find the children of the node that represents the word "problem" in my sentence. How I create an IndexWord object that represents the word "problem" so I can search for it in the graph?

    Read the article

  • calling Stanford POS Tagger maxentTagger from java program

    - by Akansha
    Hi. I am new to Stanford POS tagger. I need to call the Tagger from my jva program and direct the output to a text file. I have extracted the source files from Stanford-postagger and tried calling the maxentTagger, but all I find is errors and warnings. Can somebody tell me from the scratch about how to call maxentTagger in my program, setting the classpath if required and other such steps. Please help me out.

    Read the article

  • Starting out NLP - Python + large data set

    - by pencilNero
    Hi, I've been wanting to learn python and do some NLP, so have finally gotten round to starting. Downloaded the english wikipedia mirror for a nice chunky dataset to start on, and have been playing around a bit, at this stage just getting some of it into a sqlite db (havent worked with dbs in the past unfort). But I'm guessing sqlite is not the way to go for a full blown nlp project(/experiment :) - what would be the sort of things I should look at ? HBase (.. and hadoop) seem interesting, i guess i could run then im java, prototype in python and maybe migrate the really slow bits to java... alternatively just run Mysql.. but the dataset is 12gb, i wonder if that will be a problem? Also looked at lucene, but not sure how (other than breaking the wiki articles into chunks) i'd get that to work.. What comes to mind for a really flexible NLP platform (i dont really know at this stage WHAT i want to do.. just want to learn large scale lang analysis tbh) ? Many thanks.

    Read the article

  • NLP Library in java

    - by user337962
    hi, I need a simple Natural Language Processing library written in java which can be used to process a search query/question. What I want actually is to separate the main subject which is being searched in a query. For an example, considering a query like "What is an apple?", it's perfect if the main search word apple can be extracted. This is for a semantic search engine development purpose. Can anyone please suggest a suitable nlp library for this?? Thank You!!

    Read the article

  • NLP with greatly contrained input and abilities

    - by Mike F
    Hat in hand here. I'm a seasoned developer and I would be grateful for a bit of help. I don't have time to read or digest long intricate discussions on theoretical concepts around NLP (or go get my PHD). That said, I have read a few and it's a damn interesting field. The problem is I need real world solutions, for real world products, in real world time frames. The problem I'm having is right now I'm not sure what the right questions are to ask to get started implementing. I believe this is mostly related to vocabulary. I'll read somewhere, a blog post, a forum post, a whitepaper, and it says, I'm doing flooping with the blargy blarg method, and I go google flooping and blargy blarg, and I get references to more obscurity. It seemingly never ends. So, my question is multiphased. First, more generally, how do I become passingly educated on this quickly? Just in time educated. I only need to know what I need to know to take the next step. I've spent 20 years writing code. Explain quick. I'll get it. (I mean provide a reference to something that explains quickly of course). I'm happy to read the right book, but I don't want to read a book where I read the chapter introduction that explains what floopy floop is and then skip over the rest of the chapter with examples of floopy flooping (because now I get what it is). I also don't want to read a book that goes into too much detail with theoretical underpinnings or history. For example, the Jurafsky book seems like way more than I need: http://www.amazon.com/gp/product/0131873210. But I will read it if this is the right book to read. (It's also dang expensive!) I need the root node of the expedited learning tree here, if you will. Point me in the right direction and I'll be quite grateful. I'm expecting quite a lot of firehose drinking - I just need the right firehose. Second, what I need to do is take a single sentence, with a very reduced vocabulary, and get a grammar tree (sorry if this is the wrong terminology) that I can do something with. I know I could easily write this command line input style in c in a more conventional manner, but I need it to be way better than that. But I don't need a chatterbot either. What I'm doing needs to live in a constrained environment. I can't use Python (unfortunately). I can't ship with gigabytes of corpuses. I need any libraries I use to be in c/c++. If I have to write this myself, I will. Hopefully, it will be achievable considering the reduced problem set. Maybe, probably, that's just naive. If so, let me know. :-) Thanks in advance - Mike

    Read the article

  • NLP - Queries using semantic wildcards in full text searching, maybe with Lucene?

    - by Zsolt
    Let's say I have a big corpus (for example in english or an arbitrary language), and I want to perform some semantic search on it. For example I have the query: "Be careful: [art] armada of [sg] is coming to [do sg]!" And the corpus contains the following sentence: "Be careful: an armada of alien ships is coming to destroy our planet!" It can be seen that my query string could contain "semantic placeholders", such as: [art] - some placeholder for articles (for example a / an in English) [sg], [do sg] - some placeholders for NPs and VPs (subjects and predicates) I would like to develop a library which would be capable to handle these queries efficiently. I suspect that some kind of POS-tagging would be necessary for parsing the text, but because I don't want to fully reimplement an already existing full-text search engine to make it work, I'm considering that how could I integrate this behaviour into a search engine like Lucene? I know there are SpanQueries which could behave similarly in some cases, but as I can see, Lucene doesn't do any semantic stuff with stored texts. It is possible to implement a behavior like this? Or do I have to write an own search engine?

    Read the article

  • NLP - Word Alignment

    - by mgj
    Hi..:) I am looking for word alignment tools and algorithms, I am dealing with bilingual English - Hindi text, Currently I am working on DTW(Dynamic Time Warping) algorithm, CLA(Competitive Linking Algorithm) , NATool, Giza++. Could you please suggest me any other alogrithm/tool which is language independent which could achieve Statistical word alignment for parallel English Hindi Corpora and its Evaluation, some tools languages are best for certain languages.. Could one please tell me how true is that and if so could you please give me an example what would suite better for Asian languages like Hindi and what shouldn't I use for such languages. I have heard a bit about uplug word aligner.. could one tell me if I could use it as a tool for my purpose. Thank you.. :)

    Read the article

  • how to get wanted nodes from Stanford Parser nlp

    - by vitaly
    Hello all! My mean problem is that I dont know how to extract nodes from GrammaticalStructure. I am using englishPCFG.ser in java netbeans. My target is o know the quality of the screen like: the screen of iphone 4 is great. I want to extract screen and great. how can i extract the NN (screen) and VP (great) the code that I wrote is: LexicalizedParser lp = new LexicalizedParser("C:\\englishPCFG.ser"); lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"}); String sent ="the screen is very good."; Tree parse = (Tree) lp.apply(Arrays.asList(sent)); parse.pennPrint(); System.out.println(); TreebankLanguagePack tlp = new PennTreebankLanguagePack(); GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(); GrammaticalStructure gs = gsf.newGrammaticalStructure(parse); Collection tdl = gs.typedDependenciesCollapsed();

    Read the article

  • How to get parent node in Stanford's JavaNLP?

    - by roddik
    Hello. Suppose I have such chunk of a sentence: (NP (NP (DT A) (JJ single) (NN page)) (PP (IN in) (NP (DT a) (NN wiki) (NN website)))) At a certain moment of time I have a reference to (JJ single) and I want to get the NP node binding A single page. If I get it right, that NP is the parent of the node, A and page are its siblings and it has no children (?). When I try to use the .parent() method of a tree, I always get null. The API says that's because the implementation doesn't know how to determine the parent node. Another method of interest is .ancestor(int height, Tree root), but I don't know how to get the root of the node. In both cases, since the parser knows how to indent and group trees, it must know the "parent" tree, right? How can I get it? Thanks

    Read the article

  • Why is Prolog associated with Natural Language Processing?

    - by kyphos
    I have recently started learning about NLP with python, and NLP seems to be based mostly on statistics/machine-learning. What does a logic programming language bring to the table with respect to NLP? Is the declarative nature of prolog used to define grammars? Is it used to define associations between words? That is, somehow mine logical relationships between words (this I imagine would be pretty hard to do)? Any examples of what prolog uniquely brings to NLP would be highly appreciated.

    Read the article

  • Seeking some advice on pursuing MS in CS from Stanford or Carnegie Mellon or Caltech

    - by avi
    What kinds of projects are given preference in top notch colleges like Stanford, Caltech, etc to get admission into MS programme in Computer Science? I have an average academic portfolio. I'm pursuing Btech from a not so popular university in India with an aggregate of 67%. I'm good at designing algorithms and possess good knowledge of core subjects but helpless with my percentage. So, I think the only way I can impress them is with my project(s). Can anyone please suggest me the kinds of projects that are given preference by such top level institutes? Could you please also suggest some good projects? My area of interest would be Artificial Intelligence or any application/software/algorithm design which could be of some help to common people. Or if you have any other random idea for my project then please share it with me. Note: Web based projects and management projects like lib management wouldn't be my priority.

    Read the article

  • I have a list of names, some of them are fake, I need to use NLP and Python 3.1 to keep the real nam

    - by Sho Minamimoto
    I have no clue of where to start on this. I've never done any NLP and only programmed in a Python 3.1, which I have to use. I'm looking at the site http://www.linkedin.com and I have to gather all of the public profiles and some of them have very fake names, like 'aaaaaa k dudujjek' and I've been told I can use NLP to find the real names, where would I even start?

    Read the article

  • Persisting natural language processing parsed data

    - by tjb1982
    I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

    Read the article

  • Simple NLP: How to use ngram to do word similarity?

    - by sadawd
    Dear Everyone, I Hear that google uses up to 7-grams for their own data. I am interested in finding words that are similar in context (i.e. cat and dog) and I was wondering how do I compute the similarity of two words on a n-gram model given that n 2. Given a sample set like this forexample: (I, love cats), (cats, loves, dogs), (dogs, hate, human) What is a good way to compare the similarity of this pair (I, cats)? Also does anyone know of anyway to do levels for NLP? like: Army-Military-Solider ?

    Read the article

  • NLP: any easy and good methods to find semantic similarity between words?

    - by sadawd
    Dear Everyone, I don't know whether stackoverflow covers NLP, so I am gonna give this a shot. I am interested to find the semantic relatedness of two words from a specific domain, i.e. "image quality" and "noise". I am doing some research to determine if reviews of a cameras are positive or negative for a particular attribute of the camera. (like image quality in each one of the reviews). However, not everybody uses the exact same wording "image quality" in the posts, so I am out to see if there is a way for me to build something like that: "image quality" which includes ("noise", "color", "sharpness", etc etc) so I can wrap all everything within one big umbrella. I am doing this for another language, so Wordnet is not necessarily helpful. And no, I do now work for Google or Microsoft so I do not have data from people's clicking behavior as input data either. However, I do have a lot of text, pos-tagged, segmented etc. Thanks

    Read the article

  • Which programming language could I use for Natural Language Processing to extract clinical words?

    - by MACEE
    I am going to do entity extraction (like Named Entity Recognition) from clinical free text (unstructured raw text such as discharge summaries) and these entities could be any medical problem, medical tests or treatments. I am going to use one of i2b2 datasets (https://www.i2b2.org/) if case you are familiar with that. I am new to the NLP(Natural Language Processing) field and I need a programming language to support NLP tasks and also easily connect to the available libraries of machine learning algorithms like CRF. I don't know much java and I heard about Python, Perl and Scala but I am not sure which one would be the best option for this task?

    Read the article

  • How to sift idioms and set phrases apart from other common phrases using NLP techniques?

    - by hippietrail
    What techniques exist that can tell the difference betwen plain common phrases such as "to the", "and the" and set phrases and idioms which have their own lexical meanings such as "pick up", "fall in love", "red herring", "dead end"? Are there techniques which are successful even without a dictionary, statistical methods HMMs train on large corpora for instance? Or are there heuristics such as ignoring or weighting down "promiscuous" words which can co-occur with just about any word versus words which occur either alone or in a specific limited set of idiomatic phrases? If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words such as "up" in "beat up", "eat up", "sit up", "think up"? UPDATE I've found an interesting paper online: Unsupervised Type and Token Identi?cation of Idiomatic Expressions

    Read the article

  • Construct sentences from tabular data

    - by Sumeet
    I have a huge set of html files an have to retrieve the meaningful information from them. Most of the task is accomplished, now the problem is with HTML tables. I have some literature on how to extract meaningful tables from html, but my problem is with creating meaningful sentences from tabular data (or attribute value pairs extracted from a table). Are there any NLP/Machine learning techniques to do this? Here is what I expect. Suppose below is a sample table: col_Name: Sumeet col_year: 2011 col_winner: quiz Can this be made to something meaningful like "Sumeet won quiz in 2011"?

    Read the article

  • How to create a Semantic Network like wordnet based on Wikipedia?

    - by Forbidden Overseer
    I am an undergraduate student and I have to create a Semantic Network based on Wikipedia. This Semantic Network would be similar to Wordnet(except for it is based on Wikipedia and is concerned with "streams of text/topics" rather than simple words etc.) and I am thinking of using the Wikipedia XML dumps for the purpose. I guess I need to learn parsing an XML and "some other things" related to NLP and probably Machine Learning, but I am no way sure about anything involved herein after the XML parsing. Is the starting step: XML dump parsing into text a good idea/step? Any alternatives? What would be the steps involved after parsing XML into text to create a functional Semantic Network? What are the things/concepts I should learn in order to do them? I am not directly asking for book recommendations, but if you have read a book/article that teaches any thing related/helpful, please mention them. This may include a refernce to already existing implementations regarding the subject. Please correct me if I was wrong somewhere. Thanks!

    Read the article

  • natural language processing internships

    - by user552127
    Hi All, Pls someone guide me in finding paid Grad internships in Natural Language Processing over the summer. I am really interested in NLP/ML and have taken up the excellent course offered at my school in Fall. I would be glad to work for passionate startups that do actual NLP tasks such as semantic extraction (and not just information retrieval) etc. I have worked with Java and teaching myself Python in all NLP tasks. Thanks, Sanjay

    Read the article

1 2 3 4 5 6 7 8 9 10  | Next Page >