Stanford POS tagger runs out of memory?

Posted by goh on Stack Overflow
Published on 2010-06-11T09:26:36Z

My Stanford tagger ran out of memory. Is it because the text has to be properly formatted? I ask because I use it to tag HTML content with the tags stripped, and the resulting text may contain an excessive number of newlines.

Here is the error:

WARNING: Untokenizable: ? (char in decimal: 9829)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:175)
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
    at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
    at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
    at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
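
For illustration, here is a rough sketch of the kind of cleanup I could run on the stripped HTML before tagging. It rests on an assumption that nothing above confirms: that the heap runs out because the tagger ends up treating a huge stretch of text as a single sentence (the trace only shows that the failure happens inside the per-sentence sequence search). The helper name `normalize_stripped_html` and the `max_tokens` limit are made up for this example and are not part of the Stanford tagger.

```python
def normalize_stripped_html(text, max_tokens=100):
    """Hypothetical helper: tidy tag-stripped HTML text before tagging.

    - drops blank lines (the "excessive newlines" case)
    - collapses runs of spaces and tabs inside each line
    - splits any line longer than max_tokens whitespace-separated
      tokens into several shorter lines
    """
    chunks = []
    for line in text.splitlines():
        tokens = line.split()  # split() with no argument also collapses whitespace
        if not tokens:
            continue  # skip empty or whitespace-only lines
        # Cap the length of each output line at max_tokens tokens.
        for start in range(0, len(tokens), max_tokens):
            chunks.append(" ".join(tokens[start:start + max_tokens]))
    return "\n".join(chunks)


if __name__ == "__main__":
    raw = "Some   heading\n\n\n\nA paragraph of text\twith odd spacing\n\n"
    print(normalize_stripped_html(raw, max_tokens=4))
```

Whether the tagger then treats each output line as its own sentence depends on how it is invoked, which I have not verified, so this may or may not be enough on its own to avoid the error.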
