Stanford POS tagger runs out of memory?
- by goh
My Stanford tagger ran out of memory. Is it because the text has to be properly formatted? I'm using it to tag HTML content with the tags stripped, but the text may contain an excessive number of newlines.
Here is the error:
WARNING: Untokenizable: ? (char in decimal: 9829)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:175)
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
    at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
    at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
    at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
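For reference, here is roughly how I call the tagger after stripping the tags. This is only a sketch: the model path, the sample input, and the cleanup regexes are placeholders, and I'm assuming the standard MaxentTagger constructor and tagString() API.

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TagStrippedHtml {
    public static void main(String[] args) throws Exception {
        // Placeholder model path; substitute whichever .tagger model is on disk.
        MaxentTagger tagger = new MaxentTagger("models/left3words-wsj-0-18.tagger");

        // Placeholder input standing in for a scraped page.
        String html = "<p>first paragraph</p>\n\n\n\n<p>second paragraph</p>";

        // Strip the tags, then collapse runs of newlines so the tokenizer
        // does not see one enormous "sentence", which is my guess at what
        // blows up the heap during sequence inference.
        String text = html.replaceAll("<[^>]+>", " ")
                          .replaceAll("\\n{2,}", "\n")
                          .trim();

        System.out.println(tagger.tagString(text));
    }
}

Would collapsing the newlines like this help, or do I just need a larger heap (e.g. java -Xmx2g ...)?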