Persisting NLP parsed data

Posted by tjb1982 on Programmers
Published on 2012-10-17T20:59:44Z Indexed on 2012/10/17 23:19 UTC

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what the standard ways are to store NLP parsed data for something like a text mining application.

One approach I thought might be interesting is to store the parse tree as an adjacency list (each component holding a reference to its parent) and query it with recursive queries, which PostgreSQL supports and which I've found work really well. Something like this:

Component (
  id,
  POS,
  parent_id
)

Word (
  id,
  raw,
  lemma,
  POS,
  NER
)

CW_Map (
  component_id,
  word_id,
  position int
)
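
To make the idea concrete, here is a minimal sketch of the schema above together with one recursive query that collects the words under a given component. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs without a server (the WITH RECURSIVE syntax is much the same in both), the column types and the words_under helper are hypothetical, the toy sentence is invented, and only leaf components are mapped to words.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE component (
    id        INTEGER PRIMARY KEY,
    pos       TEXT NOT NULL,
    parent_id INTEGER REFERENCES component(id)  -- NULL for the root
);
CREATE TABLE word (
    id    INTEGER PRIMARY KEY,
    raw   TEXT NOT NULL,
    lemma TEXT,
    pos   TEXT,
    ner   TEXT
);
CREATE TABLE cw_map (
    component_id INTEGER NOT NULL REFERENCES component(id),
    word_id      INTEGER NOT NULL REFERENCES word(id),
    position     INTEGER NOT NULL
);
""")

# Invented toy parse: (S (NP (DT The) (NN dog)) (VP (VBZ barks)))
conn.executemany("INSERT INTO component VALUES (?, ?, ?)", [
    (1, "S", None), (2, "NP", 1), (3, "VP", 1),
    (4, "DT", 2), (5, "NN", 2), (6, "VBZ", 3),
])
conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?)", [
    (1, "The", "the", "DT", "O"),
    (2, "dog", "dog", "NN", "O"),
    (3, "barks", "bark", "VBZ", "O"),
])
# Only the leaf components are mapped to words in this sketch.
conn.executemany("INSERT INTO cw_map VALUES (?, ?, ?)",
                 [(4, 1, 0), (5, 2, 1), (6, 3, 2)])


def words_under(conn, component_id):
    """Return the raw words covered by a component, in sentence order."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            -- Anchor: the component we start from.
            SELECT id FROM component WHERE id = ?
            UNION ALL
            -- Recursive step: children of anything already in the subtree.
            SELECT c.id
            FROM component c
            JOIN subtree s ON c.parent_id = s.id
        )
        SELECT w.raw
        FROM subtree s
        JOIN cw_map m ON m.component_id = s.id
        JOIN word w ON w.id = m.word_id
        ORDER BY m.position
    """, (component_id,)).fetchall()
    return [row[0] for row in rows]


print(words_under(conn, 2))  # words covered by the NP component
```

The recursive CTE walks parent_id links downward from the chosen component, so the same query serves the whole sentence (component 1) or any phrase inside it.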

But I assume there are many standard approaches, adopted over the years by people working in the field, and that the right one depends on the kind of analysis being done. So what are the standard persistence strategies for NLP parsed data, and how are they used?

© Programmers or respective owner
