Persisting NLP parsed data
- by tjb1982
I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what the standard ways are to store NLP-parsed data for something like a text-mining application.
One approach I thought might be interesting is to store the parse tree as an adjacency list, with each node pointing to its parent, and rely on recursive queries to walk it (PostgreSQL supports these, and I've found they work really well). Something like this:
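-- Column types and constraints are assumed; adjust as needed.
CREATE TABLE component (
    id        serial PRIMARY KEY,
    pos       text,                               -- tag of this parse-tree node
    parent_id integer REFERENCES component (id)   -- NULL for the root node
);
CREATE TABLE word (
    id    serial PRIMARY KEY,
    raw   text,
    lemma text,
    pos   text,
    ner   text
);
CREATE TABLE cw_map (
    component_id integer REFERENCES component (id),
    word_id      integer REFERENCES word (id),
    position     int
);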
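To make the recursive-query idea concrete, here is a minimal sketch of walking such an adjacency list. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, since both accept WITH RECURSIVE. The column types, the sample parse of "Dogs chase cats", and the helper name words_under are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE component (
        id INTEGER PRIMARY KEY,
        pos TEXT,
        parent_id INTEGER REFERENCES component(id)
    );
    CREATE TABLE word (
        id INTEGER PRIMARY KEY,
        raw TEXT, lemma TEXT, pos TEXT, ner TEXT
    );
    CREATE TABLE cw_map (
        component_id INTEGER REFERENCES component(id),
        word_id INTEGER REFERENCES word(id),
        position INTEGER
    );
""")

# Hypothetical parse: (S (NP Dogs) (VP chase (NP cats)))
conn.executemany("INSERT INTO component VALUES (?, ?, ?)", [
    (1, "S", None), (2, "NP", 1), (3, "VP", 1), (4, "NP", 3),
])
conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?)", [
    (1, "Dogs", "dog", "NNS", "O"),
    (2, "chase", "chase", "VBP", "O"),
    (3, "cats", "cat", "NNS", "O"),
])
# Each word hangs off the lowest component that directly contains it;
# position is its index within the sentence.
conn.executemany("INSERT INTO cw_map VALUES (?, ?, ?)", [
    (2, 1, 0), (3, 2, 1), (4, 3, 2),
])


def words_under(component_id):
    """Return the raw words covered by a component, in sentence order."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            -- anchor: the starting component
            SELECT id FROM component WHERE id = ?
            UNION ALL
            -- step: every child of a component already in the subtree
            SELECT c.id FROM component c
            JOIN subtree s ON c.parent_id = s.id
        )
        SELECT w.raw
        FROM subtree s
        JOIN cw_map m ON m.component_id = s.id
        JOIN word w ON w.id = m.word_id
        ORDER BY m.position
    """, (component_id,)).fetchall()
    return [r[0] for r in rows]


print(words_under(3))  # words covered by the VP
print(words_under(1))  # words covered by the whole sentence
```

The same WITH RECURSIVE query should carry over to PostgreSQL with only the parameter placeholder changed to match the driver in use.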
But I assume there are many established ways to do this, adopted over the years by people working in the field and depending on the kind of analysis being done. So what are the standard persistence strategies for NLP-parsed data, and how are they used?