Converting a treebank of vertical trees to s-expressions
Posted by Andreas on Stack Overflow
Published on 2010-05-11T22:58:28Z
I need to preprocess a treebank corpus of sentences with parse trees. The input format is a vertical representation of trees, like so:
S
=NP
==(DT +def) the
== (N +ani) man
=VP
==V walks
...and I need it as an s-expression, with the feature annotations (such as +def) dropped, like so:
(S (NP (DT the) (N man)) (VP (V walks)))
I have code that almost does it, but not quite: there's always a missing paren somewhere. Should I use a proper parser, maybe one based on a context-free grammar (CFG)? The current code is at http://github.com/andreasvc/eodop/blob/master/arbobanko.py
The code also contains real examples from the treebank.
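To make the target concrete, here is a minimal sketch of the conversion for the small example above. It is not the code from the repository. It keeps a stack of the depths of nodes that are still open and closes them whenever a line at the same or a shallower depth appears, which is one way to avoid losing a paren. The name `vertical_to_sexpr` and the regular expression are assumptions based only on the example shown here (depth given by leading `=` signs, features inside parentheses after the tag, one tree per input); the real treebank may need more cases.

```python
import re

# Assumed line shape: leading '=' signs give the depth, then either a bare
# tag ("NP") or a parenthesised tag with features ("(DT +def)"), then an
# optional word.
LINE = re.compile(r'^(=*)\s*(?:\((\S+)[^)]*\)|(\S+))\s*(.*)$')

def vertical_to_sexpr(lines):
    """Convert one vertical tree (an iterable of lines) to an s-expression."""
    out = []
    open_depths = []  # depths of nodes whose closing paren is still pending
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = LINE.match(line)
        depth = len(m.group(1))
        label = m.group(2) or m.group(3)  # features after the tag are dropped
        word = m.group(4).strip()
        # Close every open node that is not an ancestor of the current line.
        while open_depths and open_depths[-1] >= depth:
            open_depths.pop()
            out.append(')')
        out.append((' ' if out else '') + '(' + label)
        if word:
            out.append(' ' + word)
        open_depths.append(depth)
    # Close whatever is still open at the end of the tree.
    out.append(')' * len(open_depths))
    return ''.join(out)

example = """S
=NP
==(DT +def) the
== (N +ani) man
=VP
==V walks"""

print(vertical_to_sexpr(example.splitlines()))
# prints: (S (NP (DT the) (N man)) (VP (V walks)))
```

Because every opened node pushes its depth and every paren is emitted by popping one, the brackets balance by construction, so a grammar-based parser should not be needed for input this regular.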
© Stack Overflow or respective owner