Parsing Data in XML and Storing to DB in Python

Posted by Rakesh on Stack Overflow
Published on 2011-01-09T10:30:56Z Indexed on 2011/01/09 10:53 UTC

Hi guys, I have a problem parsing an XML file and entering the data into SQLite. The format is shown below. I need to store the text of each TOKEN element (the characters before the closing tag), such as 111, AAA, BBB, etc.

<DOCUMENT>
<PAGE width="544.252" height="634.961" number="1" id="p1">
<MEDIABOX x1="0" y1="0" x2="544.252" y2="634.961"/>

<BLOCK id="p1_b1">

<TEXT width="37.7" height="74.124" id="p1_t1" x="51.1" y="20.8652">
<TOKEN sid="p1_s11" id="p1_w1" font-name="Verdanae" bold="yes" italic="no">111</TOKEN>
</TEXT>
</BLOCK>

<BLOCK id="p1_b3">

<TEXT width="151.267" height="10.725" id="p1_t6" x="24.099" y="572.096">
<TOKEN sid="p1_s35" id="p1_w22" font-name="Verdanae" bold="yes" italic="yes">AAA</TOKEN>
<TOKEN sid="p1_s36" id="p1_w23" font-name="verdanae" bold="yes" italic="no">BBB</TOKEN>
<TOKEN sid="p1_s37" id="p1_w24" font-name="verdanae" bold="yes" italic="no">CCC</TOKEN>
</TEXT>
</BLOCK>

<BLOCK id="p1_b4">

<TEXT width="82.72" height="26" id="p1_t7" x="55.426" y="138.026">
<TOKEN sid="p1_s42" id="p1_w29" font-name="verdanae" bold="yes" italic="no">DDD</TOKEN>
<TOKEN sid="p1_s43" id="p1_w30" font-name="verdanae" bold="yes" italic="no">EEE</TOKEN>
</TEXT>

<TEXT width="101.74" height="26" id="p1_t8" x="55.406" y="162.026">
<TOKEN sid="p1_s45" id="p1_w31" font-name="verdanae" bold="yes" italic="no">FFF</TOKEN>
</TEXT>

<TEXT width="152.96" height="26" id="p1_t9" x="55.406" y="186.026">
<TOKEN sid="p1_s47" id="p1_w32" font-name="verdanae" bold="yes" italic="no">GGG</TOKEN>
<TOKEN sid="p1_s48" id="p1_w33" font-name="verdanae" bold="yes" italic="no">HHH</TOKEN>
</TEXT>
</BLOCK>
</PAGE>
</DOCUMENT>

In .NET this is done with three nested foreach loops: 1. over "DOCUMENT/PAGE/BLOCK", 2. over "TEXT", 3. over "TOKEN", and then the data is entered into the DB. I don't see how to do the same in Python, and I am trying it with the lxml module.
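For illustration, here is a minimal sketch of how the three nested loops described above might look in Python. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra installs; lxml.etree provides the same fromstring, findall, get and text calls, so swapping the import should work. The table name `tokens`, its columns, the helper names `extract_tokens` and `store_tokens`, and the in-memory database are all assumptions made for the example, not part of the original question. The XML is a shortened copy of the sample above.

```python
import sqlite3
import xml.etree.ElementTree as ET  # lxml.etree offers the same calls used here

# Shortened copy of the XML from the question (two blocks, four tokens).
SAMPLE = """<DOCUMENT>
<PAGE width="544.252" height="634.961" number="1" id="p1">
<BLOCK id="p1_b1">
<TEXT id="p1_t1">
<TOKEN id="p1_w1">111</TOKEN>
</TEXT>
</BLOCK>
<BLOCK id="p1_b3">
<TEXT id="p1_t6">
<TOKEN id="p1_w22">AAA</TOKEN>
<TOKEN id="p1_w23">BBB</TOKEN>
<TOKEN id="p1_w24">CCC</TOKEN>
</TEXT>
</BLOCK>
</PAGE>
</DOCUMENT>"""


def extract_tokens(xml_text):
    """Return (block_id, text_id, token_id, token_text) tuples in document order."""
    root = ET.fromstring(xml_text)  # root is the DOCUMENT element
    rows = []
    for block in root.findall("PAGE/BLOCK"):    # loop 1: every BLOCK on every PAGE
        for text in block.findall("TEXT"):      # loop 2: every TEXT in the block
            for token in text.findall("TOKEN"): # loop 3: every TOKEN in the text
                value = (token.text or "").strip()
                rows.append((block.get("id"), text.get("id"), token.get("id"), value))
    return rows


def store_tokens(conn, rows):
    """Insert the rows into a hypothetical 'tokens' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens "
        "(block_id TEXT, text_id TEXT, token_id TEXT, value TEXT)"
    )
    # Parameterized insert; executemany handles the whole list in one call.
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: use a file path for a real database
rows = extract_tokens(SAMPLE)
store_tokens(conn, rows)
stored = [r[0] for r in conn.execute("SELECT value FROM tokens ORDER BY rowid")]
print(stored)  # ['111', 'AAA', 'BBB', 'CCC']
```

To read from a file instead of a string, `ET.parse(path).getroot()` could replace the `ET.fromstring` call. Storing the BLOCK and TEXT ids next to each token value is just one possible layout; if only the token text is needed, a single `value` column would do.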

© Stack Overflow or respective owner
