Regex to add CDATA for malformed XML
Posted
by AntonioCS
on Stack Overflow
Published on 2010-06-01T17:03:01Z
Indexed on
2010/06/01
17:13 UTC
Hey guys!
I have a huge XML file (13 MB) that contains some malformed values. Here is a sample of the XML:
<propertylist>
<adprop index="0" proptype="type" value="Ft"/>
<adprop index="0" proptype="category" value="Bs"/>
<adprop index="0" proptype="subcategory" value="Bsm"/>
<adprop index="0" proptype="description" value="MOONEN CUSTOM 58"/>
</propertylist>
Now this is OK. But I have many other nodes whose values are not wrapped in CDATA and need to be. The node that gives me problems is the
<adprop index="0" proptype="description" value=""/>
I created this regular expression:
<adprop index="0" proptype="description" value="(.+)"\/>
to catch that node and replace it with this:
<adprop index="0" proptype="description" value="<![CDATA[\1]]>"\/>
I ran this in Notepad++ and it works.
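For anyone who wants to try the same substitution outside an editor, here is a minimal sketch in Python's re module. The function name wrap_single_line and the sample string are only illustrative assumptions; the pattern and replacement are the ones shown above, except that the "/" is left unescaped because Python does not need it escaped.

```python
import re

# Pattern and replacement taken from the post; only the "\/" escape is dropped.
SINGLE_LINE_PATTERN = re.compile(
    r'<adprop index="0" proptype="description" value="(.+)"/>'
)
SINGLE_LINE_REPLACEMENT = (
    r'<adprop index="0" proptype="description" value="<![CDATA[\1]]>"/>'
)


def wrap_single_line(xml_text):
    """Wrap single-line description values in CDATA (hypothetical helper)."""
    return SINGLE_LINE_PATTERN.sub(SINGLE_LINE_REPLACEMENT, xml_text)


single_sample = '<adprop index="0" proptype="description" value="MOONEN CUSTOM 58"/>'
single_result = wrap_single_line(single_sample)
print(single_result)
```

Because "." does not match a newline by default, this version only handles values that sit on one line.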
The only problem is when the value spans multiple lines, like:
<adprop index="0" proptype="description" value="cutter that has demonstrated her offshore capabiliti from there to the Canaries with her current owner.
Spacious homely interior with over 2m headroom and heaps of" />
The regex fails on this one, and there are plenty of nodes like it.
Can anyone help me adjust the regular expression so that it also matches a value that spans multiple lines?
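One possible direction, sketched in Python rather than Notepad++: replace "(.+)" with a negated character class, "([^"]+)", which also matches line breaks, and allow optional whitespace before "/>" since the multi-line sample ends in '" />'. This assumes the description text never contains a double quote itself. The names wrap_descriptions and multi_sample, and the shortened sample text, are hypothetical.

```python
import re

# [^"]+ matches any run of characters except a double quote, newlines included.
# \s* tolerates the space before "/>" seen in the multi-line sample.
MULTILINE_PATTERN = re.compile(
    r'(<adprop index="0" proptype="description" value=")([^"]+)("\s*/>)'
)


def wrap_descriptions(xml_text):
    """Wrap description values in CDATA, even when they span several lines."""
    # Groups 1 and 3 are re-emitted unchanged; only group 2 gets wrapped.
    return MULTILINE_PATTERN.sub(r'\1<![CDATA[\2]]>\3', xml_text)


# Shortened stand-in for the multi-line node from the post.
multi_sample = (
    '<adprop index="0" proptype="description" value="first line of text.\n'
    'second line of text" />'
)
multi_result = wrap_descriptions(multi_sample)
print(multi_result)
```

Empty values (value="") are left alone because "+" requires at least one character. Keep in mind that the XML specification does not allow a literal "<" inside an attribute value, so a strict parser may still reject the result of this replacement.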
Thanks
© Stack Overflow or respective owner