How can I create the XML::Simple data structure using a Perl XML SAX parser?

Posted by DVK on Stack Overflow See other posts from Stack Overflow or by DVK
Published on 2010-05-26T11:29:05Z Indexed on 2010/05/27 4:21 UTC
Read the original article Hit count: 275

Filed under:
|
|
|
|

Summary: I am looking a fast XML parser (most likely a wrapper around some standard SAX parser) which will produce per-record data structure 100% identical to those produced by XML::Simple.

Details:

We have a large code infrastructure which depends on processing records one-by-one and expects the record to be a data structure in a format produced by XML::Simple since it always used XML::Simple since early Jurassic era.

An example simple XML is:

<root>
    <rec><f1>v1</f1><f2>v2</f2></rec>
    <rec><f1>v1b</f1><f2>v2b</f2></rec>
    <rec><f1>v1c</f1><f2>v2c</f2></rec>
</root>

And example rough code is:

sub process_record { my ($obj, $record_hash) = @_; # do_stuff }
my $records = XML::Simple->XMLin(@args)->{root};
foreach my $record (@$records) { $obj->process_record($record) };

As everyone knows XML::Simple is, well, simple. And more importantly, it is very slow and a memory hog—due to being a DOM parser and needing to build/store 100% of data in memory. So, it's not the best tool for parsing an XML file consisting of large amount of small records record-by-record.

However, re-writing the entire code (which consist of large amount of "process_record"-like methods) to work with standard SAX parser seems like an big task not worth the resources, even at the cost of living with XML::Simple.

I'm looking for an existing module which will probably be based on a SAX parser (or anything fast with small memory footprint) which can be used to produce $record hashrefs one by one based on the XML pictured above that can be passed to $obj->process_record($record) and be 100% identical to what XML::Simple's hashrefs would have been.

I don't care much what the interface of the new module is; e.g whether I need to call next_record() or give it a callback coderef accepting a record.

© Stack Overflow or respective owner

Related posts about Xml

Related posts about perl