Process xml-like log file queue

Posted by Zsolt Botykai on Stack Overflow
Published on 2010-05-04T08:44:08Z

Hi all,

First of all: I'm not a programmer and never was, although I have learned a lot during my professional career as a support consultant.

Now my task is to process a constantly written and rapidly growing XML-like log file and create some statistics about it. It's not valid XML because it does not have a proper <root> element; the log looks like this:

<log itemdate="somedate">
  <field id="0" />
  ...
</log>

<log itemdate="somedate+1">
  <field id="0" />
  ...
</log>

<log itemdate="somedate+n">
  <field id="0" />
  ...
</log>

For example, I have to count all the items with field id=0. But most of the solutions I have found (e.g. using XPath) report an error about the garbage after the first closing </log>.

I can most likely use Python (2.6, although I could compile 3.x as well) or a really old Perl version (5.6.x). I also recently compiled xmlstarlet, which looks really promising: I was able to create the statistics for a certain period after copying the file and adding an opening and a closing root element around its contents. But this is a huge file, and copying takes time as well. Isn't there a better solution?
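To make the idea of "adding a root element without copying the file" concrete, here is a minimal sketch of one possible approach: feed a synthetic opening <root> tag to an incremental parser, then stream the log through it in chunks. It assumes Python 3.4 or later (for xml.etree.ElementTree.XMLPullParser), not the 2.6 mentioned above. The function name count_fields, the chunk size, and the sample dates are hypothetical and only serve the illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_fields(stream, field_id="0", chunk_size=64 * 1024):
    """Count <field> elements whose id equals field_id in a rootless log.

    `stream` is a file-like object opened in text mode. The name and
    signature of this helper are illustrative, not from any library.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    # Supply the missing root ourselves; the file on disk is never modified.
    parser.feed("<root>")
    root = None
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the first start event is the synthetic root
            elif elem.tag == "field":
                if elem.get("id") == field_id:
                    count += 1
            elif elem.tag == "log":
                # Drop finished <log> entries so memory use stays flat.
                root.clear()
    # The closing </root> is never fed: we simply stop reading.
    return count


# Hypothetical sample data in the shape shown in the question.
SAMPLE = """<log itemdate="2010-05-01">
  <field id="0" />
  <field id="1" />
</log>

<log itemdate="2010-05-02">
  <field id="0" />
</log>

<log itemdate="2010-05-03">
  <field id="2" />
</log>
"""

if __name__ == "__main__":
    print(count_fields(io.StringIO(SAMPLE)))
```

With a real log, the stream would come from open(path, encoding=...) instead of io.StringIO. Because the parser only ever sees complete chunks that have already been written, a partially written last entry is simply left unfinished and is not counted.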

Thanks in advance!

© Stack Overflow or respective owner
