How to extract block of XML from a log file on Linux
Posted
by dragonmantank
on Stack Overflow
See other posts from Stack Overflow
or by dragonmantank
Published on 2010-05-14T01:55:59Z
Indexed on
2010/05/14
2:04 UTC
Read the original article
Hit count: 293
I have a log file that looks like the following:
2010-05-12 12:23:45 Some sort of log entry
2010-05-12 01:45:12 Request XML: <RootTag>
<Element>Value</Element>
<Element>Another Value</Element>
</RootTag>
2010-05-12 01:45:32 Response XML: <ResponseRoot>
<Element>Value</Element>
</ResponseRoot>
2010-05-12 01:45:49 Another log entry
What I want to do is extract the Request and Response XML (and ultimately dump them into their own single files). I had a similar parser that used egrep but the XML was all on one line, not multiple ones like above.
The log files are also somewhat large, hitting 500-600 megs a log. Smaller logs I would read in via a PHP script and use regex matching, but the amount of memory required for such a large file would more than likely kill the script.
Is there an easy way using the built-in tools on a Linux box (CentOS in this case) to extract multiple lines or am I going to have to bite the bullet and use Perl or PHP to read in the entire file to extract it?
© Stack Overflow or respective owner