How to lazily process an XML document with hexpat?

Posted by Florian on Stack Overflow
Published on 2012-03-19T21:14:26Z

In my search for a Haskell library that can process large (300-1000 MB) XML files, I came across hexpat. There is an example in the Haskell Wiki whose comment claims lazy processing:

-- Process document before handling error, so we get lazy processing.

For testing purposes I redirected the output to /dev/null and threw a 300 MB file at it. Memory consumption kept rising until I had to kill the process.
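
For reference, the variant with error handling presumably looked roughly like the sketch below. It is reconstructed from the wiki example with the output redirected to /dev/null as described; the name `processWithErrorCheck` and the small `main` wrapper are hypothetical and only there to make the snippet self-contained, and the exact wiki code may differ in details.

```haskell
import qualified Data.ByteString.Lazy as L
import System.Environment (getArgs)
import System.Exit (ExitCode (..), exitWith)
import System.IO
import Text.XML.Expat.Format (format)
import Text.XML.Expat.Tree

-- Hypothetical name: the variant that still checks mErr after writing.
processWithErrorCheck :: String -> IO ()
processWithErrorCheck filename = do
  inputText <- L.readFile filename
  let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)

  -- Process document before handling error, so we get lazy processing.
  hFile <- openFile "/dev/null" WriteMode
  L.hPutStr hFile $ format xml
  hClose hFile

  -- mErr is inspected only after the whole document has been written,
  -- so it stays referenced for the entire duration of the output step.
  case mErr of
    Nothing -> return ()
    Just err -> do
      hPutStrLn stderr $ "XML parse failed: " ++ show err
      exitWith $ ExitFailure 2

-- Illustrative wrapper: process every file named on the command line.
main :: IO ()
main = getArgs >>= mapM_ processWithErrorCheck
```

The only difference from the constant-memory version is the final `case mErr of`, which keeps `mErr` alive while `format xml` is being consumed.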

I then removed the error handling from the process function:

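import qualified Data.ByteString.Lazy as L
import System.IO
import Text.XML.Expat.Format (format)
import Text.XML.Expat.Tree

process :: String -> IO ()
process filename = do
  inputText <- L.readFile filename
  -- mErr is bound but never inspected in this version.
  let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)

  -- Write the formatted document to /dev/null.
  hFile <- openFile "/dev/null" WriteMode
  L.hPutStr hFile $ format xml
  hClose hFile

  return ()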

As a result the function now uses constant memory. Why does the error handling result in massive memory consumption?

As far as I understand, xml and mErr are two separate unevaluated thunks after the call to parse. Does format xml evaluate xml and also build up the evaluation tree of mErr? If so, is there a way to handle the error while still using constant memory?
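
To make the mental model above concrete, here is a minimal sketch that uses only base and does not involve hexpat at all. The toy function `parseToy` is hypothetical; it merely mirrors the `(result, Maybe error)` shape returned by parse. It shows that the two components of such a lazy pair come out of one shared traversal: the result can be consumed incrementally, while the error component is only known once the traversal has reached its end. It does not reproduce the memory growth, and whether this sharing is what retains the tree in hexpat is exactly the open question.

```haskell
import Text.Read (readMaybe)

-- Hypothetical toy "parser": turns tokens into numbers lazily and reports
-- the first bad token, mirroring the (result, Maybe error) shape of parse.
parseToy :: [String] -> ([Int], Maybe String)
parseToy [] = ([], Nothing)
parseToy (t : ts) =
  case readMaybe t of
    Nothing -> ([], Just ("bad token: " ++ t))
    Just n ->
      let (ns, err) = parseToy ts -- lazy pattern: nothing is forced here
       in (n : ns, err)

main :: IO ()
main = do
  let (nums, mErr) = parseToy (map show [1 .. 5 :: Int] ++ ["oops", "7"])
  -- Consume the result first; each element is produced on demand.
  mapM_ print nums
  -- Only now inspect the error; it depends on the same traversal as nums.
  case mErr of
    Nothing -> putStrLn "no error"
    Just e -> putStrLn ("error: " ++ e)
```

Here `nums` and `mErr` are both projections of the same unevaluated pair, so as long as `mErr` is still needed, something must keep a path back to that shared computation.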

http://www.haskell.org/haskellwiki/Hexpat/
