How to lazily process an XML document with hexpat?
- by Florian
In my search for a Haskell library that can process large (300-1000 MB) XML files I came across hexpat. There is an example in the Haskell Wiki that carries the comment:
-- Process document before handling error, so we get lazy processing.
For testing purposes I redirected the output to /dev/null and threw a 300 MB file at it. Memory consumption kept rising until I had to kill the process.
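The error-handling version I tested looked roughly like this (a sketch following the shape of the wiki example; the exact error reporting there may differ):

import qualified Data.ByteString.Lazy as L
import System.IO
import Text.XML.Expat.Format (format)
import Text.XML.Expat.Tree

process :: String -> IO ()
process filename = do
    inputText <- L.readFile filename
    -- parse is lazy: both the tree and the error come back as thunks.
    let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
    hFile <- openFile "/dev/null" WriteMode
    L.hPutStr hFile $ format xml
    hClose hFile
    -- Process document before handling error, so we get lazy processing.
    case mErr of
        Nothing  -> return ()
        Just err -> hPutStrLn stderr $ "XML parse failed: " ++ show err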
Now I removed the error handling from the process function:
process :: String -> IO ()
process filename = do
    inputText <- L.readFile filename
    let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
    hFile <- openFile "/dev/null" WriteMode
    L.hPutStr hFile $ format xml
    hClose hFile
    return ()
As a result, the function now runs in constant memory. Why does the error handling cause such massive memory consumption?
As far as I understand, xml and mErr are two separate unevaluated thunks after the call to parse. Does format xml evaluate xml and also build up the evaluation tree of mErr? If so, is there a way to handle the error while still using constant memory?
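To make the suspicion concrete, this is the kind of sharing I have in mind; a toy example that is not hexpat-specific (the names are made up for illustration):

-- Both results are built from the same list. Forcing the first component
-- while the second is still reachable keeps the whole list in memory,
-- so space usage grows with the input instead of staying constant.
pairLeak :: [Integer] -> (Integer, Int)
pairLeak xs = (sum xs, length xs)

main :: IO ()
main = do
    let (total, count) = pairLeak [1 .. 10000000]
    print total   -- forces the list; count still refers to it, so it is retained
    print count   -- only now can the list be garbage collected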
http://www.haskell.org/haskellwiki/Hexpat/