How to lazily process an XML document with hexpat?
Posted by Florian on Stack Overflow
Published on 2012-03-19T21:14:26Z
Indexed on 2012/03/20 5:29 UTC
In my search for a Haskell library that can process large (300-1000 MB) XML files, I came across hexpat. The Haskell Wiki has an example that writes out the document before checking for a parse error, with this comment:
-- Process document before handling error, so we get lazy processing.
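The wiki code itself is not reproduced here. As a rough sketch of the ordering that comment describes, the block below uses a hypothetical toyParse (not part of hexpat) that returns a lazily produced document together with an error that is only known after scanning the input. The equally hypothetical processDoc writes the document first and inspects the error afterwards.

```haskell
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for a lazy parser result (not hexpat's API):
-- everything before the first '!' is the "document"; a '!' is a parse error.
toyParse :: String -> (String, Maybe String)
toyParse input = (doc, err)
  where
    (doc, rest) = break (== '!') input
    err = if null rest then Nothing else Just "unexpected '!'"

-- "Process document before handling error": write the document first,
-- then look at the error. Returns True when no error was reported.
processDoc :: String -> IO Bool
processDoc input = do
  let (doc, mErr) = toyParse input
  putStrLn doc -- consumes 'doc' before 'mErr' is ever forced
  case mErr of
    Nothing -> return True
    Just err -> hPutStrLn stderr ("parse failed: " ++ err) >> return False
```

In GHCi, processDoc "ab!c" prints ab and then reports the error on stderr.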
For testing purposes I redirected the output to /dev/null and threw a 300 MB file at it. Memory consumption kept rising until I had to kill the process.
Then I removed the error handling from the process function:
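    import qualified Data.ByteString.Lazy as L
    import System.IO (IOMode (WriteMode), hClose, openFile)
    import Text.XML.Expat.Format (format)
    import Text.XML.Expat.Tree (UNode, XMLParseError, defaultParseOptions, parse)

    process :: String -> IO ()
    process filename = do
        inputText <- L.readFile filename
        -- mErr is bound but never used: only the tree is consumed
        let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
        hFile <- openFile "/dev/null" WriteMode
        L.hPutStr hFile $ format xml
        hClose hFile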
As a result, the function now runs in constant memory. Why does the error handling cause such massive memory consumption?
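One plausible mechanism, stated as an assumption about hexpat's internals rather than taken from its source: both halves of the result pair are derived from the same lazy stream, and the error can only be decided by scanning that stream. The toy model below (splitResult is a hypothetical name) shows that shape.

```haskell
-- Hypothetical model of a lazy parse result (not hexpat's actual code).
-- 'events' is a single lazy stream: Right carries data, Left an error.
splitResult :: [Either String Int] -> ([Int], Maybe String)
splitResult events = (items, err)
  where
    -- Produced incrementally: a consumer can stream these one at a time.
    items = [n | Right n <- takeWhile isOk events]
    -- Decided only by scanning up to the first Left (or the end). Until it
    -- is forced, this thunk holds a reference to the head of 'events'.
    err = case dropWhile isOk events of
      (Left e : _) -> Just e
      _ -> Nothing
    isOk (Right _) = True
    isOk (Left _) = False
```

If a caller walks items first and only later forces err, as the error-handling version does, the unevaluated err thunk still points at the head of events, so nothing already written can be freed and residency grows with the input. When err is never referenced, that thunk is garbage and the consumed prefix can be collected, which would be consistent with the constant memory observed once the error handling was removed.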
As far as I understand, xml and mErr are two separate unevaluated thunks after the call to parse. Does format xml evaluate xml and build up the evaluation tree of mErr? If so, is there a way to handle the error while still using constant memory?
© Stack Overflow or respective owner