
Shin-Cheng Mu wrote:
Dear members,
I am experiencing a space leak, which I suspect to be an instance of the problem addressed by Wadler before. I'd appreciate if someone here would take a look.
Given the following datatype:
data XMLEvent = StartEvent String | EndEvent String | TextEvent String deriving Show
where the three constructors represent the start tag (<a> where a is a string), end tag (</a>), and text data, respectively, of an XML stream. (They are actually from the library HXML). The following function simply returns the same stream while doing a minimal amount of validation (ignoring the closing tag).
idX :: [XMLEvent] -> ([XMLEvent], [XMLEvent]) idX [] = ([], []) idX (StartEvent a : strm) = let (ts, strm') = idX strm (us, strm'') = idX strm' in (StartEvent a [] : ts ++ EndEvent a : us, strm'') idX (EndEvent _: strm) = ([], strm) idX (TextEvent s : strm) = let (ts, strm') = idX strm in (TextEvent s : ts, strm')
The function idX returns a pair, where the first component is the processed stream, while the second component is the rest of the input. The intention is to thread the input and release processed events.
If the function is used in a pipelined manner:
print . fst . idX . parseInstance $ input
I would not have written idX in this manner. My version is
data XMLEvent = StartEvent String | EndEvent String | TextEvent String deriving Show
idX' :: [XMLEvent] -> [XMLEvent] idX' = untilTags []
untilTags :: [String] -> [XMLEvent] -> [XMLEvent] untilTags [] [] = [] untilTags tags [] = error ("Did not find closing EndEvents " ++ show tags) untilTags tags (x:xs) = case x of StartEvent tag' -> x : untilTags (tag':tags) xs EndEvent tag' -> if null tags then error ("Unexpected EndEvent with tag " ++ show tag') else let (tag:tags') = tags in if tag == tag' then x : untilTags tags' xs else error ("Expected EndEvents " ++ show tags ++ " but found EndEvent " ++ show tag') TextEvent _ -> x : untilTags tags xs
Here the flow of the input "x : ..." to the output "x : ..." is more obvious, and the open tags are kept in a stack and used to check the closing tags. The memory usage will be only slightly higher than your idX due to keeping the list of open tag names. If you want to remove that overhead and assume the ending closing tags have correct strings, then you can keep a mere count of open tags:
countTags :: Int -> [XMLEvent] -> [XMLEvent] countTags 0 [] = [] countTags n [] = error ("Did not find the closing EndEvents " ++ show n) countTags n (x:xs) = case x of StartEvent tag' -> x : countTags (succ n) xs EndEvent tag' -> if 0 == n then error ("Unexpected EndEvent with tag " ++ show tag') else x : countTags (pred n) xs TextEvent _ -> x : countTags n xs