
Try this simple program:

    import Text.ParserCombinators.Parsec

    ppAny  = tokenPrim show (\p _ _ -> p) (\t -> Just t)
    ppTest = many ppAny

    p s = case runParser ppTest True "" s of
            Left  err    -> show err
            Right result -> result

    main = do
        let fname = "C:/main.i"
        f <- readFile fname
        let tokens = p f
        putStrLn . show $ head tokens

This is the simplest possible use of Parsec: it just returns the input unchanged. But it already seems to have a big memory leak: when parsing a 2 MB file, I'm seeing the program grow to about 160 MB before it ends.

I assume this is known behavior. I believe it's because of the way Parsec deals with errors: when you chain two parsers in sequence, the result of the first one (in this case the head of the list) is not available until it is known whether the second one failed or not.

Is there any known alternative that doesn't exhibit this behavior? It would have to somehow return errors inline or on a "side channel". I'll be toying with this sort of thing for a while.

Thanx!

JCAB
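The "side channel" idea can be sketched concretely. The following is a toy illustration (not Parsec; `identityParse` is a made-up name): the parse result is produced lazily and errors travel in a separate list, so forcing the head of the output never has to wait on the rest of the input.

```haskell
module Main where

-- Hypothetical "side channel" shape: result and error list are
-- returned together, but can be forced independently.
identityParse :: String -> (String, [String])
identityParse s = (s, [])  -- the identity parse can never fail

main :: IO ()
main = do
    let (toks, errs) = identityParse (replicate 2000000 'x')
    print (head toks)  -- forces only the first token, not the whole input
    print (null errs)
```

Because the result is just the input shared lazily, consuming it one token at a time runs in constant space; the Parsec program above instead retains the whole result until the parse is known to have succeeded.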

big memory leak ... two parsers in sequence
A related question: are these two equivalent?

    foldl2  f g y z lst = (foldl f y lst, foldl g z lst)
    foldl2' f g y z lst = foldl (\(i, j) x -> (f i x, g j x)) (y, z) lst

If not, is there some strictness that can be added to make them equivalent? And if so, could a compiler make this optimization so as to avoid these sorts of space leaks?

Thanks,
Greg
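One common way to make the tupled version run in constant space is to force both accumulator components at each step. A sketch (`foldl2s` is my own name, not from the thread), using `foldl'` plus bang patterns:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Strict tupled fold: foldl' forces the pair to WHNF each step, and
-- the bang patterns force both components, so neither fold builds
-- a chain of thunks.
foldl2s :: (a -> c -> a) -> (b -> c -> b) -> a -> b -> [c] -> (a, b)
foldl2s f g y z = foldl' step (y, z)
  where
    step (!i, !j) x = (f i x, g j x)

main :: IO ()
main = print (foldl2s (+) (*) 0 1 [1 .. 10 :: Integer])
```

Note this is not denotationally identical to the lazy versions (it diverges on infinite lists and forces accumulators the lazy folds would never evaluate), which is part of why a compiler cannot apply it as a blanket optimization.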

At Wed, 17 May 2006 11:36:00 -0700, Juan Carlos Arevalo Baeza wrote:
Is there any known alternative that doesn't exhibit this behavior? It would have to somehow return errors inline or on a "side channel". I'll be toying with this sort of thing for a while.
You might try reading "Polish Parsers, Step by Step" by R. John M. Hughes and S. Doaitse Swierstra:

http://www.cs.uu.nl/docs/vakken/afp/Literature/p224-swierstra.pdf

From the abstract:

---
We present the derivation of a space efficient parser combinator library: the constructed parsers do not keep unnecessary references to the input, produce online results and efficiently handle ambiguous grammars. The underlying techniques can be applied in many contexts where traditionally backtracking is used.
---

By 'online' they mean able to start producing results before the whole thing is parsed.

j.
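A toy illustration of what "online" buys you (not from the paper; the digit example is arbitrary): a parse built as a lazy stream transformer can emit each result before examining the rest of the input, so even an undefined tail does no harm as long as you stop consuming in time.

```haskell
module Main where

-- A trivially "online" parse: each digit is emitted as soon as it is
-- seen, without waiting for the rest of the input.
parseDigits :: String -> [Int]
parseDigits = map (\c -> fromEnum c - fromEnum '0')
            . filter (`elem` ['0' .. '9'])

main :: IO ()
main = print (take 3 (parseDigits ('1' : '2' : '3' : undefined)))
```

The `take 3` stops forcing the stream after three results, so the `undefined` tail is never touched.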

Very interesting, thanx. I haven't finished grokking this yet, but basically this one is of the "somehow return errors inline" category: the errors are returned inline, together with the expression that computes the result, so that you can extract the errors and/or the result whenever convenient.

I'd be worried about the performance you can get from this breadth-first approach. I sort of like the fine-tuning control that the "try" approach gives you in Parsec. I'll finish the paper before giving this any more thought, though.

JCAB

Jeremy Shaw wrote:
At Wed, 17 May 2006 11:36:00 -0700, Juan Carlos Arevalo Baeza wrote:
Is there any known alternative that doesn't exhibit this behavior? It would have to somehow return errors inline or on a "side channel". I'll be toying with this sort of thing for a while.
You might try reading,
Polish Parsers, Step by Step by R. John M. Hughes and S. Doaitse Swierstra
http://www.cs.uu.nl/docs/vakken/afp/Literature/p224-swierstra.pdf
From the abstract: --- We present the derivation of a space efficient parser combinator library: the constructed parsers do not keep unnecessary references to the input, produce online results and efficiently handle ambiguous grammars. The underlying techniques can be applied in many contexts where traditionally backtracking is used. ---
By 'online' they mean able to start producing results before the whole thing is parsed.
j.
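The "errors inline, together with the result" shape described above can be sketched as follows (`Parsed` and `parseInts` are hypothetical names, not from the paper): the result and the error report are built together, and either component can be extracted on its own.

```haskell
module Main where

-- Result and error report travel together; neither blocks the other.
data Parsed a = Parsed { result :: a, errors :: [String] }

-- Hypothetical example: parse a list of ints, noting bad entries in
-- the inline error report while still producing the good ones.
parseInts :: [String] -> Parsed [Int]
parseInts = foldr step (Parsed [] [])
  where
    step s (Parsed vs es) = case reads s of
        [(n, "")] -> Parsed (n : vs) es
        _         -> Parsed vs (("bad int: " ++ s) : es)

main :: IO ()
main = do
    let p = parseInts ["1", "x", "2"]
    print (result p)
    print (errors p)
```

This is only the interface shape; the paper's contribution is constructing such parsers so that they are also space efficient and online.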

At Thu, 18 May 2006 02:37:56 -0700, Juan Carlos Arevalo Baeza wrote:
I'd be worried about the performance you can get from this breadth-first approach. I sort of like the fine-tuning control that the "try" approach gives in parsec. I'll finish the paper before giving this any more thought, though.
In section 7 they claim that their parser performs at "about half the speed of off-line generated parsers, while also preparing for and performing full error repair and error reporting." I am guessing that 'off-line generated parsers' means things like lex/yacc -- but I am not positive. j.
participants (4)

- Christian Maeder
- Greg Fitzgerald
- Jeremy Shaw
- Juan Carlos Arevalo Baeza