
I'm trying to parse a large file, looking for instructions on each line and for a section end marker, but Parsec's manyTill function causes a stack overflow, as you can see in the following example (I'm using ghci 6.8.3):
    parse (many anyChar) "" ['a'|x<-[1..1024*64]]
It almost immediately starts printing "aaaaaaaaaaa...." and runs to completion.
    parse (manyTill anyChar eof) "" ['a'|x<-[1..1024*1024]]
    *** Exception: stack overflow
I guess this happens because manyTill recursively accumulates output from the first parser and returns only when it hits the 'end' parser. Is it possible to write a version of 'manyTill' that works like 'many', returning output from 'anyChar' as soon as it advances through the list of tokens?
Thanks,
Daniel

On Fri, Jul 4, 2008 at 5:31 PM, Badea Daniel wrote:
    parse (manyTill anyChar eof) "" ['a'|x<-[1..1024*1024]]
    *** Exception: stack overflow
The usual solution applies: move the data from the stack to the heap. Try using
    manyTill' act end = go []
      where
        go acc = choice [ end >> return (reverse acc)
                        , act >>= \x -> go (x:acc) ]
--
Felipe.
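A self-contained sketch of the accumulator version suggested above, for readers who want to try it. The type signature, the `Text.Parsec` imports (Parsec 3 module names) and the `main` driver are additions for illustration and are not part of the original message. Like `manyTill`, it only returns once `end` succeeds; it does not stream results.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Accumulator-passing variant of manyTill: results are collected in
-- reverse in a heap-allocated list and reversed once `end` succeeds.
manyTill' :: Parser a -> Parser end -> Parser [a]
manyTill' act end = go []
  where
    go acc = choice
      [ end >> return (reverse acc)   -- stop: hand back what was collected
      , act >>= \x -> go (x : acc)    -- otherwise parse one more item
      ]

main :: IO ()
main = do
  -- Same input size as the failing example in the thread.
  let input = replicate (1024 * 1024) 'a'
  case parse (manyTill' anyChar eof) "" input of
    Left err -> print err
    Right xs -> print (length xs)     -- number of characters parsed
```

Note that `end` is tried first at every step, so it should fail without consuming input (or be wrapped in `try`) when it does not match.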

On Fri, 2008-07-04 at 13:31 -0700, Badea Daniel wrote:
I guess this happens because manyTill recursively accumulates output from the first parser and returns only when it hits the 'end' parser. Is it possible to write a version of 'manyTill' that works like 'many' returning output from 'anyChar' as soon as it advances through the list of tokens?
No, manyTill doesn't know whether it is going to return anything at all until its second argument succeeds. I can make manyTill not stack overflow, but it will never immediately start returning results. For the particular case above you can use getInput and setInput to get a result that does what you want.
    parseRest = do
      rest <- getInput
      setInput []
      return rest
That should probably update the position as well, though it's not so crucial in the likely use-cases of such a function.
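A runnable sketch of the getInput/setInput idea, assuming a `String` input stream. The `"header:"` prefix and the `main` driver are hypothetical and only there to show `parseRest` being used after some ordinary parsing. As noted above, the source position is not updated.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Return all remaining input in one step and leave the parser at
-- end of input. The source position is NOT advanced.
parseRest :: Parser String
parseRest = do
  rest <- getInput
  setInput []
  return rest

main :: IO ()
main = do
  -- "header:" is a made-up prefix for this example.
  let input = "header:" ++ replicate (1024 * 1024) 'a'
  case parse (string "header:" >> parseRest) "" input of
    Left err   -> print err
    Right rest -> print (length rest)  -- length of everything after the prefix
```

Because `rest` is simply the unconsumed tail of the input list, it is handed back lazily and no per-character parsing takes place.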

The file I'm trying to parse contains mixed sections like:
...

On Fri, 2008-07-04 at 15:15 -0700, Badea Daniel wrote:
The file I'm trying to parse contains mixed sections like:
...
... script including arithmetic expressions ...
/end_section>
...
so I defined two parsers: one for the 'outer' language and the other one for the 'inner' language. I used (manyTill inner_parser end_section_parser)
Does inner_parser (or a parser it calls) recognize `/end_section'? If not, I don't think you actually need manyTill. If so, that's more difficult. Two thoughts:
* This design looks vaguely XML-ish; is it possible to use a two-stage parser, recognizing but not parsing the arithmetic expressions and then looping back over the parse tree later?
* If the part of inner_parser that would recognize /end_section (presumably as a division operator followed by an identifier?) is well isolated, you could locally exclude it there; e.g., instead of
    division_operator = operator "/"
  say
    division_operator = try $ do
      satisfy (=='/')
      notFollowedBy (string "end_section")
      whitespace
  (Or reverse the order of notFollowedBy and whitespace.)
jcc
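The second suggestion can be sketched as a standalone parser. This is an illustration only: it uses Parsec's `spaces` in place of the `whitespace` helper mentioned above (which would normally come from the token parser), and the name `divisionOperator` and the `main` driver are assumptions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Accept '/' as a division operator unless it starts the end marker.
-- `try` makes the whole parser backtrack, so a rejected '/' stays in
-- the input for the end-of-section parser to consume.
divisionOperator :: Parser Char
divisionOperator = try $ do
  c <- char '/'
  notFollowedBy (string "end_section")
  spaces
  return c

main :: IO ()
main = do
  print (parse divisionOperator "" "/ 2")            -- accepted as division
  print (parse divisionOperator "" "/end_section>")  -- rejected
```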

I'm using makeTokenParser and buildExpressionParser for
the inner_parser. Thanks for your thoughts, I'll use a
two-stage parser that looks for /end_section> and stores
tokens in heap and then getInput/setInput to feed the
inner_parser.
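One possible shape of that two-stage plan, as a rough sketch rather than the actual code: stage one collects raw text up to the marker, stage two runs an inner parser over it by swapping the input with getInput/setInput. The names `rawSection`, `innerSum` and `section` are hypothetical, and `innerSum` (a sum of integers) merely stands in for the real expression parser built with buildExpressionParser.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stage 1: collect raw characters up to the end marker, accumulating
-- them in a heap-allocated list instead of interpreting them.
rawSection :: Parser String
rawSection = go []
  where
    go acc = (try (string "/end_section>") >> return (reverse acc))
         <|> (anyChar >>= \c -> go (c : acc))

-- Stage 2 (stand-in): integers separated by '+', summed.
innerSum :: Parser Int
innerSum = do
  spaces
  ns <- (read <$> many1 digit <* spaces) `sepBy1` (char '+' >> spaces)
  eof
  return (sum ns)

-- Run the inner parser on the collected text, then restore the outer input.
-- Source positions are not adjusted, so error locations may be misleading.
section :: Parser Int
section = do
  body <- rawSection
  rest <- getInput
  setInput body
  n <- innerSum
  setInput rest
  return n

main :: IO ()
main = print (parse section "" "1 + 2 + 3 /end_section>tail")
```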
participants (4)
- Badea Daniel
- Derek Elkins
- Felipe Lessa
- Jonathan Cast