parsec manyTill stack overflow

Badea Daniel

4 Jul 2008

8:31 p.m.

I'm trying to parse a large file, looking for instructions on each line and for a section end marker, but Parsec's manyTill function causes a stack overflow, as you can see in the following example (I'm using GHCi 6.8.3):

...

parse (many anyChar) "" ['a'|x<-[1..1024*64]]

It almost immediately starts printing "aaaaaaaaaaa...." and runs to completion.

...

parse (manyTill anyChar eof) "" ['a'|x<-[1..1024*1024]]
*** Exception: stack overflow

I guess this happens because manyTill recursively accumulates output from the first parser and returns only when it hits the 'end' parser. Is it possible to write a version of 'manyTill' that works like 'many', returning output from 'anyChar' as soon as it advances through the list of tokens?

Thanks,
Daniel
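For reference, here is a sketch of roughly how Parsec's manyTill is shaped, reconstructed from memory of the library source rather than copied from it. The name manyTillSketch and the String input type are assumptions made for this illustration. The recursive call must finish before (x : xs) can be built, so no part of the list is available until the end parser succeeds, which is consistent with the guess above.

```haskell
import Text.Parsec

-- Hypothetical name; approximates the shape of Parsec's manyTill.
manyTillSketch :: Parsec String u a -> Parsec String u end -> Parsec String u [a]
manyTillSketch p end = scan
  where
    -- Either the end parser succeeds and the list is closed,
    -- or one more item is parsed and the rest is collected recursively.
    scan = (end >> return [])
       <|> do x  <- p
              xs <- scan   -- not a tail call: (x : xs) waits on the rest
              return (x : xs)
```

On a short input such as "abc" with eof as the end parser, this returns the whole string, just like manyTill.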


Felipe Lessa

4 Jul 2008

9:11 p.m.

On Fri, Jul 4, 2008 at 5:31 PM, Badea Daniel wrote:

...

...
parse (manyTill anyChar eof) "" ['a'|x<-[1..1024*1024]]
*** Exception: stack overflow

The usual solution applies: move the data from the stack to the heap. Try using

manyTill' act end = go []
  where go acc = choice [end >> return (reverse acc)
                        ,act >>= \x -> go (x:acc)]

--
Felipe.
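A self-contained version of the accumulator approach might look like the sketch below. The type signature, the String input type, and the untilMarker helper are assumptions added for illustration; they are not part of the original suggestion. Like manyTill, it still returns nothing until the end parser succeeds; only the bookkeeping moves into an explicit list.

```haskell
import Text.Parsec

-- Accumulator-based manyTill with an assumed type signature.
manyTill' :: Parsec String u a -> Parsec String u end -> Parsec String u [a]
manyTill' act end = go []
  where
    -- acc holds the items parsed so far, most recent first.
    go acc = choice
      [ end >> return (reverse acc)
      , act >>= \x -> go (x : acc)
      ]

-- Hypothetical example: collect characters up to a closing marker.
-- 'try' lets the marker parser back out after a partial match.
untilMarker :: Parsec String u String
untilMarker = manyTill' anyChar (try (string "/end_section>"))
```

For example, running untilMarker on "1 + 2 /end_section> rest" yields "1 + 2 ".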

Derek Elkins

9:22 p.m.

On Fri, 2008-07-04 at 13:31 -0700, Badea Daniel wrote:


I guess this happens because manyTill recursively accumulates output from the first parser and returns only when it hits the 'end' parser. Is it possible to write a version of 'manyTill' that works like 'many' returning output from 'anyChar' as soon as it advances through the list of tokens?

No, manyTill doesn't know whether it is going to return anything at all until its second argument succeeds. I can make manyTill not stack overflow, but it will never immediately start returning results. For the particular case above you can use getInput and setInput to get a result that does what you want.

parseRest = do
  rest <- getInput
  setInput []
  return rest

That should probably update the position as well, though it's not so crucial in the likely use-cases of such a function.
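One possible way to add the position update mentioned above is sketched here, assuming parsec 3's Text.Parsec modules and a String input. updatePosString comes from Text.Parsec.Pos; the type signature is an assumption.

```haskell
import Text.Parsec
import Text.Parsec.Pos (updatePosString)

-- Return all remaining input at once and leave the parser at end of input.
parseRest :: Parsec String u String
parseRest = do
  rest <- getInput
  pos  <- getPosition
  setInput []
  -- Advance the source position as if every character had been consumed.
  setPosition (updatePosString pos rest)
  return rest
```

Computing the new position walks the whole remaining string, so this variant probably gives up the lazy behaviour of the three-line version; if streaming output matters more than accurate positions, leaving the position alone may be the better trade-off.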

Badea Daniel

10:15 p.m.

The file I'm trying to parse contains mixed sections like:

... ...

so I defined two parsers: one for the 'outer' language and the other one for the 'inner' language. I used (manyTill inner_parser end_section_parser) but I got a stack overflow because there's just too much text between section begin and end. With getInput I can switch from the outer parser to the inner parser, but this one tries to parse until eof, and when it hits the '/end_section>' it fails.

Jonathan Cast

10:29 p.m.

On Fri, 2008-07-04 at 15:15 -0700, Badea Daniel wrote:

...

The file I'm trying to parse contains mixed sections like:

...

... script including arithmetic expressions ...

/end_section>

...

so I defined two parsers: one for the 'outer' language and the other one for the 'inner' language. I used (manyTill inner_parser end_section_parser)

Does inner_parser (or a parser it calls) recognize `/end_section'? If not, I don't think you actually need manyTill. If so, that's more difficult. Two thoughts:

* This design looks vaguely XML-ish; is it possible to use a two-stage parser, recognizing but not parsing the arithmetic expressions and then looping back over the parse tree later?

* If the part of inner_parser that would recognize /end_section (presumably as a division operator followed by an identifier?) is well isolated, you could locally exclude it there; e.g., instead of

division_operator = operator "/"

say

division_operator = try $ do
  satisfy (=='/')
  notFollowedBy (string "end_section")
  whitespace

(Or reverse the order of notFollowedBy and whitespace.)

jcc
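The second suggestion can be tried in isolation with a sketch like the following. It assumes parsec 3's Text.Parsec, where notFollowedBy accepts a parser of any showable result; the older Text.ParserCombinators.Parsec 2.x version is, as far as I know, restricted to parsers that return a single token. The name divisionOperator is illustrative, and spaces stands in for the token parser's whitespace skipper.

```haskell
import Text.Parsec

-- Accept "/" as an operator unless it begins the end marker.
-- 'spaces' is an assumed stand-in for the lexer's whitespace parser.
divisionOperator :: Parsec String u ()
divisionOperator = try $ do
  _ <- char '/'
  notFollowedBy (string "end_section")
  spaces
```

With this definition "/ 2" is accepted as a division, while "/end_section>" is rejected without consuming input, so an enclosing manyTill or outer parser still gets to see the marker.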

Badea Daniel

10:44 p.m.

I'm using makeTokenParser and buildExpressionParser for the inner_parser. Thanks for your thoughts. I'll use a two-stage parser that looks for /end_section> and stores tokens in the heap, and then getInput/setInput to feed the inner_parser.
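The two-stage plan could be sketched roughly as follows. This is an illustration under assumptions, not the code that was eventually written: splitAtMarker and section are hypothetical names, the input is assumed to be a String, and source positions are left untouched.

```haskell
import Data.List (isPrefixOf)
import Text.Parsec

-- Split the input at the first occurrence of the marker, dropping the marker.
-- An explicit accumulator keeps the scan from nesting calls.
splitAtMarker :: String -> String -> Maybe (String, String)
splitAtMarker marker = go []
  where
    go acc s
      | marker `isPrefixOf` s = Just (reverse acc, drop (length marker) s)
    go _   []       = Nothing
    go acc (c : cs) = go (c : acc) cs

-- Run the inner parser on the section body only, then resume after the marker.
section :: Parsec String u a -> Parsec String u a
section inner = do
  input <- getInput
  case splitAtMarker "/end_section>" input of
    Nothing -> parserFail "missing /end_section>"
    Just (body, rest) -> do
      setInput body          -- the inner parser sees only the section body
      result <- inner <* eof
      setInput rest          -- the outer parser continues after the marker
      return result
```

Because the inner parser only ever sees the body, it can safely parse until eof and never meets the marker. Error positions reported by the inner parser will be offset unless setPosition is also called.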
