
I have a question about Parsec. The following program
import Control.Applicative ((*>),(<*)) import Text.Parsec import Text.Parsec.Char block p = char '{' *> p <* char '}' parser = block (many digit) main = parseTest parser "{123a}"
gives the output parse error at (line 1, column 5): unexpected "a" expecting "}" Note the last line mentions only '}'. I would rather like to see expecting "}" or digit since the parser could very well accept another digit here. (1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself? BTW, I am using parsec-3.1.0 and ghc-6.12.3. Cheers Ben

Ben Franksen wrote:
import Control.Applicative ((*>),(<*)) import Text.Parsec import Text.Parsec.Char block p = char '{' *> p <* char '}' parser = block (many digit) main = parseTest parser "{123a}"
gives the output
parse error at (line 1, column 5): unexpected "a" expecting "}"
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
(1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself?
I just saw that Christian Maeder answered a similar question recently. I tried his suggestion of using manyTill and bingo:
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((*>),(<*)) import Text.Parsec block p = char '{' *> p <* char '}' parser = block (manyTill digit (char '}')) main = parseTest parser "{123a}"
gives parse error at (line 1, column 5): unexpected "a" expecting "}" or digit So far so good. I wonder whether this parser is as efficient as the original one. Also, this style is less modular, as I have to mention the terminator in two places. Is there a non-greedy variant of 'many' so that modularity gets restored and efficiency is not lost? Cheers Ben

On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
(1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself?
I just saw that Christian Maeder answered a similar question recently. I
tried his suggestion of using manyTill and bingo:
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((*>),(<*)) import Text.Parsec block p = char '{' *> p <* char '}' parser = block (manyTill digit (char '}')) main = parseTest parser "{123a}"
gives
parse error at (line 1, column 5): unexpected "a" expecting "}" or digit
So far so good. I wonder whether this parser is as efficient as the original one.
manyTill p end = scan where scan = do{ end; return [] } <|> do{ x <- p; xs <- scan; return (x:xs) } I'm not sure, but I suspect it's less efficient. Perhaps manyTill' p end = scan [] where scan acc = do { end; return (reverse acc) } <|> do { x <- p; scan (x:acc) } is more efficient (depends on Parsec's bind which is more efficient), you could test.
Also, this style is less modular, as I have to mention the terminator in two places.
That's not the main problem. `manyTill' consumes the ending token, so block (manyTill whatever (char '}')) needs two '}' to succeed. You would need block (manyTill digit (lookAhead (char '}')) to replicate the behaviour of block (many digit).
Is there a non-greedy variant of 'many' so that modularity gets restored and efficiency is not lost?
Cheers Ben

Daniel Fischer wrote:
On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
This looks more like a bug than a feature to me. I checked parsec-3.0.1 and it behaves like parsec-2, i.e. behaves as I expected.
(1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself?
I just saw that Christian Maeder answered a similar question recently. I
tried his suggestion of using manyTill and bingo:
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((*>),(<*)) import Text.Parsec block p = char '{' *> p <* char '}' parser = block (manyTill digit (char '}')) main = parseTest parser "{123a}" You would need
block (manyTill digit (lookAhead (char '}'))
to replicate the behaviour of block (many digit).
Right, so it gets even more complicated.
Is there a non-greedy variant of 'many' so that modularity gets restored and efficiency is not lost?
So many questions... Cheers Ben

Am 29.09.2010 20:01, schrieb Daniel Fischer:
On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
Right, parsec2 or parsec-2.1.0.1 still does so. (parsec-3 behaves differently wrt error messages.) Try "ghc-pkg hide parsec" so that parsec-2.1.0.1 will be taken: import Text.ParserCombinators.Parsec import Control.Monad infixl 1 << (<<) :: Monad m => m a -> m b -> m a (<<) = liftM2 const block p = char '{' >> p << char '}' parser = block (many (digit)) main = parseTest parser "{123a}" *Main> main Loading package parsec-2.1.0.1 ... linking ... done. parse error at (line 1, column 5): unexpected "a" expecting digit or "}"
(1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself?
ask derek.a.elkins@gmail.com (CCed) Cheers Christian
I just saw that Christian Maeder answered a similar question recently. I
tried his suggestion of using manyTill and bingo:
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((*>),(<*)) import Text.Parsec block p = char '{' *> p <* char '}' parser = block (manyTill digit (char '}')) main = parseTest parser "{123a}"
gives
parse error at (line 1, column 5): unexpected "a" expecting "}" or digit
So far so good. I wonder whether this parser is as efficient as the original one.
manyTill p end = scan where scan = do{ end; return [] } <|> do{ x <- p; xs <- scan; return (x:xs) }
I'm not sure, but I suspect it's less efficient.
Perhaps
manyTill' p end = scan [] where scan acc = do { end; return (reverse acc) } <|> do { x <- p; scan (x:acc) }
is more efficient (depends on Parsec's bind which is more efficient), you could test.
Also, this style is less modular, as I have to mention the terminator in two places.
That's not the main problem. `manyTill' consumes the ending token, so
block (manyTill whatever (char '}')) needs two '}' to succeed. You would need
block (manyTill digit (lookAhead (char '}'))
to replicate the behaviour of block (many digit).
Is there a non-greedy variant of 'many' so that modularity gets restored and efficiency is not lost?
Cheers Ben

Christian Maeder wrote:
Am 29.09.2010 20:01, schrieb Daniel Fischer:
On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
Right, parsec2 or parsec-2.1.0.1 still does so. (parsec-3 behaves differently wrt error messages.)
Try "ghc-pkg hide parsec" so that parsec-2.1.0.1 will be taken:
I need parsec-3 since I use it as a monad transformer over IO so I can do IO during parsing. And I want efficiency, too, so did not consider parsec-3.0.*. Cheers Ben

On Wed, Sep 29, 2010 at 1:01 PM, Daniel Fischer
On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
I came up with a smaller example which shows a similar problem:
bracket = char '{' *> return () test = bracket *> (bracket <|> return ()) *> char 'a'
For the input "{b" the error message should mention that we can take either a '{' or an 'a', however it only mentions the 'a'. However I know how to fix this one, and it doesn't fix the bug evidenced by the above program. Antoine

On Sun, Oct 3, 2010 at 11:55 AM, Ben Franksen
Stephen Tetley wrote:
Does this one give the "expected" error message for Parsec3.1 - unfortunately I can't test as I'm still using Parsec 2.1.0.1.
parser = block (many digit > "digit")
Unfortunately, no.
Cheers Ben
Hey folks, sorry about this one - my changes to parsec in 3.1 made these error messages worse. I've sent a patch off to the maintainer which fixes the examples in this thread. Antoine

Antoine Latter wrote:
On Sun, Oct 3, 2010 at 11:55 AM, Ben Franksen
wrote: Stephen Tetley wrote:
Does this one give the "expected" error message for Parsec3.1 - unfortunately I can't test as I'm still using Parsec 2.1.0.1.
parser = block (many digit > "digit")
Unfortunately, no.
Hey folks, sorry about this one - my changes to parsec in 3.1 made these error messages worse. I've sent a patch off to the maintainer which fixes the examples in this thread.
Thanks! I hope we get a new minor release with these fixes soon. I love parsec-3 very much, especially since you fixed the speed problems. Cheers Ben
participants (5)
-
Antoine Latter
-
Ben Franksen
-
Christian Maeder
-
Daniel Fischer
-
Stephen Tetley