Parsec lookahead and <|>

Goedemiddag café, Consider the following function, using parsec-3.0.0:
la :: Parsec String () (Maybe Char) la = lookAhead (optionMaybe anyChar)
*Lookahead> parseTest (char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (char 'a' <|> char 'b') "b" 'b' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "b" parse error at (line 1, column 2): unexpected "b" expecting "a" The first three work fine and as expected, but the fourth example fails where I would expect success. I know <|> won't try the rhs if the lhs consumed input, but lookAhead's documentation promises not to consume any input. Is this a bug in Parsec or am I missing something? Thanks, Martijn.

Yeah, that's weird. I played around with la and it seems to only cause problems when the parser passed into lookAhead succeeds, which seem to go directly against it's stated purpose. lookAhead isn't consuming, (hence the unexpected "b") but still prevents <|> from doing it's thing. Seems like a bug to me... My off the hip fix is a modified form of the ugly try: lookAhead (ParsecT p) = ParsecT $ \s@(State _ pos _) -> do res <- p s case res of Consumed rep -> do r <- rep case r of Error err -> return $ Empty $ return $ Error (setErrorPos pos err) Ok a state err -> return $ Empty $ return $ Ok a s err empty -> return $ empty The only potential annoyance with this fix that I can see, is that the error messages can be confusing if you are doing dumb things with your lookAhead parsers. For example: la :: Parsec String () (Char) la = lookAhead' (char 'r') *Main> parseTest ((la >> char 'a') <|> char 'b') "a" parse error at (line 1, column 1): unexpected "a" expecting "r" or "b" *Main> parseTest ((la >> char 'a') <|> char 'b') "r" parse error at (line 1, column 2): unexpected "r" expecting "a" or "b" But for the most part it behaves as expected. - Job (sorry for the double post Martijn, forgot to reply to all) On Thu, Aug 20, 2009 at 7:44 AM, Martijn van Steenbergen < martijn@van.steenbergen.nl> wrote:
Goedemiddag café,
Consider the following function, using parsec-3.0.0:
la :: Parsec String () (Maybe Char)
la = lookAhead (optionMaybe anyChar)
*Lookahead> parseTest (char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (char 'a' <|> char 'b') "b" 'b' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "b" parse error at (line 1, column 2): unexpected "b" expecting "a"
The first three work fine and as expected, but the fourth example fails where I would expect success. I know <|> won't try the rhs if the lhs consumed input, but lookAhead's documentation promises not to consume any input. Is this a bug in Parsec or am I missing something?
Thanks,
Martijn. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am Donnerstag 20 August 2009 13:44:15 schrieb Martijn van Steenbergen:
Goedemiddag café,
Consider the following function, using parsec-3.0.0:
la :: Parsec String () (Maybe Char) la = lookAhead (optionMaybe anyChar)
*Lookahead> parseTest (char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (char 'a' <|> char 'b') "b" 'b' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "b" parse error at (line 1, column 2): unexpected "b" expecting "a"
The first three work fine and as expected, but the fourth example fails where I would expect success. I know <|> won't try the rhs if the lhs consumed input, but lookAhead's documentation promises not to consume any input. Is this a bug in Parsec or am I missing something?
Bad bug in Parsec (from the beginning, the same happens in parsec-2), I'd say. Desugared, we have lookAhead p = getParserState >>= \st -> p >>= \r -> setParserState st >>= \_ -> return r Due to the (>>=), whenever p consumes input, lookAhead will return (Consumed _) and there's no way to get rid of it, so (la *> char 'a') returns Consumed (Error something) on the input "b" and (<|>) doesn't try char 'b'. The code for lookAhead should look something like (parsec-2, to avoid 'returns' cluttering the code): lookAhead p = Parser $ \st -> case parserReply $ runP p st of Ok x s err -> Empty (Ok x st err) Error err -> Empty (Error err) Since exporting an 'unconsume' function wouldn't be desirable, lookAhead would have to move to Text.Parse(rCombinators.Parse)c.Prim. (not necessary in parsec-3 yet, since that exports all top level definitions from all modules so far).
Thanks,
Martijn.

Daniel Fischer wrote:
Am Donnerstag 20 August 2009 13:44:15 schrieb Martijn van Steenbergen:
Goedemiddag café,
la :: Parsec String () (Maybe Char) la = lookAhead (optionMaybe anyChar) *Lookahead> parseTest (char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (char 'a' <|> char 'b') "b" 'b' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "b"
Consider the following function, using parsec-3.0.0: parse error at (line 1, column 2): unexpected "b" expecting "a"
The first three work fine and as expected, but the fourth example fails where I would expect success. I know <|> won't try the rhs if the lhs consumed input, but lookAhead's documentation promises not to consume any input. Is this a bug in Parsec or am I missing something?
Bad bug in Parsec (from the beginning, the same happens in parsec-2), I'd say.
I'd say, its a feature. lookAhead returns whatever its argument returns. So in this case it returns "Consumed" without consuming. You can always wrap around a "try" to force the alternative: parseTest (try (la >> char 'a') <|> char 'b') "b" Cheers Christian Maybe it should have been: la >> (char 'a' <|> char 'b') in the first place.

try works in this case, but it won't if we are using a parser which can
consume and then fail (instead of char 'a'). In which case we may want it to
fail without exploring the second option.
Hmmm though you might be right. Having lookAhead return Consumed is only a
problem if the parser passed to lookAhead succeeds, but the parser following
lookAhead fails without consuming, which seems like a fairly rare case.
Although, it would be a problem for cases where the lookAhead is checking
for a negation. For example:
parseStuff = (lookAhead parseNotCapital >> identifier) <|> number
wouldn't work if lookAhead returned Consumed on success, and try doesn't
save us either.
Even if returning "Consumed" is the desired behavior I'd still say it at
least deserves a note in the docs.
Martijn, how did you encounter this problem?
- Job
On Thu, Aug 20, 2009 at 2:21 PM, Christian Maeder
Daniel Fischer wrote:
Am Donnerstag 20 August 2009 13:44:15 schrieb Martijn van Steenbergen:
Goedemiddag café,
la :: Parsec String () (Maybe Char) la = lookAhead (optionMaybe anyChar) *Lookahead> parseTest (char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (char 'a' <|> char 'b') "b" 'b' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "a" 'a' *Lookahead> parseTest (la *> char 'a' <|> char 'b') "b"
Consider the following function, using parsec-3.0.0: parse error at (line 1, column 2): unexpected "b" expecting "a"
The first three work fine and as expected, but the fourth example fails where I would expect success. I know <|> won't try the rhs if the lhs consumed input, but lookAhead's documentation promises not to consume any input. Is this a bug in Parsec or am I missing something?
Bad bug in Parsec (from the beginning, the same happens in parsec-2), I'd say.
I'd say, its a feature. lookAhead returns whatever its argument returns. So in this case it returns "Consumed" without consuming.
You can always wrap around a "try" to force the alternative:
parseTest (try (la >> char 'a') <|> char 'b') "b"
Cheers Christian
Maybe it should have been:
la >> (char 'a' <|> char 'b')
in the first place. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thanks for your replies. Job Vranish wrote:
Martijn, how did you encounter this problem?
My list of input symbols contains some extra information and I was using lookahead to query for that information before doing the actual parsing. I was using it in various places, including a list of choices separated by <|>. I was assuming it wouldn't affect the choice but it turned out that Parsec was always going for the first one because I used a lookahead (see la) that always succeeds. Martijn.
participants (4)
-
Christian Maeder
-
Daniel Fischer
-
Job Vranish
-
Martijn van Steenbergen