
Am 29.09.2010 20:01, schrieb Daniel Fischer:
On Wednesday 29 September 2010 19:10:22, Ben Franksen wrote:
Note the last line mentions only '}'. I would rather like to see
expecting "}" or digit
since the parser could very well accept another digit here.
parsec2 did that, I don't know whether that change is intentional or accidental.
Right, parsec2 or parsec-2.1.0.1 still does so. (parsec-3 behaves differently wrt error messages.) Try "ghc-pkg hide parsec" so that parsec-2.1.0.1 will be taken: import Text.ParserCombinators.Parsec import Control.Monad infixl 1 << (<<) :: Monad m => m a -> m b -> m a (<<) = liftM2 const block p = char '{' >> p << char '}' parser = block (many (digit)) main = parseTest parser "{123a}" *Main> main Loading package parsec-2.1.0.1 ... linking ... done. parse error at (line 1, column 5): unexpected "a" expecting digit or "}"
(1) What is the reason for this behaviour? (2) Is there another combinator that behaves as I would like? (3) Otherwise, how do I write one myself?
ask derek.a.elkins@gmail.com (CCed) Cheers Christian
I just saw that Christian Maeder answered a similar question recently. I
tried his suggestion of using manyTill and bingo:
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((*>),(<*)) import Text.Parsec block p = char '{' *> p <* char '}' parser = block (manyTill digit (char '}')) main = parseTest parser "{123a}"
gives
parse error at (line 1, column 5): unexpected "a" expecting "}" or digit
So far so good. I wonder whether this parser is as efficient as the original one.
manyTill p end = scan where scan = do{ end; return [] } <|> do{ x <- p; xs <- scan; return (x:xs) }
I'm not sure, but I suspect it's less efficient.
Perhaps
manyTill' p end = scan [] where scan acc = do { end; return (reverse acc) } <|> do { x <- p; scan (x:acc) }
is more efficient (depends on Parsec's bind which is more efficient), you could test.
Also, this style is less modular, as I have to mention the terminator in two places.
That's not the main problem. `manyTill' consumes the ending token, so
block (manyTill whatever (char '}')) needs two '}' to succeed. You would need
block (manyTill digit (lookAhead (char '}'))
to replicate the behaviour of block (many digit).
Is there a non-greedy variant of 'many' so that modularity gets restored and efficiency is not lost?
Cheers Ben