
I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end. ParsecToken has identStart to tell that the '-' is not allowed at the start but I find no equivalent identEnd? I tried also to express the same rule with ordinary combinators, without ParsecToken but this fails: identifier = do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers" because the parser created by "many" is greedy: it consumes everything, including the final letter. Any idea?

Stephane Bortzmeyer
identifier = do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
because the parser created by "many" is greedy: it consumes everything, including the final letter.
How about eating chunks of alphaNum, then chunks of '-', in alternation. You just need to flatten the returned list of words to a single word. identifier = do init <- many alphaNum rest <- many ( do dash <- many1 (char '-') alfa <- many1 alphaNum return (dash++alfa) ) return (concat (init:rest)) Regards, Malcolm

Stephane Bortzmeyer wrote:
I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end.
ParsecToken has identStart to tell that the '-' is not allowed at the start but I find no equivalent identEnd?
I have not used ParsecToken
I tried also to express the same rule with ordinary combinators, without ParsecToken but this fails:
identifier = do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
because the parser created by "many" is greedy: it consumes everything, including the final letter.
Any idea?
The hard thing about using Parsec is to know how to combine <|> with 'try'. Fixing this may be as simple as
identifier = try $ do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
Alternatively, if the first character being a letter commits you to an identifier or a syntax error, then you could move the try after the first letter has been read and committed to:
identifier = do start <- letter try $ do rest <- many (alphaNum <|> char '-') end <- letter return (start:(rest ++ [end])) > "characters authorized for identifiers"
(Both untested) And can the last letter be an alphaNum instead of only a letter? You can also make the test more explicit:
import Data.Char; import Control.Monad;
identifier = try $ do start <- letter rest <- many (satisfy (\c -> alphaNum c || (c=='-'))) when (not (null rest) && '-' == last rest) (unexpected "Identifier cannot end in -") return (start:rest) or identifier = do start <- letter <?> "Identifiers must start with a letter" try $ do rest <- many (satisfy (\c -> alphaNum c || (c=='-'))) <?> "valid identifier character" when (not (null rest) && '-' == last rest) (unexpected "identifier cannot end in -") return (start:rest)

On Tue, Sep 05, 2006 at 03:46:16PM +0100,
Chris Kuklewicz
Fixing this may be as simple as
identifier = try $ do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
It does not work for me (and neither does the second). The "try" argument always fails, probably because the term "many" on the "rest" line is greedy and swallows the ending letter.

Stephane Bortzmeyer wrote:
I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end.
identifier = do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
identifier = do start <- letter rest <- many (alphaNum <|> try inner_minus) return $ start : rest where inner_minus = do char '-' lookAhead alphaNum return '-'
because the parser created by "many" is greedy: it consumes everything, including the final letter.
Yes, it does. You could implement you own non-greedy many combinator, but you get the associated inefficiency. Or you could use ReadP, which doesn't have this problem (but replaces it with other surprises). Udo. -- Eagles may soar but weasels don't get sucked into jet engines. -- Steven Wright

On Tue, Sep 05, 2006 at 04:17:41PM +0200,
Stephane Bortzmeyer
I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end.
[My grammar was underspecified, I also want to disallow two consecutive dashes.] Many thanks to Malcolm Wallace, Chris Kuklewicz and Udo Stenzel for their help and ideas. It seems there is no solution for ParsecToken (so I have to drop it). Among the two solutions which work for me (Malcolm Wallace's and Udo Stenzel's), I choosed the one by Udo because it is the one I understand the best. Here is my final version (rewritten in my style, errors are mine and not Udo's), thanks again: import Text.ParserCombinators.Parsec hiding (spaces) spaces = many1 (char ' ') inner_minus = do char '-' lookAhead alphaNum return '-' identifier = do start <- letter rest <- many (alphaNum <|> try inner_minus) return (start:rest) > "identifier" identifiers = do result <- identifier `sepBy` spaces eof return result main = do -- Legal parseTest identifiers "foo bar" parseTest identifiers "foo-bar baz go-to" parseTest identifiers "a b3 c56 e56-y7 gag-3456" -- Illegal parseTest identifiers "1llegal" parseTest identifiers "illegal- more" parseTest identifiers "ill--egal more" parseTest identifiers "illegal -more"

Stephane Bortzmeyer
On Tue, Sep 05, 2006 at 04:17:41PM +0200, Stephane Bortzmeyer
wrote a message of 25 lines which said: I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end.
I'm not really familiar with Parsec (I wrote my own limited backtrack parser years ago, and haven't quite got round to updating my brain), and while (judging by threads like this one) it seems to be harder to use than one would hope, this particular problem doesn't look as hard to me as all that.
[My grammar was underspecified, I also want to disallow two consecutive dashes.]
[...]
Here is my final version (rewritten in my style, errors are mine and not Udo's), thanks again:
inner_minus = do char '-' lookAhead alphaNum return '-'
identifier = do start <- letter rest <- many (alphaNum <|> try inner_minus) return (start:rest) > "identifier"
I'd have thought something like the following was the 'obvious' way of doing it: chThen c r = do a <- c; as <- r; return (a:as) identifier = do start <- letter `chThen` many alphaNum; rest <- many (char '-' `chThen` many1 alphaNum) return (start++concat rest) > "identifier" ie, your identifiers are an initial sequence of non-minuses beginning with a letter, and then an optional sequence of non-minuses preceded by a minus. Or have I lost the plot somewhere? Aside: Is there already name for `chThen`? ie (liftM2 (:)); I had a feeling we were avoiding liftM & friends for some reason. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html (updated 2006-07-14)
participants (5)
-
Chris Kuklewicz
-
Jón Fairbairn
-
Malcolm Wallace
-
Stephane Bortzmeyer
-
Udo Stenzel