
Stephane Bortzmeyer wrote:
I'm trying to use Parsec for a language which have identifiers where the '-' character is allowed only inside identifiers, not at the start or the end.
ParsecToken has identStart to tell that the '-' is not allowed at the start but I find no equivalent identEnd?
I have not used ParsecToken
I tried also to express the same rule with ordinary combinators, without ParsecToken but this fails:
identifier = do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
because the parser created by "many" is greedy: it consumes everything, including the final letter.
Any idea?
The hard thing about using Parsec is to know how to combine <|> with 'try'. Fixing this may be as simple as
identifier = try $ do start <- letter rest <- many (alphaNum <|> char '-') end <- letter return ([start] ++ rest ++ [end]) > "characters authorized for identifiers"
Alternatively, if the first character being a letter commits you to an identifier or a syntax error, then you could move the try after the first letter has been read and committed to:
identifier = do start <- letter try $ do rest <- many (alphaNum <|> char '-') end <- letter return (start:(rest ++ [end])) > "characters authorized for identifiers"
(Both untested) And can the last letter be an alphaNum instead of only a letter? You can also make the test more explicit:
import Data.Char; import Control.Monad;
identifier = try $ do start <- letter rest <- many (satisfy (\c -> alphaNum c || (c=='-'))) when (not (null rest) && '-' == last rest) (unexpected "Identifier cannot end in -") return (start:rest) or identifier = do start <- letter <?> "Identifiers must start with a letter" try $ do rest <- many (satisfy (\c -> alphaNum c || (c=='-'))) <?> "valid identifier character" when (not (null rest) && '-' == last rest) (unexpected "identifier cannot end in -") return (start:rest)