
I'm writing a parser with Parsec. In the input language, elements of a sequence are separated by commas: [1, 2, 3] However, instead of a comma, you can also use an EOL: [1, 2 3] Anywhere else, EOL is considered ignorable whitespace. So it's not as simple as just making EOL a token and looking for (comma | eol). I've implemented this functionality in a hand-written parser (basically a hack that keeps track of whether the last read token was preceded by an EOL, without making EOL itself a token). Does anybody have ideas about how to do this with Parsec?

On Fri, Apr 3, 2009 at 8:17 PM, Kannan Goundan
I'm writing a parser with Parsec. In the input language, elements of a sequence are separated by commas:
[1, 2, 3]
However, instead of a comma, you can also use an EOL:
[1, 2 3]
Anywhere else, EOL is considered ignorable whitespace. So it's not as simple as just making EOL a token and looking for (comma | eol).
Untested, but hopefully enough so you get an idea of where to start:
-- End of line parser. Consumes the carriage return, if present. eol :: Parser () eol = eof <|> char '\n'
-- list-element separator. listSep :: Parser () listSep = eol <|> (char ',' >> spaces)
-- list parser. The list may be empty - denoted by "[]" myListOf :: Parser a -> Parser [a] myListOf p = char '[' >> sepBy p listSep >>= \vals -> char ']' >> return vals
This would probably be better off with a custom version of the 'spaces' parser that didn't parse newlines. Antoine

Kannan Goundan wrote:
I've implemented this functionality in a hand-written parser (basically a hack that keeps track of whether the last read token was preceded by an EOL, without making EOL itself a token). Does anybody have ideas about how to do this with Parsec?
You can do exactly the same with Parsec: * create a lexer that yields a [Token], including EOL tokens; * write a function of type [Token] -> [(Token, Bool)] that discards EOLs and tells for each token whether it was preceded by a (now discarded) EOL; * write your pToken :: Token -> Parsec Token function (I omitted some type variables there) that recognises one (Token, Bool)-tuple from the input stream. Or, perhaps easier: * create a lexer that yields a [Token], including EOL tokens; * write a function of type [Token] -> [Token] that discards only those EOL tokens that aren't needed -- for example, those EOL tokens that occur when there are no open ['s, then parse those EOL's explicitly in your parser. Hope this helps, Martijn.

Kannan Goundan wrote:
I'm writing a parser with Parsec. In the input language, elements of a sequence are separated by commas:
[1, 2, 3]
However, instead of a comma, you can also use an EOL:
[1, 2 3]
Anywhere else, EOL is considered ignorable whitespace. So it's not as simple as just making EOL a token and looking for (comma | eol).
Hi Kannan, let's construct the parser top-down. On the top level, you have opening and closing characters, '[' and ']'. Parsec has a function for that: between (char '[') (char ']) And what's in between? A list of elements separated by something. Parsec provides a sepBy function for that: element `sepBy` separator which parses a list of elements separated by separator. What's your separator? Well it's either ',' or a new line and spaces before and after that: mySpaces >> (newline <|> char ',') >> mySpaces -- [1] Let's combine what we've got: myListOf :: (Parsec String () a) -> Parsec String () [a] myListOf elem = between (char '[') (char ']') (elem `sepBy` (mySpaces >> (newline <|> char ',') >> mySpaces)) where mySpaces = many (oneOf (" \t")) And test it in ghci: *Main> parseTest (myListOf anyChar) "[a , b, d ,d\np]" "abddp" Hope this helps! Stephan PS: The important thing is that there are a lot solutions for tricky situations (like yours) in Text.Parsec.Combinator (especially the sepBy and many families). Knowing them can save a lot of work :) [1] I don't use parsec's spaces function because it also accepts newline characters.
I've implemented this functionality in a hand-written parser (basically a hack that keeps track of whether the last read token was preceded by an EOL, without making EOL itself a token). Does anybody have ideas about how to do this with Parsec?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Früher hieß es ja: Ich denke, also bin ich. Heute weiß man: Es geht auch so. - Dieter Nuhr
participants (4)
-
Antoine Latter
-
Kannan Goundan
-
Martijn van Steenbergen
-
Stephan Friedrichs