
Thank you all for the responses. Here's an example:

As I already said, I am trying to parse the MMIXAL assembly language. Each instruction has up to three operands, looking like this:

@+4 (jump four bytes forward)
"foo" (the string foo)
'0'>>(1+2)

etc. A string literal may contain anything but a newline (there are no escape codes or similar). But when I add a check for a newline, the parser just fails and the next one is tried. This is undesired, as I want to return an error like "unexpected newline" instead. How is this handled in other parsers?

Hi, Robert Clausecker wrote:
Each instruction has up to three operands, looking like this:
@+4 (jump four bytes forward)
"foo" (the string foo)
'0'>>(1+2)
etc. A string literal may contain anything but a newline, (there are no escape codes or similar). But when I add a check for a newline, the parser just fails and the next one is tried. This is undesired, as I want to return an error like "unexpected newline" instead. How is this handled in other parsers?
I would expect that the other parsers are tried, but fail, because they do not accept an initial quotation mark. You then get two error messages:

1. Unexpected newline after quotation mark
2. Unexpected quotation mark

These two error messages reflect the two ways to solve the problem: either delete the first quotation mark, or add a second one.

Tillmann

PS. Please use "Reply" to answer posts, so that they can be put into the same thread.

On Wed, 2 Mar 2011 14:14:02 +0100, you wrote:
Thank you all for the responses. Here's an example:
As I already said, I am trying to parse the MMIXAL assembly language. Each instruction has up to three operands, looking like this:
@+4 (jump four bytes forward)
"foo" (the string foo)
'0'>>(1+2)
etc. A string literal may contain anything but a newline, (there are no escape codes or similar). But when I add a check for a newline, the parser just fails and the next one is tried. This is undesired, as I want to return an error like "unexpected newline" instead. How is this handled in other parsers?
Tillmann's reply is absolutely correct. If a particular sequence of characters is invalid according to your grammar, then _all_ of the alternatives in scope at that point should fail to parse that sequence. If that's not happening, then there's something wrong with the way you've expressed your grammar.

I don't know how much experience you have with language grammars, but it might be helpful to try to write down MMIXAL's grammar using EBNF notation, as a starting point.

-Steve Schafer

Apologies if this has been answered already (I've got a bit lost with this thread), but the *try* here seems to be giving you precisely the behaviour you don't want:

    <|> try $ (char '"' *> (StringLit . B.pack <$> manyTill (notChar '\n') (char '"')))

*try* means backtrack on failure, and try the next parser. So if you want ill-formed strings to throw an error when they aren't properly enclosed in double quotes, don't use try.
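To illustrate the point about try, here is a minimal sketch using Parsec (not attoparsec, which is what the original code uses and which may behave differently). The Operand type, the Raw constructor, the rawOperand fallback and the sample input are hypothetical names invented for this example; only the string-literal rule itself comes from the thread.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical operand type, for illustration only.
data Operand
  = StringLit String  -- text between double quotes
  | Raw String        -- catch-all alternative: any run of non-newline characters
  deriving (Show, Eq)

-- A string literal: anything but a newline, enclosed in double quotes.
stringLit :: Parser Operand
stringLit = StringLit <$> (char '"' *> manyTill (noneOf "\n") (char '"'))

-- Hypothetical fallback alternative that also accepts a leading quote.
rawOperand :: Parser Operand
rawOperand = Raw <$> many1 (noneOf "\n")

-- With try: a failed string literal is rewound and the next alternative runs.
withTry :: Parser Operand
withTry = try stringLit <|> rawOperand

-- Without try: once the opening quote has been consumed, the failure is final.
withoutTry :: Parser Operand
withoutTry = stringLit <|> rawOperand

-- An unterminated literal: the newline arrives before the closing quote.
unterminated :: String
unterminated = "\"foo\nbar"
```

Under these assumptions, `parse withTry "" unterminated` should fall through to rawOperand and succeed, silently hiding the mistake, while `parse withoutTry "" unterminated` should return a Left carrying a parse error. Well-formed input such as `"\"foo\""` is accepted as a StringLit by both.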

Actually this is stranger than I thought - from testing it seems like
Attoparsec's (<|>) is different to Parsec's. From what I'm seeing
Attoparsec appears to do a full backtrack for (<|>) regardless of
whether the string lexer is wrapped in try, whereas Parsec needs try
to backtrack.
On 2 March 2011 16:24, Stephen Tetley wrote:
*try* means backtrack on failure, and try the next parser. So if you want ill-formed strings to throw an error if they aren't properly enclosed in double quotes, don't use try.

Actually, it's not <|> that's different, it's the string combinator.
In Parsec, string matches each character one at a time. If the match
fails, any partial input it matched is consumed. In attoparsec,
string matches either the entire thing or not, as a single step. If
it fails to match, no input is consumed.
Carl
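The Parsec half of this explanation can be sketched as follows. This is only an illustration under the assumption that Parsec's `string` consumes the matched prefix before failing, as described above; the names noTry and wrapped are made up for the example, and the attoparsec side is not shown.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- On the input "for", string "foo" matches 'f' and 'o' and then fails at 'r'.
-- Because input was consumed, (<|>) does not attempt the second alternative.
noTry :: Parser String
noTry = string "foo" <|> string "for"

-- Wrapping the first alternative in try rewinds the input on failure,
-- so string "for" gets a chance to run from the start.
wrapped :: Parser String
wrapped = try (string "foo") <|> string "for"
```

With these definitions, `parse noTry "" "for"` should yield a Left, whereas `parse wrapped "" "for"` should yield `Right "for"`. According to the description above, attoparsec's string would not need the try, because it consumes nothing when it fails.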
On Wed, Mar 2, 2011 at 9:51 AM, Stephen Tetley wrote:
Actually this is stranger than I thought - from testing it seems like Attoparsec's (<|>) is different to Parsec's. From what I'm seeing Attoparsec appears to do a full backtrack for (<|>) regardless of whether the string lexer is wrapped in try, whereas Parsec needs try to backtrack.
On 2 March 2011 16:24, Stephen Tetley wrote:
*try* means backtrack on failure, and try the next parser. So if you want ill-formed strings to throw an error if they aren't properly enclosed in double quotes, don't use try.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (5)
- Carl Howells
- Robert Clausecker
- Stephen Tetley
- Steve Schafer
- Tillmann Rendel