On Sat, Jun 6, 2009 at 8:36 AM, Daniel Fischer <daniel.is.fischer@web.de> wrote:

On Saturday, 6 June 2009 02:05:13, Giuliano Vilela wrote:

> Hi all,
>
> In a Parsec project I used the *fail* parser, wanting to show a message to
> the user and halt the parsing process. That's okay, but the error message
> shown included some other "unexpected" and "expecting" messages that did
> not seem related to the fail.

I suppose your parser is not

do return 'a'
   fail "No dice"

but rather something like

do commonPreamble
   foo <|> bar <|> baz <|> fail message

and fail is only called if none of the possibilities succeed?

Close, but not quite. I'm actually using the Parsec monad with my own state to build a symbol table during parsing (for the Pascal sub-language I mentioned earlier on this list). So the fail is deep in the "recursion chain", but it's something like what you mentioned above. It seemed to me that fail was the best way to report an error such as "undefined type identifier used".

Then each of the failing parsers foo, bar and baz may add messages saying what input
would have allowed them to proceed:
---------------------------------
module FailTest where

import Text.ParserCombinators.Parsec

pa = char 'a'

pb = char 'b'

pc = char 'c'

parser1 = pa <|> pb <|> pc <|> fail "Sorry, no parse"

test1 = parse parser1 "test1" "d'oh"

------------------------------------------------------------

*FailTest> test1
Left "test1" (line 1, column 1):
unexpected "d"
expecting "a", "b" or "c"
Sorry, no parse

That's probably the kind of output you get. But I'd say the messages are very much related
to the fail, so most likely it's better to keep them.
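To see how these messages accumulate, the error value can be inspected directly instead of being shown. The following is a minimal sketch, assuming the Parsec 3 compatibility modules; `describe` and `collected` are hypothetical helper names introduced only for this illustration. It lists every message attached to the error, so the "expecting" entries contributed by the failed alternatives appear next to the one contributed by fail.

```haskell
module Main where

import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Error (Message (..), errorMessages)

-- Same shape as parser1 above: three alternatives, then a plain fail.
parser1 :: Parser Char
parser1 = char 'a' <|> char 'b' <|> char 'c' <|> fail "Sorry, no parse"

-- Hypothetical helper: label a message with the constructor it is stored under.
describe :: Message -> String
describe (SysUnExpect s) = "SysUnExpect " ++ show s
describe (UnExpect s)    = "UnExpect " ++ show s
describe (Expect s)      = "Expect " ++ show s
describe (Message s)     = "Message " ++ show s

-- Hypothetical helper: all messages attached to the error, or [] on success.
collected :: String -> [String]
collected input =
  either (map describe . errorMessages) (const []) (parse parser1 "demo" input)

main :: IO ()
main = mapM_ putStrLn (collected "d'oh")
```

The raw list may contain duplicates; the default rendering of a ParseError removes them before printing.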

Nice, I understand now how those messages are built. But, as you can see, the errors I mentioned won't be related to the parsing itself. That's why those "expected" and "unexpected" messages are undesirable.

But if you absolutely want to get rid of them, you need a custom fail that consumes some
input to remove the earlier expect messages. To avoid breaking the actual input or falling
afoul of end of input, first inject a dummy token into the input, then consume that, and
only thereafter fail:
-----------------------------------------------------------------

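myfail msg = do
    inp <- getInput
    -- inject a dummy token so there is always something to consume
    setInput ('x':inp)
    -- consuming it discards the messages of the earlier alternatives
    anyToken
    fail msg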

parser2 = pa <|> pb <|> pc <|> myfail "sorry, doesn't parse"

test2 = parse parser2 "test2" "d'oh"
--------------------------------------------------------------------------

*FailTest> test2
Left "test2" (line 1, column 1):
sorry, doesn't parse

That worked :)

But thinking about it now, I see my solution probably isn't optimal. In some cases, I will report type errors even when there is bad syntax further on in the source, which is not the usual behavior. Do you have any suggestions for my use case?
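One possible direction, offered only as a sketch and not as what hpascal does: record semantic errors in the user state instead of calling fail, and look at them only after the whole input has parsed. A syntax error later in the source then still takes precedence. Everything below is an assumption made for illustration: the toy grammar (space-separated type names), the `knownTypes` list, and the helper names `semanticError`, `typeName` and `check`.

```haskell
module Main where

import Text.Parsec

-- Hypothetical user state: the semantic errors recorded so far.
type SemErrors = [String]
type P = Parsec String SemErrors

-- Record a semantic problem without aborting the parse.
semanticError :: String -> P ()
semanticError msg = modifyState (++ [msg])

-- Assumed set of declared types for this toy example.
knownTypes :: [String]
knownTypes = ["integer", "boolean"]

-- Parse one type name; an unknown name is recorded, not treated as a parse failure.
typeName :: P String
typeName = do
  name <- many1 letter
  spaces
  if name `elem` knownTypes
    then return ()
    else semanticError ("undefined type identifier: " ++ name)
  return name

-- Left: a syntax error. Right: the semantic errors (empty when there are none).
check :: String -> Either ParseError SemErrors
check = runParser (spaces *> many typeName *> eof *> getState) [] "demo"

main :: IO ()
main = do
  print (check "integer foo")    -- parses; one semantic error is recorded
  print (check "integer foo 1")  -- the syntax error at '1' is reported instead
```

One caveat with this approach: Parsec's user state is rolled back along with the input when an alternative fails and another is tried, so an error recorded inside a branch that is later abandoned is dropped.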

The source code for the interpreter is here: http://code.google.com/p/hpascal/ (pretty immature, my group just started writing it) if you want to take a look. Important files are: Parsing.hs (the parser itself) and TypeChecker.hs (parsers that access the internal monad state and build the table).