
Hi all, In a Parsec project I used the *fail* parser, wanting to show a message to the user and halt the parsing process. That's okay, but the error message showed included some other "unexpected" and "expecting" messages that did not seem related to the fail. My guess is that Parsec keeps these messages in an internal state, to use them whenever needed. My question is, how can I clear those error messages and only show the string I pass to fail? -- []'s Giuliano Vilela.

Am Samstag 06 Juni 2009 02:05:13 schrieb Giuliano Vilela:
Hi all,
In a Parsec project I used the *fail* parser, wanting to show a message to the user and halt the parsing process. That's okay, but the error message showed included some other "unexpected" and "expecting" messages that did not seem related to the fail.
I suppose your parser is not do return 'a' fail "No dice" but rather something like do commonPreamble foo <|> bar <|> baz <|> fail message and fail is only called if none of the possibilities succeed? Then each of the failing parsers foo, bar and baz may add messages what input would have allowed them to proceed: --------------------------------- module FailTest where import Text.ParserCombinators.Parsec pa = char 'a' pb = char 'b' pc = char 'c' parser1 = pa <|> pb <|> pc <|> fail "Sorry, no parse" test1 = parse parser1 "test1" "d'oh" ------------------------------------------------------------ *FailTest> test1 Left "test1" (line 1, column 1): unexpected "d" expecting "a", "b" or "c" Sorry, no parse That's probably the kind of output you get. But I'd say the messages are very much related to the fail, most likely it's better to keep them. But if you absolutely want to get rid of them, you need a custom fail that consumes some input to remove the earlier expect messages. To avoid breaking the actual input or falling afoul of end of input, first inject a dummy token into the input, then consume that, and only thereafter fail: ----------------------------------------------------------------- myfail msg = do inp <- getInput setInput ('x':inp) anyToken fail msg parser2 = pa <|> pb <|> pc <|> myfail "sorry, doesn't parse" test2 = parse parser2 "test2" "d'oh" -------------------------------------------------------------------------- *FailTest> test2 Left "test2" (line 1, column 1): sorry, doesn't parse But that probably does more harm than good.
My guess is that Parsec keeps these messages in an internal state, to use them whenever needed. My question is, how can I clear those error messages and only show the string I pass to fail?

On Sat, Jun 6, 2009 at 8:36 AM, Daniel Fischer
Am Samstag 06 Juni 2009 02:05:13 schrieb Giuliano Vilela:
Hi all,
In a Parsec project I used the *fail* parser, wanting to show a message to the user and halt the parsing process. That's okay, but the error message showed included some other "unexpected" and "expecting" messages that did not seem related to the fail.
I suppose your parser is not
do return 'a' fail "No dice"
but rather something like
do commonPreamble foo <|> bar <|> baz <|> fail message
and fail is only called if none of the possibilities succeed?
Close, but not quite. I'm actually using the Parsec monad with my own state to build a symbol table during parsing (for a Pascal sub-language I mentioned earlier in this list). So the fail is deep in the "recursion chain", but it's something like what you mentioned above. It seemed to me that *fail* was the best way to report a error, like "undefined type identifier used".
Then each of the failing parsers foo, bar and baz may add messages what input would have allowed them to proceed: --------------------------------- module FailTest where
import Text.ParserCombinators.Parsec
pa = char 'a'
pb = char 'b'
pc = char 'c'
parser1 = pa <|> pb <|> pc <|> fail "Sorry, no parse"
test1 = parse parser1 "test1" "d'oh"
------------------------------------------------------------
*FailTest> test1 Left "test1" (line 1, column 1): unexpected "d" expecting "a", "b" or "c" Sorry, no parse
That's probably the kind of output you get. But I'd say the messages are very much related to the fail, most likely it's better to keep them.
Nice, I understand now how those messages are built. But, as you can see, the errors I mentioned won't be related to the parsing itself. That's why those "expected" and "unexpected" messages are undesirable.
But if you absolutely want to get rid of them, you need a custom fail that consumes some input to remove the earlier expect messages. To avoid breaking the actual input or falling afoul of end of input, first inject a dummy token into the input, then consume that, and only thereafter fail: -----------------------------------------------------------------
myfail msg = do inp <- getInput setInput ('x':inp) anyToken fail msg
parser2 = pa <|> pb <|> pc <|> myfail "sorry, doesn't parse"
test2 = parse parser2 "test2" "d'oh" --------------------------------------------------------------------------
*FailTest> test2 Left "test2" (line 1, column 1): sorry, doesn't parse
That worked :) But thinking about it now, I see my solution probably isn't optimal. In some cases, I will report type errors even when there is bad syntax further in the source, which is not common behavior. You got any suggestions for my use case? The source code for the interpreter is here: http://code.google.com/p/hpascal/ (pretty immature, my group just started writing it) if you want to take a look. Important files are: Parsing.hs (the parser itself) and TypeChecker.hs (parsers that access the internal monad state and build the table). -- []'s Giuliano Vilela.

Am Samstag 06 Juni 2009 14:26:14 schrieb Giuliano Vilela:
On Sat, Jun 6, 2009 at 8:36 AM, Daniel Fischer
wrote: Am Samstag 06 Juni 2009 02:05:13 schrieb Giuliano Vilela:
Close, but not quite. I'm actually using the Parsec monad with my own state to build a symbol table during parsing (for a Pascal sub-language I mentioned earlier in this list). So the fail is deep in the "recursion chain", but it's something like what you mentioned above. It seemed to me that *fail* was the best way to report a error, like "undefined type identifier used".
Ah, I see. You successfully parsed a list of syntactically correct declarations, then check if they're semantically correct (using only known types, in this case) and if that fails, you really don't want to know under which circumstances you could have parsed more declarations :)
But thinking about it now, I see my solution probably isn't optimal. In some cases, I will report type errors even when there is bad syntax further in the source, which is not common behavior. You got any suggestions for my use case?
If you don't want a "die on first error" strategy, add a "list of encountered errors" component to your user state and log all encountered errors. data ParsingState = PS { symTbl :: SymbolTable , typeTbl :: TypeTable , errs :: [SyntaxError] } data SyntaxError = UnknownType SourcePos String | UnknownOperator SourcePos String | MissingSemicolon SourcePos | ... parseType = do typeId <- Tk.identifier tTable <- typeT case mlookup typeId tTable of Nothing -> return (BadType typeId) Just typeV -> return (GoodType typeId typeV) varDeclaration = do varIdL <- Tk.commaSep1 Tk.identifier Tk.symbol ":" pos <- getPosition tp <- parseType case tp of BadType typeName -> do logError (UnknownType pos typeName) return (VarDecl varIdL typeName) GoodType typeName typeV -> do forM_ varIdL (\vi -> insert vi into symbol table) return (VarDecl varIdL typeName) When inserting variables into the symbol table, you should check whether it's already there, you don't want to parse var x, y : integer; z, x : boolean; without error. I don't know what you will finally want to parse, but you should think about local variables with the same name as one in an enclosing scope early. Finally, realProgram = do prog <- program errs <- errorList if null errs then return prog else failWithErrors errs
The source code for the interpreter is here: http://code.google.com/p/hpascal/ (pretty immature, my group just started writing it) if you want to take a look. Important files are: Parsing.hs (the parser itself) and TypeChecker.hs (parsers that access the internal monad state and build the table).
Hope that helps, Daniel
participants (2)
-
Daniel Fischer
-
Giuliano Vilela