
Hey guys,

I was wondering whether it is possible to ignore certain errors during parsing. I tried using the error token, but that didn't seem to work. I looked at the following topic: http://haskell.1045720.n5.nabble.com/Error-detection-in-GLR-Happy-grammar-td...

For my parser I need to be able to accept a certain token at any place, but it is part of a rule only in some places. In all other cases the token should be ignored, after which the parser just proceeds to the next token. Is this possible with Happy? Should I use the monadic parser to achieve this?

More concretely, I have tokenized some source code which contains comments. These should be ignored except when they are located directly above a function, in which case I want them to be parsed.

Any ideas about this issue are greatly appreciated!

Thanks,
Tom

I'd attach comments to tokens so that each token carries a comment (possibly the empty string). The parser can then decide what to do with the comment part of a token, e.g. retaining it for functions and ignoring it for everything else. You may have to write a two-pass lexer to do this.
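The second pass suggested above could look roughly like the following. This is only a sketch under assumptions: the `RawToken` and `Token` types and the `attachComments` function are hypothetical names made up for illustration, and the first pass (producing `RawToken`s, comments included) is assumed to exist already.

```haskell
import Data.List (intercalate)

type Comment = String

-- Hypothetical output of the first lexing pass: comments are ordinary tokens.
data RawToken
  = RawComment String
  | RawIdent String
  | RawKeyword String
  deriving (Eq, Show)

-- Hypothetical output of the second pass: every token carries the comment
-- text that immediately preceded it ("" when there was none).
data Token
  = TkIdent String Comment
  | TkKeyword String Comment
  deriving (Eq, Show)

-- Second pass: fold each run of comments into the next non-comment token.
attachComments :: [RawToken] -> [Token]
attachComments = go []
  where
    go :: [Comment] -> [RawToken] -> [Token]
    go _ [] = []  -- comments with no following token are dropped
    go pending (RawComment c : rest) = go (c : pending) rest
    go pending (RawIdent s : rest) = TkIdent s (flush pending) : go [] rest
    go pending (RawKeyword k : rest) = TkKeyword k (flush pending) : go [] rest

    -- Pending comments were collected in reverse order.
    flush :: [Comment] -> Comment
    flush = intercalate "\n" . reverse
```

With this shape the grammar never sees a standalone comment token, so no production has to mention comments explicitly; only the semantic actions that care (such as the one for functions) look at the attached comment.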

By the way, my last answer was for the second part of the question.
For error handling, Doaitse Swierstra's UU-Parsing is a combinator
parsing library with some flexibility for error recovery. For an LR
parser like Happy, adding extra error-handling cases to the productions
in your grammar is the main tool you have.
On 8 March 2011 15:54, Stephen Tetley

That two-pass lexer sounds like a good idea. I actually want to keep the Happy parser if possible. Can you elaborate on adding extra error-handling cases to production rules? Do you mean I have to add a line for comments at every place where they can occur? Thanks

Hi Tom

Here's how I'd do comment annotation in the parser:

    type Comment = String
    type Identifier = String

I suspect data-carrying tokens need to pair the data and the comment so Happy can treat them as a single positional reference, e.g. $1:

    data Token
      = TK_identifier (Identifier, Comment)
      | TK_kywd_module Comment
      | ...

Productions now have to use smart constructors:

    module :: { Module }
      : '%module' mname defs    { mkModule $1 $2 $3 }

    data Module = Module
      { mod_comment :: Comment
      , mod_name    :: String
      , mod_body    :: [Def]
      }

The 'smart' constructor takes the comment before the module declaration; any comment between the module start token and the module name is ignored:

    mkModule :: Comment -> (String, Comment) -> [Def] -> Module
    mkModule outer_comment (mname, _) defs = Module outer_comment mname defs

As for error handling, the strategy is to add error-handling productions after the "good" productions. Now module can "handle" a missing module name:

    module :: { Module }
      : '%module' mname defs    { mkModule $1 $2 $3 }
      | '%module' defs          { badModule $1 $2 }

    badModule :: Comment -> [Def] -> Module
    badModule outer_comment defs = Module outer_comment fake_name defs
      where
        fake_name = "ERR - parser error reading module name"

Ideally the smart constructors should be in a monad that supports error logging, like Writer. As you can see, this isn't a great way of doing things, but I'm not sure you have any other options. Personally I'd see if I could live with "first fail" instead.

Best wishes
Stephen
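The remark above about running the smart constructors in a Writer-like monad could be sketched as follows. Several things here are assumptions rather than part of the original message: the `Logged` alias, the `logError` helper, the placeholder `Def` type and the `"<missing>"` name are all hypothetical. To stay dependency-free the sketch uses the pair type `([String], a)`, which base already makes a monad that accumulates its first component; `Writer` from `Control.Monad.Writer` (mtl) would be the more conventional choice.

```haskell
type Comment = String

-- Hypothetical stand-in for the grammar's definition type.
data Def = Def String deriving (Eq, Show)

data Module = Module
  { mod_comment :: Comment
  , mod_name    :: String
  , mod_body    :: [Def]
  } deriving (Eq, Show)

-- A pair with a Monoid in the first slot is a Monad in base; binding two
-- such values appends their logs, which is the behaviour Writer provides.
type Logged a = ([String], a)

-- Record one error message without producing a result.
logError :: String -> Logged ()
logError msg = ([msg], ())

-- The "good" constructor logs nothing.
mkModule :: Comment -> (String, Comment) -> [Def] -> Logged Module
mkModule outer_comment (mname, _) defs = pure (Module outer_comment mname defs)

-- The error constructor records the problem and substitutes a placeholder name.
badModule :: Comment -> [Def] -> Logged Module
badModule outer_comment defs = do
  logError "parser error reading module name"
  pure (Module outer_comment "<missing>" defs)

-- Example: build one good and one bad module, collecting the log.
example :: Logged [Module]
example = do
  a <- mkModule "docs for Main" ("Main", "") []
  b <- badModule "" [Def "f"]
  pure [a, b]
```

Here `fst example` holds the accumulated error messages and `snd example` the modules. Wiring this into an actual grammar would presumably go through Happy's monadic parser support, which is not shown.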

Alright thanks for your comprehensive answer! I think I got something to
work with :)
Cheers,
Tom
On Tue, Mar 8, 2011 at 8:09 PM, Stephen Tetley