Simple lookahead with Parsec

Hello. I'm just starting out with Parsec and Haskell, and I want to parse a very simple grammar with it. I think I may not understand how to use try correctly. Here's a simplified version of my grammar:
    myType = try (primitiveType) <|> arrayType
    primitiveType = do {reserved "int"; return "primitive"}
    arrayType = do { primitiveType; symbol "["; symbol "]"; return "array"}
Basically, I want to use lookahead to produce the result "array" on inputs like "int[]" and the result "primitive" on inputs like "int". However, no matter what I do with the try function, I am not able to get what I expect. When I run this code using the runLex function described in the Parsec documentation, it chokes on "int[]", saying that the '[' character was unexpected. I've discovered that switching the order of primitiveType and arrayType solves this, which makes sense, but I still expected it to be possible to use try in this manner to resolve the error with lookahead. Am I doing something wrong?
(I've got some boilerplate code from the Parsec documentation defined as well, but I didn't include it for brevity's sake. reserved and symbol are defined how you'd expect.)

On Mar 19, 2010, at 11:14 , Derek Thurn wrote:
Here's a simplified version of my grammar:
    myType = try (primitiveType) <|> arrayType
    primitiveType = do {reserved "int"; return "primitive"}
    arrayType = do { primitiveType; symbol "["; symbol "]"; return "array"}
What you're doing here is matching and succeeding on primitiveType and never even looking for the bracket; your parser then exits and the framework fails expecting the end of the token stream but finding the unmatched left bracket. You need to rearrange your cases:
myType = try arrayType <|> primitiveType
so that your parser actually checks for the brackets before concluding that it has a primitiveType already. Even better would be to refactor the grammar since both types start with a primitiveType, but I'll leave that up to you since it does complicate returning the shape of the resulting type.
-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
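For reference, the reordering suggested above could be sketched as a complete program along these lines. The original boilerplate was not posted, so the symbol and reserved definitions here are hypothetical stand-ins for the token parsers from the Parsec documentation, and parseType is an illustrative helper rather than the runLex function mentioned in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the token parsers from the Parsec docs:
-- each consumes its text plus any trailing whitespace.
symbol :: String -> Parser String
symbol s = try (string s) <* spaces

reserved :: String -> Parser ()
reserved s = try (string s *> notFollowedBy alphaNum) <* spaces

primitiveType :: Parser String
primitiveType = reserved "int" >> return "primitive"

arrayType :: Parser String
arrayType = primitiveType >> symbol "[" >> symbol "]" >> return "array"

-- Longer alternative first. try rewinds the input if arrayType fails
-- after it has already consumed "int".
myType :: Parser String
myType = try arrayType <|> primitiveType

-- Illustrative helper (an assumption, not runLex): parse the whole input
-- and require end of input afterwards.
parseType :: String -> Either ParseError String
parseType = parse (spaces *> myType <* eof) "<input>"
```

With these definitions, parseType "int[]" should yield "array" and parseType "int" should yield "primitive". The try is still needed on arrayType, because that parser consumes "int" before it discovers that no bracket follows.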

On Fri, Mar 19, 2010 at 11:21 AM, Brandon S. Allbery KF8NH wrote:
What you're doing here is matching and succeeding on primitiveType and never even looking for the bracket; your parser then exits and the framework fails expecting the end of the token stream but finding the unmatched left bracket.
You need to rearrange your cases:
myType = try arrayType <|> primitiveType
so that your parser actually checks for the brackets before concluding that it has a primitiveType already.
Even better would be to refactor the grammar since both types start with a primitiveType, but I'll leave that up to you since it does complicate returning the shape of the resulting type.
Ah, I see. The problem is that the primitiveType parser succeeds. Is there then no generic way to use lookahead in Parsec to make choices of this nature? I appreciate that it might be possible to refactor my grammar to never require examining the next token to make a parsing decision about the current token, but I've already got an LALR(1) grammar that I'm pretty happy with... What I'd ideally like is some sort of general solution wherein Parsec first tries primitiveType, moves to the next token, realizes that it will be a parse error, and backtracks to try the other alternative.
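As background to the question above: Parsec's <|> commits to the first alternative that succeeds, so it will not go back and retry an earlier choice because a later token fails. One way to get the intended result without relying on the order of alternatives is the grammar refactoring mentioned earlier in the thread: parse the shared "int" prefix once, then decide based on whether brackets follow. The sketch below is only an illustration of that idea. symbol and reserved are again hypothetical stand-ins for the documentation boilerplate, and factoredType and parseFactored are names invented for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the token parsers from the Parsec docs.
symbol :: String -> Parser String
symbol s = try (string s) <* spaces

reserved :: String -> Parser ()
reserved s = try (string s *> notFollowedBy alphaNum) <* spaces

-- Left-factored version: consume the common prefix once, then use
-- option to fall back to "primitive" when no "[" follows.
factoredType :: Parser String
factoredType = do
  reserved "int"
  option "primitive" (symbol "[" >> symbol "]" >> return "array")

-- Illustrative helper: parse the whole input and require end of input.
parseFactored :: String -> Either ParseError String
parseFactored = parse (spaces *> factoredType <* eof) "<input>"
```

Because symbol "[" fails without consuming input when the next character is not a bracket, option can pick the default with one token of lookahead and no try around the alternatives. An input such as "int[" is still rejected, since the bracket branch has consumed "[" by the time "]" is found missing.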

On 03/19/10 14:24, Derek Thurn wrote:
Ah, I see. The problem is that the primitiveType parser succeeds. Is there then no generic way to use lookahead in Parsec to make choices of this nature? I appreciate that it might be possible to refactor my grammar to never require examining the next token to make a parsing decision about the current token, but I've already got an LALR(1) grammar that I'm pretty happy with...
Well, you could learn to make something in Parsec that matches your grammar, or you could use a LALR(1) parsing library. Like "Happy". (Well, Happy is actually a sort of preprocessor for Haskell. But still, doesn't using the most-fitting tool for the job seem to make sense?)
-Isaac
participants (3): Brandon S. Allbery KF8NH, Derek Thurn, Isaac Dupree