RE: Question about Haskell lexer and parser.

Question is really about layout rules. If the first lexeme of a module is not a "module" keyword, we insert {n}, where n is the indentation of first lexeme. Then we apply function L to the list of lexemes and stack of layouts:
L ({n}:lexemes) []
One of first case definitions of L covers this situation: L ({n}:ts) [] | n>0 = {:L ts [n] -- n>0 means that layout isn't explicit. It's obvious that after our first {n} comes usual lexeme, like Var. This situaiton is covered in the following L rule:
L (t:ts) (m:ms) | m /= 0 = }:L (t:ts) ms -- Haskell Report says that we here -- should recognize parsing errors -- but how we can do that in lexer?
This rule only applies if there is a parse error on the next token (t). The report explains what this means in "Note 5". It does mean that layout processing can't be done entirely in the lexer, the parser has to be involved too. This is what makes parsing layout in Haskell hard. Take a look at some of the existing Haskell parsers to see how it can be done: there are two in GHC's source tree (one in GHC itself, another in libraries/haskell-src). Hugs's parser is written in C but uses similar techniques. nhc98's parser uses parsing combinators IIRC. Cheers, Simon

Hi,
This rule only applies if there is a parse error on the next token (t). [...] It does mean that layout processing can't be done entirely in the lexer,
Take a look at some of the existing Haskell parsers to see how it can be done: [...]
Don't look at Helium. We have a simplified layout rule which can be resolved in the lexer entirely. You *have to* indent less to close a layout context. Of course, it doesn't accept complicated usage of the layout rule (e.g. [ case 3 of 3 -> 4, 5]) but the parse errors can be clearer and this kind of usage is very rare, especially if you don't tell students about the funny rule above. Greetings, Arjan
participants (2)
-
Arjan van IJzendoorn
-
Simon Marlow