
Motivated by some recent discussion, I thought I would explore the possibilty of formalizing the haskell layout rule without the dreaded parse-error clause, as in, one that can be completly handled by the lexer. motivated by that I have written a little program that takes a haskell file with layout on stdin and spits out one without layout on stdout. it can be gotten here: darcs get http://repetae.net/repos/getlaid/ the code is designed to make the layout algorithm completly transparent, so that we might experiment with it. The function layout in 'Layout.hs' is the single and complete layout algorithm and the only thing that need be modified by experimentors. I have come up with a simple improvement to the algorithm given in the paper that seems to catch a very large number of layouts. basically, whenever it comes across something that must come in matched pairs (, ), case of, if then. it pushes a special context onto the stack, when it comes across the closing token, it pops every layout context down to the special context. there is a special case for "in" that causes it to pop only up to the last context created with a "let", but not further. here is the complete algorithm (with my modification, sans the parse-error rule):
data Token = Token String | TokenVLCurly String !Int | TokenNL !Int deriving(Show)
data Context = NoLayout | Layout String !Int
-- the string on 'Layout' and 'TokenVLCurly' is the token that -- created the layout, always one of "where", "let", "do", or "of"
layout :: [Token] -> [Context] -> [Token] layout (TokenNL n:rs) (Layout h n':ls) | n == n' = semi:layout rs (Layout h n':ls) | n > n' = layout rs (Layout h n':ls) | n < n' = rbrace:layout (TokenNL n:rs) ls layout (TokenNL _:rs) ls = layout rs ls layout (TokenVLCurly h n:rs) (Layout h' n':ls) | n >= n' = lbrace:layout rs (Layout h n:Layout h' n':ls) | otherwise = error "inner layout can't be shorter than outer one" layout (TokenVLCurly h n:rs) ls = lbrace:layout rs (Layout h n:ls) layout (t@(Token s):rs) ls | s `elem` fsts layoutBrackets = t:layout rs (NoLayout:ls) layout (t@(Token s):rs) ls | s `elem` snds layoutBrackets = case ls of Layout _ _:ls -> rbrace:layout (t:rs) ls NoLayout:ls -> t:layout rs ls [] -> error $ "unexpected " ++ show s layout (t@(Token "in"):rs) ls = case ls of Layout "let" n:ls -> rbrace:t:layout rs ls Layout _ _:ls -> rbrace:layout (t:rs) ls ls -> t:layout rs ls layout (t:rs) ls = t:layout rs ls layout [] (Layout _ n:ls) = rbrace:layout [] ls layout [] [] = []
layoutBrackets = [ ("case","of"), ("if","then"), ("(",")"), ("[","]"), ("{","}") ]
now. there are a few cases it doesn't catch. the hanging case at the end of a guard for instance, I believe this can be solved easily by treating '|' and '=' as opening and closing pairs in lets and wheres '|' and '->' as opening and closing pairs in case bodies. it is easy to see which one you are in by looking at the context stack. commas are trickier and are the only other case I think we need to consider. I welcome people to experiment and send patches or brainstorm ideas, I have what I believe is a full solution percolating in my head, but am unhappy with it, I am going to sleep on it and see if it crystalizes by morning. In the meantime, perhaps someone can come up with something more elegant for dealing with the remaining cases. or at least find some real programs that this code breaks down on! (bug fixes for the lexer and everything are very much welcome. it will probably choke on some ghc extensions that would be trivial to add to the alex grammar) John -- John Meacham - ⑆repetae.net⑆john⑈

On Fri, Dec 08, 2006 at 02:33:47AM -0800, John Meacham wrote:
Motivated by some recent discussion, I thought I would explore the possibilty of formalizing the haskell layout rule without the dreaded parse-error clause, as in, one that can be completly handled by the lexer.
There was some discussion about that a while ago on this list, e.g. http://www.haskell.org/pipermail/haskell-prime/2006-March/000915.html and other subthreads in that thread. I'd still love to see a replacement which can be a separate phase between lexing and parsing, even if it means we need to lay some things out differently or tweak other bits of the syntax. Thanks Ian

On Fri, Dec 08, 2006 at 03:26:30PM +0000, Ian Lynagh wrote:
On Fri, Dec 08, 2006 at 02:33:47AM -0800, John Meacham wrote:
Motivated by some recent discussion, I thought I would explore the possibilty of formalizing the haskell layout rule without the dreaded parse-error clause, as in, one that can be completly handled by the lexer.
There was some discussion about that a while ago on this list, e.g. http://www.haskell.org/pipermail/haskell-prime/2006-March/000915.html and other subthreads in that thread.
I'd still love to see a replacement which can be a separate phase between lexing and parsing, even if it means we need to lay some things out differently or tweak other bits of the syntax.
"let" isn't an issue (at least not for the reason specified in that mail). It is taken care of properly in the version I posted. the trick is to annotate each layout context with what caused it to occur. when you reach an "in" rather than popping up to the most recent NoLayout (as you would with a bracket) you pop up to the most recent layout context that was started with a "let". (if such a context doesn't exist, it is a syntax error) John -- John Meacham - ⑆repetae.net⑆john⑈

I have made some improvements to the algorithm, and I am happy to say that with some minor tweaks, it correctly lays out the programs in the nofib suite. the algorithm is not much more complicated than the current one in the report, but doesn't have the parse-error rule. it does require a single token of lookahead to look for an "in". darcs get http://repetae.net/repos/getlaid/ I have also added a mode so it can work as a ghc preprocesor, allowing very easy testing. just compile with. ghc -pgmF /path/to/getlaid -F --make Main.hs and it will automatically process all your files. Now, it isn't perfect. I can construct pathological examples that the old rule would parse, but this one won't. however, if those examples don't actually occur in practice, then that is not so much an issue. my program doesn't handle many non-haskell 98 extensions, but can probably be easily modified to do so. John -- John Meacham - ⑆repetae.net⑆john⑈

haskell-prime-bounces@haskell.org wrote:
I have made some improvements to the algorithm, and I am happy to say that with some minor tweaks, it correctly lays out the programs in the nofib suite.
the algorithm is not much more complicated than the current one in the report, but doesn't have the parse-error rule. it does require a single token of lookahead to look for an "in".
darcs get http://repetae.net/repos/getlaid/
I have also added a mode so it can work as a ghc preprocesor, allowing very easy testing. just compile with.
ghc -pgmF /path/to/getlaid -F --make Main.hs
and it will automatically process all your files.
Nice! I ran the GHC parser tests using your preprocessor, and get 9 failures out of 27 in the should_compile class. Some of these are bogus (problems with the lexer you're using rather than the layout preprocessor). The should_fail class all failed, but that's because column numbers are different in the preprocessed result, so the error messages changed, I'll need to look at these individually. I've attached a patch that corrects a couple of the failures in the should_compile class. Cheers, Simon
participants (3)
-
Ian Lynagh
-
John Meacham
-
Simon Marlow