
Motivated by some recent discussion, I thought I would explore the possibilty of formalizing the haskell layout rule without the dreaded parse-error clause, as in, one that can be completly handled by the lexer. motivated by that I have written a little program that takes a haskell file with layout on stdin and spits out one without layout on stdout. it can be gotten here: darcs get http://repetae.net/repos/getlaid/ the code is designed to make the layout algorithm completly transparent, so that we might experiment with it. The function layout in 'Layout.hs' is the single and complete layout algorithm and the only thing that need be modified by experimentors. I have come up with a simple improvement to the algorithm given in the paper that seems to catch a very large number of layouts. basically, whenever it comes across something that must come in matched pairs (, ), case of, if then. it pushes a special context onto the stack, when it comes across the closing token, it pops every layout context down to the special context. there is a special case for "in" that causes it to pop only up to the last context created with a "let", but not further. here is the complete algorithm (with my modification, sans the parse-error rule):
data Token = Token String | TokenVLCurly String !Int | TokenNL !Int deriving(Show)
data Context = NoLayout | Layout String !Int
-- the string on 'Layout' and 'TokenVLCurly' is the token that -- created the layout, always one of "where", "let", "do", or "of"
layout :: [Token] -> [Context] -> [Token] layout (TokenNL n:rs) (Layout h n':ls) | n == n' = semi:layout rs (Layout h n':ls) | n > n' = layout rs (Layout h n':ls) | n < n' = rbrace:layout (TokenNL n:rs) ls layout (TokenNL _:rs) ls = layout rs ls layout (TokenVLCurly h n:rs) (Layout h' n':ls) | n >= n' = lbrace:layout rs (Layout h n:Layout h' n':ls) | otherwise = error "inner layout can't be shorter than outer one" layout (TokenVLCurly h n:rs) ls = lbrace:layout rs (Layout h n:ls) layout (t@(Token s):rs) ls | s `elem` fsts layoutBrackets = t:layout rs (NoLayout:ls) layout (t@(Token s):rs) ls | s `elem` snds layoutBrackets = case ls of Layout _ _:ls -> rbrace:layout (t:rs) ls NoLayout:ls -> t:layout rs ls [] -> error $ "unexpected " ++ show s layout (t@(Token "in"):rs) ls = case ls of Layout "let" n:ls -> rbrace:t:layout rs ls Layout _ _:ls -> rbrace:layout (t:rs) ls ls -> t:layout rs ls layout (t:rs) ls = t:layout rs ls layout [] (Layout _ n:ls) = rbrace:layout [] ls layout [] [] = []
layoutBrackets = [ ("case","of"), ("if","then"), ("(",")"), ("[","]"), ("{","}") ]
now. there are a few cases it doesn't catch. the hanging case at the end of a guard for instance, I believe this can be solved easily by treating '|' and '=' as opening and closing pairs in lets and wheres '|' and '->' as opening and closing pairs in case bodies. it is easy to see which one you are in by looking at the context stack. commas are trickier and are the only other case I think we need to consider. I welcome people to experiment and send patches or brainstorm ideas, I have what I believe is a full solution percolating in my head, but am unhappy with it, I am going to sleep on it and see if it crystalizes by morning. In the meantime, perhaps someone can come up with something more elegant for dealing with the remaining cases. or at least find some real programs that this code breaks down on! (bug fixes for the lexer and everything are very much welcome. it will probably choke on some ghc extensions that would be trivial to add to the alex grammar) John -- John Meacham - ⑆repetae.net⑆john⑈