[Haskell-cafe] New Layout Rule

8 Dec 2006

      Motivated by some recent discussion, I thought I would explore the
possibilty of formalizing the haskell layout rule without the dreaded
parse-error clause, as in, one that can be completly handled by the
lexer.

motivated by that I have written a little program that takes a haskell
file with layout on stdin and spits out one without layout on stdout.

it can be gotten here:
darcs get http://repetae.net/repos/getlaid/

the code is designed to make the layout algorithm completly transparent,
so that we might experiment with it. The function layout in 'Layout.hs'
is the single and complete layout algorithm and the only thing that need
be modified by experimentors.

I have come up with a simple improvement to the algorithm given in the
paper that seems to catch a very large number of layouts. basically,
whenever it comes across something that must come in matched pairs (, ), case
of, if then. it pushes a special context onto the stack, when it comes
across the closing token, it pops every layout context down to the
special context.

there is a special case for "in" that causes it to pop only up to the
last context created with a "let", but not further.

here is the complete algorithm (with my modification, sans the
parse-error rule):
...
data Token = Token String | TokenVLCurly String !Int | TokenNL !Int deriving(Show)
data Context = NoLayout | Layout String !Int
-- the string on 'Layout' and 'TokenVLCurly' is the token that
-- created the layout, always one of "where", "let", "do", or "of"
layout :: [Token] -> [Context] -> [Token]
layout (TokenNL n:rs) (Layout h n':ls)
    | n == n' = semi:layout rs (Layout h n':ls)
    | n > n' = layout rs (Layout h n':ls)
    | n < n' = rbrace:layout (TokenNL n:rs) ls
layout (TokenNL _:rs) ls = layout rs ls
layout (TokenVLCurly h n:rs) (Layout h' n':ls)
    | n >= n' = lbrace:layout rs (Layout h n:Layout h' n':ls)
    | otherwise = error "inner layout can't be shorter than outer one"
layout (TokenVLCurly h n:rs) ls = lbrace:layout rs (Layout h n:ls)
layout (t@(Token s):rs) ls | s `elem` fsts layoutBrackets = t:layout rs (NoLayout:ls)
layout (t@(Token s):rs) ls | s `elem` snds layoutBrackets = case ls of
    Layout _ _:ls -> rbrace:layout (t:rs) ls
    NoLayout:ls -> t:layout rs ls
    [] -> error $ "unexpected " ++ show s
layout (t@(Token "in"):rs) ls = case ls of
    Layout "let" n:ls -> rbrace:t:layout rs ls
    Layout _ _:ls -> rbrace:layout (t:rs) ls
    ls -> t:layout rs ls
layout (t:rs) ls = t:layout rs ls
layout [] (Layout _ n:ls) = rbrace:layout [] ls
layout [] [] = []
layoutBrackets = [ ("case","of"), ("if","then"), 
                   ("(",")"), ("[","]"), ("{","}") ]
now. there are a few cases it doesn't catch. the hanging case at the end
of a guard for instance, I believe this can be solved easily by treating 

'|' and '='  as opening and closing pairs in lets and wheres
'|' and '->' as opening and closing pairs in case bodies.

it is easy to see which one you are in by looking at the context stack.

commas are trickier and are the only other case I think we need to
consider.

I welcome people to experiment and send patches or brainstorm ideas, I
have what I believe is a full solution percolating in my head, but am
unhappy with it, I am going to sleep on it and see if it crystalizes by
morning. In the meantime, perhaps someone can come up with something
more elegant for dealing with the remaining cases. or at least find some
real programs that this code breaks down on!

(bug fixes for the lexer and everything are very much welcome. it will
probably choke on some ghc extensions that would be trivial to add to
the alex grammar)

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈