AlternateLayoutRule

Hi, I noticed that ghc now supports an 'AlternateLayoutRule' but am having trouble finding information about it. Is it based on my proposal and sample implementation? http://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html https://ghc.haskell.org/trac/haskell-prime/wiki/AlternativeLayoutRule implies it has been in use since 6.13. If that is the case, I assume it has been found stable? I ask because I was going to rewrite the jhc lexer and would like to use the new mechanism in a way that is compatible with ghc. If it is already using my code, so much the better. John -- John Meacham - http://notanumber.net/

On 13/05/14 15:04, John Meacham wrote:
Hi, I noticed that ghc now supports an 'AlternateLayoutRule' but am having trouble finding information about it. Is it based on my proposal and sample implementation? http://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html
Yes it is, but I think we had to flesh it out with a few more cases. Ian will know more, he implemented it in GHC. It has never been the default implementation, because it wasn't possible to cover 100% of the strange ways that code in the wild currently relies on the parse-error behaviour in the layout rule. You can get it with -XAlternateLayoutRule though. I'm not sure what we should do about it. I think Ian's motivation was to experiment with a view to proposing it as a replacement for the layout rule in Haskell', but (and this is my opinion) I think it ends up not being as clean as we might have hoped, and the cases where it doesn't work in the same way as the old rule aren't easily explainable to people. On the other hand, we did find a nice use for it in GHC: the multiline parser in GHCi can tell whether you've finished typing a complete expression using the alternate layout rule. Cheers, Simon
https://ghc.haskell.org/trac/haskell-prime/wiki/AlternativeLayoutRule implies it has been in use since 6.13. If that is the case, I assume it has been found stable?
I ask because I was going to rewrite the jhc lexer and would like to use the new mechanism in a way that is compatible with ghc. If it is already using my code, so much the better.

On Tue, May 13, 2014 at 09:32:31PM +0100, Simon Marlow wrote:
On 13/05/14 15:04, John Meacham wrote:
Hi, I noticed that ghc now supports an 'AlternateLayoutRule' but am having trouble finding information about it. Is it based on my proposal and sample implementation? http://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html
Yes it is, but I think we had to flesh it out with a few more cases. Ian will know more, he implemented it in GHC.
I'm not sure what we should do about it. I think Ian's motivation was to experiment with a view to proposing it as a replacement for the layout rule in Haskell', but (and this is my opinion) I think it ends up not being as clean as we might have hoped, and the cases where it doesn't work in the same way as the old rule aren't easily explainable to people.
I ask because I was going to rewrite the jhc lexer and would like to use the new mechanism in a way that is compatible with ghc. If it is already using my code, so much the better.
It's based on your code, but I had to essentially completely re-write it to work with the way that GHC's parser works; I don't think sharing the code will be feasible. I also fixed some bugs (e.g. I think that with your code, foo = let { x = x } in x turns into something like foo = let { x = x } } in x ) and made some tweaks after trying it on GHC+bootlibs, but I don't have details to hand. However, the consensus was that the new rule has too many cases, due to trying to match the old rule as closely as possible, and despite that it doesn't have the advantage that it is a drop-in replacement. ISTR Cabal in particular needed several changes to compile with it (0aba7b9f2e5d8acea156d575184a4a63af0a1ed3). Most of them were code of the form case e of p -> e' where bs needing the 'where' clause to be less indented. The plan, yet to be implemented, was to remove some of the cases of the new rule, making it easier to understand, specify and implement, at the expense of breaking more code when the switch is flipped. Ideally, there would be a period during which compilers would check during compilation whether the new rule would give a different token sequence to the old rule, and warn if so. Thanks Ian

On Tue, May 13, 2014 at 2:22 PM, Ian Lynagh
It's based on your code, but I had to essentially completely re-write it to work with the way that GHC's parser works; I don't think sharing the code will be feasible.
Ah, yeah, I didn't think the code would translate directly, but I'd want the logic to match of course. But on that subject of sharing code, I just relicensed jhc under a permissive license like ghc with the last release so code can move both ways. I mainly think it will be useful in the libraryies and support routines rather than compiler core stuff. though, I may nab the c-- parser/emitter. I already use it as an intermediate language but with no external representation. My lexer/parser is based on a branch of the thih source which also eventually begat haskell-src but it is fairly hairy. Written before monad syntax so has hand written CPS everywhere. I am hoping that if the new layout rule works I can move to a packrat or hybrid recursive descent parser, letting LALR handle the gross structure but use hand written recursive descent with excellent error messages for expressions. The only reason I have stuck it out with the happy one for so long is it has the magic lexer/layout interaction baked in.
I also fixed some bugs (e.g. I think that with your code,   foo = let { x = x }      in x turns into something like   foo = let { x = x }      } in x ) and made some tweaks after trying it on GHC+bootlibs, but I don't have details to hand.
ah cool, can you point me to which file it is implemented in in the source so I can copy your new rules?
However, the consensus was that the new rule has too many cases, due to trying to match the old rule as closely as possible, and despite that it doesn't have the advantage that it is a drop-in replacement. ISTR Cabal in particular needed several changes to compile with it (0aba7b9f2e5d8acea156d575184a4a63af0a1ed3). Most of them were code of the form   case e of   p -> e'   where bs needing the 'where' clause to be less indented.
Hmm.. interesting. I only tested against the whole nofib suite originally. I need to expand that.
The plan, yet to be implemented, was to remove some of the cases of the new rule, making it easier to understand, specify and implement, at the expense of breaking more code when the switch is flipped.
I don't mind that if it makes sense. Would only work if ghc did it though as no one is going to relayout their code for jhc (unless they are one of the ones using it to compile to ARM boards with 32k of RAM :) ). Maybe I can twiddle the algorithm in ghc itself for a bit first to take advantage of the bigger accessible codebase to test on.
Ideally, there would be a period during which compilers would check during compilation whether the new rule would give a different token sequence to the old rule, and warn if so.
Yeah, it would be tricky in jhc to support both front ends, though... maybe I can revert my current lexer parser back to simpler haskell 98 syntax and require anything that uses extensions to use the new layout rule. Thanks, John -- John Meacham - http://notanumber.net/

On Tue, May 13, 2014 at 03:11:16PM -0700, John Meacham wrote:
ah cool, can you point me to which file it is implemented in in the source so I can copy your new rules?
It's lexTokenAlr and friends in compiler/parser/Lexer.x It's a while since I looked at it, but IIRC it's not as clean to read as your specification code was, as GHC needs a function that it can call that will just give it the next token. That means that we need to keep some state for what we've recently seen, and if we want to return multiple tokens then we can only return the first one, and need to store the rest to be returned by subsequent calls Thanks Ian

Okay, I believe I have come up with a modified version that accepts many more programs and doesn't require complicated comma handling, you can make all decisions based on the top of the context stack. It also allows many useful layouts that were illegal under the old system. The main change was to store the expected closing token in addition to the opening one on the context stack, then look up the appropriate token in a table based on the immediately enclosing context. Unlike the old version that recursed up the stack, this only needs to look at the top. The rules are (where _ means true everywhere) always enabled: (case,of) (if,then) (then,else) ('(',')') ('[',']') ('{','}') conditional: of => ('|','->') let => ('|','=') '[' => ('|',']') then we follow the same rules as before, except we check the token against the stored ending token at the top of the stack and 'let' is only terminated by ',' if '|' is at the top of the context stack. In addition I added a list of 'layout insensitive' symbols, that will never poke the layout rule, they are ["|","->","=",";",","]. lines starting with these will not continue or close layout contexts. So, there are still a couple obscure cases this won't lay out properly, but it does allow some handy things that were not allowed before. it boils down to the fact that layout is turned off inside any enclosing brackets and effects do not propagate outside of them. So nothing done between [ .. ] can affect anything outside of them. some handy things this allows -- a 'cond' construct that lines up nicely. foo = case () of () | cond1 -> exp1 | cond2 -> exp2 | cond3 -> exp3 line up your tuple defs (alpha ,beta ,gamma) = foo no need to double indent conditionals in do let let f x y z | x == z = y | x == y = z | y == z = x The simple rule is, inside enclosing brackets, layout is turned off and doesn't propagate. I am testing the rules with ghc and they are promising so far. pattern bindings John

On Tue, May 13, 2014 at 11:20:25PM -0700, John Meacham wrote:
Okay, I believe I have come up with a modified version that accepts many more programs and doesn't require complicated comma handling, you can make all decisions based on the top of the context stack. It also allows many useful layouts that were illegal under the old system.
From your description, the rule still sounds quite complex. It depends whether the objective is a rule that is close enough to the current rule that we can switch with very little code breaking, or a rule that is simple to explain to newcomers to the language (but which will require much more code tweaking when updating packages to compile with the new standard). Thanks Ian

Actually, it has less cases than my previous version, I think I just
wasn't presenting it well. My goal is to make something that will
accept current programs out there _and_ be much simpler than the
current rule. The parse exception brings a huge amount of complexity
into the old rule. LALR parsers are notoriously tricky for even
experienced programmers.
So, compared to the H98 rule the major simplification is:
Rather than extend layout until the language parser encounters an
error, I proactively disable layout when inserting a semi or brace
would guarentee a parse error under a simple parsing algorithm that
only keeps track of 'nesting level' and nothing else.
So, we basically just keep track of when inserting a semi or close
brace would cause an error of the form '(' with no matching ')' found
and don't do it then.
which state your automata goes to next depends on simple pattern
matching on current symbol and current top of stack and can be listed
as a set of tuples.
in other words, a textbook deterministic push down automaton.
John
On Wed, May 14, 2014 at 7:29 AM, Ian Lynagh
On Tue, May 13, 2014 at 11:20:25PM -0700, John Meacham wrote:
Okay, I believe I have come up with a modified version that accepts many more programs and doesn't require complicated comma handling, you can make all decisions based on the top of the context stack. It also allows many useful layouts that were illegal under the old system.
From your description, the rule still sounds quite complex. It depends whether the objective is a rule that is close enough to the current rule that we can switch with very little code breaking, or a rule that is simple to explain to newcomers to the language (but which will require much more code tweaking when updating packages to compile with the new standard).
Thanks Ian
-- John Meacham - http://notanumber.net/

After some work, I have replaced jhcs front end with one using an AlternateLayoutRule and it passes ghc's parsing regression tests. I had to make some modifications so the code so it isn't as pretty as I'd like, I'll see if I can distill the essence out into something concrete. Incidentally, I found a particularly troublesome case for the current ghc rule out in the wild during my testing: -- Lining up |'s and indenting new blocks 4 spaces. A fairly common practice. bar x y = case x of Just z | y > z*2 -> case y of _ -> y | y > z -> y*10 _ -> 1 bar (Just 2) 3 => 30 --add a constant true guard that should be a nop bar x y = case x of Just z | y > z*2 -> case y of _ | True -> y <=== added a no-op guard | y > z -> y*10 _ -> 1 bar (Just 2) 3 => 1 The pattern match was fairly complicated making what happened pretty obscure. One of the main benefits I have found is being able to process the token stream between the lexer and parser, which drastically simplified my parser making it almost shift/reduce conflict free. I turn (+) and case into single lexemes and turned reservedids into varids for the keywords of unenabled extensions. John -- John Meacham - http://notanumber.net/
participants (3)
-
Ian Lynagh
-
John Meacham
-
Simon Marlow