RE: jhc vs ghc and the surprising result involving ghc generated assembly.

All this interesting talk of back ends is somewhat topical at GHC HQ.... It may well be that we can share some technology here.

GHC uses C-- as its intermediate representation:

    Core ==> C--  either ==> assembly code
                  or     ==> C

Currently the Core ==> C-- step is done in one big go, by the code generator. The C-- code uses NO native C-- calls; instead, it uses tail calls only, and builds a totally explicit stack as a data structure in the heap.

But Chris Thompson-Walsh, as his undergrad project, is writing a CPS converter for C--, so we can do this:

    Core ==> C-- ==> C-- with tail calls only

The first step is code generation, converting lambda code to C--, but now this is C-- with native C-- calls. The stack is implicit, entirely under C--'s control. The second step is the CPS conversion; after the CPS conversion the stack is totally explicit.

One reason for doing all this is that it factors the rather complicated code generator into two pieces. The second reason is that if we had a full-blown C-- compiler available, supporting the full runtime interface, we could print out the C-- before the CPS conversion and use the C-- compiler for the rest.

Anyway, apart from the first code-gen step, not much of this process is particularly GHC specific. At the moment it's entirely implemented in Haskell. One of the things Chris is going to look at (if he has time) is packaging the Cmm data type, its various useful functions (e.g. data flow analysis), and his CPS converter as a Cabal package that GHC just uses. Then other people could use it too. Furthermore, the Cmm data type would be the perfect place to spot self-tail calls, eliminate memory operations etc -- all the things that John has been mentioning in his message.

That is, a C-- ==> C-- optimiser would be a nice independent project. My reason for mentioning all this now, when it's still partly vapourware, is to see if anyone else is interested. In particular, perhaps a C-- optimiser for jhc would be usable in GHC and vice versa.

Simon

| -----Original Message-----
| From: glasgow-haskell-users-bounces@haskell.org
| [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of John Meacham
| Sent: 27 October 2005 00:37
| To: glasgow-haskell-users@haskell.org
| Subject: Re: jhc vs ghc and the surprising result involving ghc generated assembly.
|
| On Wed, Oct 26, 2005 at 12:24:14PM -0400, Jan-Willem Maessen wrote:
| > Nice analysis. I indeed found with phc that shadow stack references
| > absolutely killed performance, and I aggressively cached stack
| > locations in locals, spilling to stack only when GC information
| > needed to be accurate. [There was a giant infrastructure to save
| > only live data to stack, but we won't go into that now as it was the
| > source of almost all the codegen bugs...]
|
| phc? there is another haskell compiler out there?
|
| > This makes a big difference. The phc compiler even put comments in
| > the code so that I could figure out what came from where.
|
| yeah, that is something I would like to add. Unfortunately I wasn't
| forward thinking enough to put annotation points everywhere I might
| eventually need them so I will have to go through and do that at some
| point :)
|
| > I'm impressed that gcc found this. It's definitely living a bit
| > dangerously, and your suggestions below for self tail call handling
| > are the ones I found most effective. (They also allowed me to bypass
| > some prologue garbage, since phc used a one-C-function-per-Haskell-
| > function model with internal resumption points.)
| > Non-self tail calls
| > I was careful to compile to:
| >     return f(...);
| > I expect from the above that gcc does better at spotting tail calls now.
| [..rest cut..]
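
As a rough illustration of the kind of C-- ==> C-- pass being discussed (spotting self-tail calls so they can become loops), here is a small sketch over a toy procedure representation; the types and names below are made up for illustration and are not GHC's actual Cmm data type:

    -- A toy C-- like representation, illustrative only (not GHC's Cmm).
    data Expr = Lit Int | Reg String | PrimOp String [Expr]

    data Stmt
        = Assign String Expr        -- local := expr
        | TailCall String [Expr]    -- jump f(args); never returns
        | If Expr [Stmt] [Stmt]

    data Proc = Proc { procName :: String, procArgs :: [String], procBody :: [Stmt] }

    -- A self-tail call is a TailCall whose target is the enclosing procedure;
    -- these are exactly the calls a C-- ==> C-- optimiser could rewrite into
    -- parameter assignments plus a jump back to the procedure entry.
    selfTailCalls :: Proc -> [Stmt]
    selfTailCalls (Proc name _ body) = go body
      where
        go = concatMap stmt
        stmt s@(TailCall f _) | f == name = [s]
        stmt (If _ t e)                   = go t ++ go e
        stmt _                            = []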

Hello Simon,

Thursday, October 27, 2005, 12:40:27 PM, you wrote:

SPJ> That is, a C-- ==> C-- optimiser would be a nice independent project.
SPJ> My reason for mentioning all this now, when it's still partly
SPJ> vapourware, is to see if anyone else is interested. In particular,
SPJ> perhaps a C-- optimiser for jhc would be usable in GHC and vice versa.

I'm interested in writing this project.

--
Best regards,
 Bulat                          mailto:bulatz@HotPOP.com

I have this new thing where I try to write down my ideas before I lose them. So far I think it is going well. This is also available at http://repetae.net/john/repos/jhc/docs/c-minus-monad.txt

====

Sorry if this is a bit raw or wrong; it is an idea congealing as it is being written (asynchronously, of course). Feel free to skip to the bullet points at the end for the short version, as it is rather long.

I have recently been thinking about writing a c-- backend for jhc, printing out and reading the various c-- papers in preparation. As I expected, the translation would be quite simple and straightforward, but I found the section on continuations and 'cut to' particularly inspiring. Adding them to Grin (graph reduction intermediate form, the last form jhc programs take before code generation) would give me exactly the features I have been missing and having to implement in an ad-hoc manner in the code generator, or do without: exceptions, join points, loops. I formulated how to add them to Grin, verified it still follows the monad laws, and think I might have come up with something that is not only practically useful to me now, but has implications for c-- optimizers and implementations in general, which is what this message is about.

I will be careful to use 'register' to mean a modifiable variable like an auto in C or a local register in c--. I will use 'name' to refer to something bound by a lambda expression on the right side of a monadic bind operator, 'value' to refer to some specific result like a number or memory address, and 'variable' when I am _intentionally_ being ambiguous, for reasons that will become clear.

jhc's Grin is defined exactly so:

    infixr 1 :->, :>>=

    data Lam = Val :-> Exp

    data Exp =
          Exp :>>= Lam
        | App    { expFunction :: Atom, expArgs :: [Val], expType :: Ty }
        | Prim   { expPrimitive :: Primitive, expArgs :: [Val] }
        | Case   { expValue :: Val, expAlts :: [Lam] }
        | Return { expValue :: Val }
        | Store  { expValue :: Val }
        | Fetch  { expAddress :: Val }
        | Update { expAddress :: Val, expValue :: Val }
        | Error  { expError :: String, expType :: Ty }
        | Cast   { expValue :: Val, expType :: Ty }

    data Val =
          Lit !Number Ty
        | Var !Var Ty
        | Tup [Val]
        ...

Notice a few things: (:>>=) and Return form a Monad; their names aren't a coincidence. (:->) plays the exact same role as (->) in Haskell lambda expressions and case statements. Grin is not higher order: the only place a Lam may occur is on the right side of the monadic bind or as a binding in a case statement. Grin is strict; all evaluation is explicit. Grin is explicitly typed (we will ignore this for now). Nothing may be partially applied; a complete Lam _must_ appear to the right of a bind, and this is statically enforced.

By declaring a handy fixity, this lets us write Grin code in Haskell that strongly resembles Haskell monadic code. The standard accumulating factorial is:

    n   = Var (V 1)
    r   = Var (V 2)
    x   = Var (V 3)
    y   = Var (V 4)
    any = Var (V 0)

    g_fact = function "fact" $ Tup [n,r] :->
        Case n [
            1   :-> Return r,
            any :-> App "minus" [n,1] :>>= x :->
                    App "times" [n,r] :>>= y :->
                    App "fact"  [x,y]
            ]

In any case, the point being: Grin is a monad, all the monad transformations apply, and many optimizing transformations are possible and provably correct because of these laws. Many examples of good optimizations on this form were explored by Boquist.

Now let's add continuations:

    data Exp =
          Exp :>>= Lam
        | MkContinuation Lam
        | CutTo Val [Val]
        ...

    data Val =
        ...
        | Cont Cont   -- ^ some abstract representation of an instantiated continuation. _NOT_ a Lam.
        ...
The meaning is straightforward, and they have the same interpretation as in C-- for the cases C-- allows: all names bound before the continuation are available within its body, CutTo never returns, and a continuation may take arguments. Note that unlike c--, a continuation's name is bound just like any other and does not have special syntax:

    MkContinuation ( x :-> .. ) :>>= myCont :-> ....

Now we can express join points, continuations, and all sorts of good stuff. (From now on I will use Haskell do-notation, but keep in mind we are building up an AST with :>>= and :->, not executing anything.) For example, a join point:

    myfunc = \ (a,b) -> do      -- ^ note, tupled, no currying
        cont <- mkContinuation $ \ x -> do
            return (x + 4)
        case a - b of
            3 -> return -1
            5 -> cutTo (cont 2)
            z -> cutTo (cont z)

Yay! Grin has gained a whole lot of power from stealing this idea from c--.

There is one gotcha, though, that is sort of hacky but can be thought of as syntactic sugar: how do you create mutually recursive continuations when a continuation's name is only available in the lambda to the right of the declaring monadic action? The brute force approach would be to pass all the continuation names as arguments to every continuation, seeded from one called at the very end when all names are available; of course, this is untenable. The answer is that our monad is actually a member of MonadFix and fixpointable. In theory we could add a primitive (FixIt Lam) that computes the fixpoint, but that would be quite a hassle, and noticing that the only things that can be passed as arguments to the fixed function are continuations, we instead choose a convenient hack: every code block is implicitly wrapped in a fixpoint operator that passes all continuations back into the top under the _first_ name they were bound to. Code blocks are exactly the bodies of lambdas, i.e. function bodies and arms of case statements. Note that this is exactly what we would expect; lambdas are the only 'fixable' things, so every lambda is inherently fixed. It is a little hacky for a number of reasons, mainly that the transformation to explicit fixpoint form is complicated by things like shadowing and cut-tos, but it would be straightforward if you wanted to do it for some reason. Doing continuations this way, rather than declaring special syntax or just arbitrarily saying they scope over everything, is important to ensure we don't lose our precious monadic properties, and 'fixable' monads have been well studied and are used in practice in Haskell.

Grin is starting to become very expressive, but can we make it as expressive as c--?

= Cmf - C minus minus monadic form =

This is the meat. After reducing Grin more and more I realized I had come up with something much more useful than my original intent. Cmf means 'C minus minus monadic form'. Let's dissect and simplify Grin some:

    data Lam = Val :-> Exp

    data Exp =
          Exp :>>= Lam
        | App String [Val]
        | Case Val [Lam]
        | Return Val
        | MkContinuation Lam
        | CutTo Val [Val]
          -- store is nothing more than a call to 'malloc' and a couple 'pokes' into ram, so we can discard it
        | Store  { expValue :: Val }
          -- fetch is nothing more than a couple peeks into ram, so we can discard it
        | Fetch  { expAddress :: Val }
          -- likewise, but with pokes
        | Update { expAddress :: Val, expValue :: Val }
          -- error we can simulate with a 'cut to' an error routine
        | Error  { expError :: String, expType :: Ty }
          -- unneeded since we are ignoring types for now
        | Cast   { expValue :: Val, expType :: Ty }
          -- now let's add some primitives for convenience
        | Peek Val        -- read from memory
        | Poke Val Val    -- write to memory

    data Val =
          Lit Number
        | Nam Name        -- ^ note it is a 'Name' now
        | Tup [Val]

So far our Cmf looks good, nice and simple. Let's add a couple more primitives and call it done:

    data Lam = Val :-> Exp

    data Exp =
          Exp :>>= Lam
        | App String [Val]
        | Case Val [Lam]
        | Return Val
        | MkContinuation Lam
        | CutTo Val [Val]
        | Peek Val            -- read from memory
        | Poke Val Val        -- write to memory
        | NewReg Val          -- allocate a 'register' (in the c-- sense, not the hardware sense)
        | WriteReg Val Val    -- write to a register
        | ReadReg Val         -- read from a register
        | JumpTo String [Val] -- might as well make this explicit

    data Val =
          Lit Number
        | Nam Name
        | Tup [Val]

=== The Key ===

This is the key point: registers and names are completely distinct in Cmf. Boquist's Grin and SSA form both have 'variables' that don't vary but must eventually be mapped to varying registers; they are meant to turn into registers via some external register allocation algorithm, which presumably will also turn tail calls into loops and parameters into local varying registers. These transformations must necessarily happen outside of Grin, since there is no way to express the fact that you want to 'reuse', say, eax later or update a value each time through a loop.

Let's look at how this helps us. Take the factorial example from above as Grin would leave it in its final form (with an explicit tail call):

    fact = \ (n,r) -> do
        case n of
            1 -> return r
            _ -> do
                x <- n - 1
                y <- n * r
                jumpTo (fact (x,y))

This certainly compiles to decent C code, but from here on we have to give up our nice monadic form; there is no further work you can do in Grin. Assuming instead it is in Cmf, we can apply a number of simple transformations. The first is that any explicit tail call to oneself can be turned into a continuation:

    fact = \ (n,r) -> do
        fact' <- mkContinuation $ \ (n',r') -> do
            case n' of
                1 -> return r'
                _ -> do
                    x <- n' - 1
                    y <- n' * r'
                    cutTo (fact' (x,y))
        cutTo (fact' (n,r))

Now it is closer; that looks a whole lot more like a loop, but the code generator would still need magic in it to determine that the parameters passed back in can actually be placed in varying registers.

Now here is a transformation I call 'register introduction' (this is distinct from Boquist's use of the term, as he just meant creating more variables). Register introduction, in one of its forms, goes like so:

    cutTo foo (y1,y2,...)
    foo <- mkContinuation \ (x1,x2,...) -> do
        ....
        cutTo foo (z1,z2,...)

        ===>

    r_x1 <- newReg y1
    cutTo foo (y2,...)
    foo <- mkContinuation \ (x2,...) -> do
        x1 <- readReg r_x1
        ....
        writeReg r_x1 z1
        cutTo foo (z2,...)

This is just one of the forms of register introduction, but they all follow the same pattern: precede uses of something with a read and its binding with a write, and get rid of the name. Some of the cases are tricky; this one is simple because foo is called only once with the same arguments, and there are some cases where it can't be fully applied (which means something interesting :)).
So, applying it to our factorial twice:

    fact = \ (n,r) -> do
        r_n <- newReg n
        r_r <- newReg r
        fact' <- mkContinuation $ \ () -> do
            n' <- readReg r_n
            r' <- readReg r_r
            case n' of
                1 -> return r'
                _ -> do
                    x <- n' - 1
                    y <- n' * r'
                    writeReg r_n x
                    writeReg r_r y
                    cutTo (fact' ())
        cutTo (fact' ())

And look at this! Our continuation no longer has any arguments; we have transformed them all away, and this means:

    _The continuations that take no arguments are exactly those that can be turned into gotos._

Ah! So now not only do we have the power of c-- gotos, we have made the imperative loop explicit! The fact that we are storing r and n in updatable registers, and that we have a simple loop, is right there in the monadic Cmf :) This is a non-trivial optimization, carried out by simple meaning-preserving source->source transformations. This form is perfect for transforming to C or C-- as is: the newRegs turn into local register declarations, and the continuation turns into a label and a goto, or better yet a while loop. We could stop here... but let's see what happens if we keep going. (In a production system, jhc stops here and the C-- compiler takes over from now on.)

Now let's take it a step or two further. Let's do some simple code motion to clean things up and replace some primitives with lower level ones:

    fact = \ (n,r) -> do
        r_n <- newReg n
        r_r <- newReg r
        cutTo (fact' ())    -- a cutTo right before what it is cutting to can just fall through
        fact' <- mkContinuation $ \ () -> do
            n' <- readReg r_n
            case n' of
                1 -> readReg r_r
                -- let's move the reading and writing of registers as close as we can to their uses
                _ -> do
                    r' <- readReg r_r
                    n' <- readReg r_n
                    y  <- r' * n'
                    writeReg r_r y
                    n' <- readReg r_n
                    x  <- n' - 1
                    writeReg r_n x
                    cutTo (fact' ())

Now let's replace some of the primitives with some more "concrete" ones :)

    fact = \ (n,r) -> do
        r_n <- newReg n
        r_r <- newReg r
        cutTo (fact' ())
        fact' <- mkContinuation $ \ () -> do
            n' <- readReg r_n
            case n' of
                1 -> readReg r_r
                _ -> do
                    "imul" [r_r,r_n]
                        -- r' <- readReg r_r   (what it is replacing)
                        -- n' <- readReg r_n
                        -- y  <- r' * n'
                        -- writeReg r_r y
                    "subl" [r_n,1]
                        -- n' <- readReg r_n
                        -- x  <- n' - 1
                        -- writeReg r_n x
                    cutTo (fact' ())

Heh, notice those are i386 opcodes? Now let's define a few global names:

    eax = Nam (name of eax)
    ebx = Nam (name of ebx)
    ecx = Nam (name of ecx)
    edx = Nam (name of edx)

and perform some substitutions:

    fact = \ (n,r) -> do
        writeReg eax n
        writeReg ebx r
        cutTo (fact' ())
        fact' <- mkContinuation $ \ () -> do
            n' <- readReg eax
            case n' of
                1 -> readReg ebx
                _ -> do
                    "imul" [ebx,eax]
                    "subl" [eax,1]
                    cutTo (fact' ())

We can go further and replace the case statement with a special 'If' that takes a bool and translates to a conditional jump, getting rid of the n'. But notice what was done: all the monadic names went away (or will, once the case is transformed away and the arguments are popped off the stack rather than being bound).

So, the whole point of all these transformations is that eventually _every name disappears_: all monadic bindings turn from (>>=) into (>>) forms. Names do not correspond to registers; monadically bound names are simply used to _route_ where results go.

    x <- return 3
    f(x)

does not mean "bind x to three and then pass it to f", it means "route 3 to the first argument of f"; after transformation (assuming the C calling convention):

    "push" 3
    "call" [f]
    "pop"

Similarly, the arguments to functions will disappear and instead be replaced with whatever is appropriate to get the values, as specified by your calling convention.
So we went from a very high level monad to assembly, without ever leaving the monadic framework. The key reason we were able to do this is the separation of monadically bound names from run-time values and registers. Once all the names are transformed away, all we have left is executable assembly statements separated by the monadic (>>) operator.

= the other way =

I meant to talk more about this, but basically the idea is that we can start with c--, transform each local register into a 'newReg' statement and gotos into continuations with no arguments, and then apply a simple set of rewriting rules to _get rid_ of every register access, returning us to a pure functional form that can be optimized! There is a correspondence between Cmf (monads in general), SSA, and CPS that opens up a wealth of optimizations. The ability to go the other direction is what I find very intriguing: we can take c--, recover its monadic form, optimize the heck out of it, and transform it back to c--.

Bullet points:

 * C-- can be formulated as a true monad that captures most (all?) of its features.
 * Non-trivial optimizations usually done in an ad-hoc or specialized manner can be expressed as straightforward rewriting rules on this monadic code.
 * It is suitable for transforming between the very high level (jhc or ghc core) all the way down to the very low level (through register assignment and instruction selection) while retaining all its theoretical properties.
 * This transformation is not a one-way ratchet; higher level forms can be reconstructed from lower level forms by applying rewriting rules in reverse. This would be particularly useful in a c-- to c-- optimizer.
 * The separation of the concepts of names used to route where values go, the actual registers that hold them, and the values themselves is a natural and useful distinction; it is what allows Cmf to remain valid all the way to instruction selection and what makes the transformation reversible.
 * Monads are good.
 * Some of this might be novel.

        John

--
John Meacham - ⑆repetae.net⑆john⑈
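
To make the "argument-free continuation is just a loop over registers" reading above concrete, here is a minimal runnable sketch: the register-introduced factorial transcribed into Haskell's State monad (assuming the mtl package), with registers as entries in a Map and the continuation fact' as an ordinary recursive action. The helper names newReg/readReg/writeReg are illustrative only, not jhc's actual Grin/Cmf operations.

    import Control.Monad.State
    import qualified Data.Map as M

    type Store = M.Map String Integer          -- register name -> contents

    newReg, writeReg :: String -> Integer -> State Store ()
    newReg   r v = modify (M.insert r v)
    writeReg r v = modify (M.insert r v)

    readReg :: String -> State Store Integer
    readReg r = gets (M.! r)

    fact :: Integer -> Integer -> Integer
    fact n r = evalState body M.empty
      where
        body = do
            newReg "r_n" n
            newReg "r_r" r
            fact'                              -- cutTo fact' (): just fall through into the loop
        fact' = do                             -- a continuation with no arguments: a plain loop
            n' <- readReg "r_n"
            r' <- readReg "r_r"
            if n' == 1
              then return r'
              else do
                writeReg "r_n" (n' - 1)
                writeReg "r_r" (n' * r')
                fact'                          -- cutTo fact' (): jump back to the top

    -- ghci> fact 5 1
    -- 120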

Hi,

In my master thesis I use GRIN and C-- to write a backend for ehc (essential haskell compiler - http://www.cs.uu.nl/wiki/Ehc/). It is nice to see GRIN is used by other people as well. My work has been to extend the GRIN model to be useful for exceptions and to implement a GRIN compiler. I am currently in the process of writing the results down.

For those not familiar with GRIN, some lines on its ideas:

- Explicit behaviour (no built-in magic functions)
- Thunks and values are represented by a node (a tag followed by zero or more fields; see the sketch below)
- Do a global analysis to find an approximation of which nodes are stored in memory and the values that variables hold
- Use this approximation to remove any unknown function call and remove unneeded case alternatives

[snip]
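
A rough sketch of the node idea mentioned in the list above, using made-up Haskell types rather than ehc's actual representation: evaluated constructors, unevaluated thunks, and partial applications all share one uniform tag-plus-fields layout, which is what the global analysis approximates.

    -- Illustrative only; Tag/Node/Val and the tag names are not ehc's types.
    data Tag
        = CTag String        -- evaluated constructor node, e.g. CCons, CNil
        | FTag String        -- thunk: a suspended call of a known function
        | PTag String Int    -- partial application, with the number of missing arguments
        deriving (Eq, Show)

    data Val  = Lit Integer | Ptr Int    -- a basic value or a pointer into the heap
        deriving (Eq, Show)

    data Node = Node Tag [Val]
        deriving (Eq, Show)

    -- e.g. a thunk for (fact 10 1), and the partial application node P1foo
    -- that appears in the rewritten example further below:
    factThunk, p1foo :: Node
    factThunk = Node (FTag "fact") [Lit 10, Lit 1]
    p1foo     = Node (PTag "foo" 1) []
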
> I have recently been thinking about writing a c-- backend for jhc, printing
> out and reading the various c-- papers in preparation. As I expected, the
> translation would be quite simple and straightforward, but I found the
> section on continuations and 'cut to' particularly inspiring. Adding them to
> Grin (graph reduction intermediate form, the last form jhc programs take
> before code generation) would give me exactly the features I have been
> missing and having to implement in an ad-hoc manner in the code generator,
> or do without: exceptions, join points, loops. I formulated how to add them
> to Grin, verified it still follows the monad laws, and think I might have
> come up with something that is not only practically useful to me now, but
> has implications for c-- optimizers and implementations in general, which is
> what this message is about.

> For example, a join point:
>
>     myfunc = \ (a,b) -> do      -- ^ note, tupled, no currying
>         cont <- mkContinuation $ \ x -> do
>             return (x + 4)
>         case a - b of
>             3 -> return -1
>             5 -> cutTo (cont 2)
>             z -> cutTo (cont z)

Continuations in this example are not needed. The example calls some function by using continuations. The idea of GRIN is to have nodes which represent thunks. A partial application of some function foo can be represented by a node which describes which function is to be called when fully saturated, together with the already applied arguments. The above example can be rewritten to standard GRIN as follows:

    foo x = return (x + 4)

    myfunc = \(a,b) -> do
        cont <- return (P1foo)      -- store a partial application node
        case (a - b) of
            3 -> return -1
            5 -> apply cont 2
            z -> apply cont z

    apply f a = case f of
        P1foo -> foo a
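
To see why this rewrite works in general, here is a small Haskell model (made-up tags and functions, not code generated by ehc or jhc) of an apply that dispatches on every partial-application node the global analysis says can reach it; each alternative either saturates a call or builds the next, smaller partial application.

    -- Illustrative stand-in for the apply that the whole-program analysis would generate.
    data Node = P1foo | P2bar | P1bar Integer | Basic Integer
        deriving Show

    foo :: Integer -> Node
    foo x = Basic (x + 4)

    bar :: Integer -> Integer -> Node
    bar x y = Basic (x * y)

    apply :: Node -> Integer -> Node
    apply P1foo     a = foo a       -- one argument missing: now saturated, call foo
    apply P2bar     a = P1bar a     -- two missing: remember the argument in a new node
    apply (P1bar x) a = bar x a     -- saturated: call bar
    apply n         _ = error ("apply: not a partial application: " ++ show n)

    -- e.g. apply (apply P2bar 6) 7 == Basic 42
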
> Yay! Grin has gained a whole lot of power from stealing this idea from c--.

Yay! Grin already has a whole lot of power. ;-)

Secondly: modelling exceptions in GRIN does not benefit from the use of these continuation constructs. GRIN does not have some global variable in which you can store the exception handler. Of course this could also be added (as a primitive in the monad), but what is the difference from a global variable with a node in it?

The extension I created allows exceptions in GRIN through try/catch and throw statements. These two are monad primitives. The try/catch statement defines a continuation and pushes it on the exception handler stack; the throw statement cuts to that handler. Leaving the scope of the try statement or entering the handler restores the previous handler.

[snip]

Doing register allocation and instruction selection in GRIN sure looks nice. It is an interesting thought.

Christof
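
As a minimal runnable sketch of the observable behaviour of such try/catch/throw primitives, here they are modelled with the ordinary Either monad in Haskell. The names throw and tryCatch are illustrative only; the actual ehc extension maintains an explicit stack of handler continuations at the GRIN level rather than threading an Either value, but the scoping (leaving the try scope or entering the handler restores the previous handler) comes out the same.

    type Value = Integer
    type G a   = Either Value a       -- Left v = a thrown value looking for a handler

    throw :: Value -> G a
    throw = Left

    -- Nesting tryCatch plays the role of pushing a handler; when its dynamic
    -- extent ends, the enclosing handler is in force again.
    tryCatch :: G a -> (Value -> G a) -> G a
    tryCatch (Left v)  handler = handler v
    tryCatch (Right a) _       = Right a

    example :: G Value
    example =
        tryCatch
            (do x <- tryCatch (throw 42) (\v -> return (v + 1))   -- caught here: x = 43
                if x > 100 then throw 0 else return x)
            (\_ -> return (-1))                                   -- outer handler, not reached
    -- example == Right 43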