RE: inside the GHC code generator

| last days i studied GHC code generator, reading -ddumps of all sorts, | GHC sources, various papers and John's appeals :) | | what i carried from this investigation: | | GHC has great high-level optimization. moreover, GHC now allows to | program STG almost directly. when i look at STG code i don't see | anything that can be done better - at least for these simple loops i | tried to compile. i see just unboxed variables, primitive operations | and simple loops represented as tail recursion. fine. | | then STG compiled to the simplified C (called abstract C earlier and | quasi C-- now), what is next can be translated: | | * to real C and then compiled by gcc | * to assembler by build-in simple C-- compiler | * to assembler by some external C-- compiler (at least it is | theoretically possible) Good stuff Bulat. There's plenty of interesting stuff to be done here. However, let me strongly urge you *not* to focus attention primarily on the gcc route. Compiling via C has received a lot of attention over the years, and there are many papers describing cool hacks for doing so. GHC does not do as well as it could. But there are serious obstacles. That's not gcc's fault -- it wasn't designed for this. Accurate GC is one of them, tail calls is another, and there are plenty more smaller things that bite you only after you've invested a lot of time. This way lies madness. C-- was *designed* for this purpose. GHC uses C-- as its intermediate language (just before emitting C). So a good route is this: * Write C-- to C-- optimisations * Then, if you like, translate that code to C. Already you will be doing better than GHC does today, because the C-- to C-- optimiser will let you generate better C * But you can also emit C--, or native code; both of these paths will directly benefit from your C-- optimisations. The point is that none of this relies on Quick C--; you can always use an alternative back end. You can almost do this today. GHC uses C-- as an intermediate language. But alas, GHC's code generator does not take advantage of C--'s native calls or parameter passing. Instead, it pushes things on an auxiliary stack etc. (Those Sp memory operations you see.) This isn't necessary. We are planning to split GHC's code generator into two parts A) Generate C-- with native calls, with an implicit C-- stack B) Perform CPS conversion, to eliminate all calls in favour of jumps using an explicit stack The current code gen is A followed by B. But A is a much more suitable optimisation platform, and gives more flexibility. Chris Thompson, and undergrad at Cambridge, is doing (B) as his undergrad project, although it remains to be seen whether he'll have enough complete to be usable. Another shortcoming is that the native code generator in GHC isn't capable of dealing with backward jumps to labels (because GHC hasn't needed that so far). But if you did C-- optimisation, you'd probably generate such jumps. It'd be great to beef up the native code gen to handle that. Many of the optimisations you describe (perhaps all) are readily expressible in the C-- intermediate language, and by working at that level you will be independent of with the back end is gcc, a native code generator, or Quick C--, or some other C-- compiler. Much better. Simon

Another shortcoming is that the native code generator in GHC isn't capable of dealing with backward jumps to labels (because GHC hasn't needed that so far). But if you did C-- optimisation, you'd probably generate such jumps. It'd be great to beef up the native code gen to handle that.
I'm already working on that. It's basically done, I think I only need to get around to one more session with the code for final cleanup. (Just to avoid any duplication of effort). Cheers, Wolfgang

But there are serious obstacles. That's not gcc's fault -- it wasn't designed for this. Accurate GC is one of them, tail calls is another, and there are plenty more smaller things that bite you only after you've invested a lot of time. This way lies madness.
What about llvm[1]? It was primary designed to support GCC based C and C++ front-ends but currently it also supports: - Accurate GC - tail calls - output for x86, PowerPC, IA-64, Alpha, SPARC V9, and portable C. - a lot of (program wide and/or profile-guided) analyses and optimizations Are there arguments against llvm? [1] llvm website: http://llvm.cs.uiuc.edu/ Regards, Christof

Hello Simon, Friday, February 24, 2006, 12:30:22 PM, you wrote: SPJ> | GHC has great high-level optimization. SPJ> However, let me strongly urge you *not* to focus attention primarily on SPJ> the gcc route. Compiling via C has received a lot of attention over the SPJ> years, and there are many papers describing cool hacks for doing so. SPJ> GHC does not do as well as it could. btw, it's well known that Clean sometimes make better code than GHC and sometimes vice versa. now I'm almost sure that GHC outperforms Clean in high-level optimizations and give away on the low-level code generation, so the grand result depends on that is more important for particular algorithm. on the hand-coded programs Clean should almost always outperform GHC SPJ> But there are serious obstacles. That's not gcc's fault -- it wasn't SPJ> designed for this. Accurate GC is one of them, we ca use two stacks - for boxed and for unboxed values SPJ> tail calls is another, nowadays gcc optimize tail calls SPJ> and there are plenty more smaller things that bite you only after you've SPJ> invested a lot of time. This way lies madness. but nevertheless ghc already compiles to gcc and it's the fastest backend, with help of Evil Mangler. afaik, code generated by ghc will work even w/o Evil Mangler, so it's just an optimization hack. can you please say what an optimizations done via EM? i think that part of them can be implemented via changing C generation, so may be even we can omit EM at long last SPJ> C-- was *designed* for this purpose. GHC uses C-- as its intermediate SPJ> language (just before emitting C). So a good route is this: SPJ> * Write C-- to C-- optimisations SPJ> * Then, if you like, translate that code to C. Already you will be SPJ> doing better than GHC does today, because the C-- to C-- optimiser will SPJ> let you generate better C SPJ> * But you can also emit C--, or native code; both of these paths will SPJ> directly benefit from your C-- optimisations. SPJ> The point is that none of this relies on Quick C--; you can always use SPJ> an alternative back end. yes, C-- was designed to solve all our problems - provide world-class optimization, code generation for different cpus and so on. but does it fulfil it's promises? no. and now you propose to write these optimization as part of Haskell project. great idea! i've started from the same idea and even wrote a sequence of optimizations what will transform initial C-- code for "fac" to the efficient one. but then i realized that the core problem is what ghc just generates low-level code in "portable asm" style. and optimizing of low-level code that tries to describe how to IMPLEMENT this algorithm instead of just describe the ALGORITHM itself is a hard task. noone writes optimizers for the ASM language, but you propose to do exact this. instead of coping with current "portable asm" code generated from STG i propose to generate idiomatic C or C-- code and leave its optimization to the appropriate compiler. the SEPARATE problem is what gcc/icc can generate much better code that qc, so i propose to direct efforts of generating idiomatic C[--] toward C compilers SPJ> You can almost do this today. GHC uses C-- as an intermediate language. SPJ> But alas, GHC's code generator does not take advantage of C--'s native SPJ> calls or parameter passing. Instead, it pushes things on an auxiliary SPJ> stack etc. (Those Sp memory operations you see.) This isn't necessary. SPJ> We are planning to split GHC's code generator into two parts SPJ> A) Generate C-- with native calls, with an implicit C-- stack SPJ> B) Perform CPS conversion, to eliminate all calls in favour of SPJ> jumps SPJ> using an explicit stack SPJ> The current code gen is A followed by B. But A is a much more suitable SPJ> optimisation platform, and gives more flexibility. i propose to implement only A and leave B to the C[--] compiler itself. that will require to return to 2-stacks model, but in return we will get 1st-class optimization instead of making itself it's rather restricted subset. instead of permanently trying to catch up C compiler's optimization we can just jump to the common bandwagon :) SPJ> Chris Thompson, and undergrad at Cambridge, is doing (B) as his SPJ> undergrad project, although it remains to be seen whether he'll have SPJ> enough complete to be usable. SPJ> Another shortcoming is that the native code generator in GHC isn't SPJ> capable of dealing with backward jumps to labels (because GHC hasn't SPJ> needed that so far). But if you did C-- optimisation, you'd probably SPJ> generate such jumps. It'd be great to beef up the native code gen to SPJ> handle that. i propose to add if/for/while statements to the CmmStmt datatype to allow generation of higher-level code. as John Meacham wrote several months ago, gcc can't unroll loops written with gotos. so it's anyway better to generate explicit loops. it should be not too hard to generate asm code for these constructs? although anyway generation of high-level C code is not compatible with native asm path, and from some moment this paths will diverge SPJ> Many of the optimisations you describe (perhaps all) are readily SPJ> expressible in the C-- intermediate language, and by working at that SPJ> level you will be independent of with the back end is gcc, a native code SPJ> generator, or Quick C--, or some other C-- compiler. Much better. most of my proposals is of course compatible with C-- path. all proposed changes can be divided to the following parts: 1) on the "Haskell->STG" path, including more aggressive inlining/specialization of functional values, polymorphic code and boxed values 2) in the computation model, for example make quick check that box already contains WHNF 3) generation of high-level C[--] code instead of current low-level one 4) orientation to the gcc optimizations instead of waiting and especially implementing our own ones difference between C-- and gcc ways, imho, is that on the C-- way you should themselves implement optimizations and that range of optimizations you can implement in foreseeable future is much less than already implemented in gcc. on the C way you need only change the method of generating C code - and this should be much easier than implementing even small subset of gcc optimizations and will immediately give us, the ghc users, all the benefits of 1st class optimization techniques available in gcc -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello,
SPJ> tail calls is another,
nowadays gcc optimize tail calls
I found this thesis on the subject by Andreas Bauer: "Compilation of Functional Programming Languages using GCC -- Tail Calls" Do you know if it is his work that was incorporated into GCC? I browsed trough some parts of the thesis, and he mentions GHC as one of the motivations for his work (the thesis also seems to be a nice tutorial on the internals of GCC). From this thread it seems that GHC cannot currently take advantage of his work. Is that correct? /Niklas

On 2/24/06, Simon Peyton-Jones
| last days i studied GHC code generator, reading -ddumps of all sorts, | GHC sources, various papers and John's appeals :) | | what i carried from this investigation: | | GHC has great high-level optimization. moreover, GHC now allows to | program STG almost directly. when i look at STG code i don't see | anything that can be done better - at least for these simple loops i | tried to compile. i see just unboxed variables, primitive operations | and simple loops represented as tail recursion. fine. | | then STG compiled to the simplified C (called abstract C earlier and | quasi C-- now), what is next can be translated: | | * to real C and then compiled by gcc | * to assembler by build-in simple C-- compiler | * to assembler by some external C-- compiler (at least it is | theoretically possible)
Good stuff Bulat. There's plenty of interesting stuff to be done here.
However, let me strongly urge you *not* to focus attention primarily on the gcc route. Compiling via C has received a lot of attention over the years, and there are many papers describing cool hacks for doing so. GHC does not do as well as it could.
But there are serious obstacles. That's not gcc's fault -- it wasn't designed for this. Accurate GC is one of them, tail calls is another, and there are plenty more smaller things that bite you only after you've invested a lot of time. This way lies madness.
C-- was *designed* for this purpose. GHC uses C-- as its intermediate language (just before emitting C). So a good route is this:
* Write C-- to C-- optimisations
* Then, if you like, translate that code to C. Already you will be doing better than GHC does today, because the C-- to C-- optimiser will let you generate better C
* But you can also emit C--, or native code; both of these paths will directly benefit from your C-- optimisations.
The point is that none of this relies on Quick C--; you can always use an alternative back end.
You can almost do this today. GHC uses C-- as an intermediate language. But alas, GHC's code generator does not take advantage of C--'s native calls or parameter passing. Instead, it pushes things on an auxiliary stack etc. (Those Sp memory operations you see.) This isn't necessary. We are planning to split GHC's code generator into two parts A) Generate C-- with native calls, with an implicit C-- stack B) Perform CPS conversion, to eliminate all calls in favour of jumps using an explicit stack The current code gen is A followed by B. But A is a much more suitable optimisation platform, and gives more flexibility.
Chris Thompson, and undergrad at Cambridge, is doing (B) as his undergrad project, although it remains to be seen whether he'll have enough complete to be usable.
Another shortcoming is that the native code generator in GHC isn't capable of dealing with backward jumps to labels (because GHC hasn't needed that so far). But if you did C-- optimisation, you'd probably generate such jumps. It'd be great to beef up the native code gen to handle that.
Many of the optimisations you describe (perhaps all) are readily expressible in the C-- intermediate language, and by working at that level you will be independent of with the back end is gcc, a native code generator, or Quick C--, or some other C-- compiler. Much better.
What optimizations are we talking about here? The loop optimizations that Bulat implicitly proposed would only affect recursion over unboxed arguments, and, since that's fairly rare, wouldn't give Joe Hacker any noticeable speed up. Are we at the end of what we can get without whole-program optimizations or are there other optimizations that apply to C-- representing a lazy PL? -- Friendly, Lemmih
participants (6)
-
Bulat Ziganshin
-
Christof Douma
-
Lemmih
-
Niklas Sorensson
-
Simon Peyton-Jones
-
Wolfgang Thaller