
Hello Simon, Friday, February 24, 2006, 12:30:22 PM, you wrote: SPJ> | GHC has great high-level optimization. SPJ> However, let me strongly urge you *not* to focus attention primarily on SPJ> the gcc route. Compiling via C has received a lot of attention over the SPJ> years, and there are many papers describing cool hacks for doing so. SPJ> GHC does not do as well as it could. btw, it's well known that Clean sometimes make better code than GHC and sometimes vice versa. now I'm almost sure that GHC outperforms Clean in high-level optimizations and give away on the low-level code generation, so the grand result depends on that is more important for particular algorithm. on the hand-coded programs Clean should almost always outperform GHC SPJ> But there are serious obstacles. That's not gcc's fault -- it wasn't SPJ> designed for this. Accurate GC is one of them, we ca use two stacks - for boxed and for unboxed values SPJ> tail calls is another, nowadays gcc optimize tail calls SPJ> and there are plenty more smaller things that bite you only after you've SPJ> invested a lot of time. This way lies madness. but nevertheless ghc already compiles to gcc and it's the fastest backend, with help of Evil Mangler. afaik, code generated by ghc will work even w/o Evil Mangler, so it's just an optimization hack. can you please say what an optimizations done via EM? i think that part of them can be implemented via changing C generation, so may be even we can omit EM at long last SPJ> C-- was *designed* for this purpose. GHC uses C-- as its intermediate SPJ> language (just before emitting C). So a good route is this: SPJ> * Write C-- to C-- optimisations SPJ> * Then, if you like, translate that code to C. Already you will be SPJ> doing better than GHC does today, because the C-- to C-- optimiser will SPJ> let you generate better C SPJ> * But you can also emit C--, or native code; both of these paths will SPJ> directly benefit from your C-- optimisations. SPJ> The point is that none of this relies on Quick C--; you can always use SPJ> an alternative back end. yes, C-- was designed to solve all our problems - provide world-class optimization, code generation for different cpus and so on. but does it fulfil it's promises? no. and now you propose to write these optimization as part of Haskell project. great idea! i've started from the same idea and even wrote a sequence of optimizations what will transform initial C-- code for "fac" to the efficient one. but then i realized that the core problem is what ghc just generates low-level code in "portable asm" style. and optimizing of low-level code that tries to describe how to IMPLEMENT this algorithm instead of just describe the ALGORITHM itself is a hard task. noone writes optimizers for the ASM language, but you propose to do exact this. instead of coping with current "portable asm" code generated from STG i propose to generate idiomatic C or C-- code and leave its optimization to the appropriate compiler. the SEPARATE problem is what gcc/icc can generate much better code that qc, so i propose to direct efforts of generating idiomatic C[--] toward C compilers SPJ> You can almost do this today. GHC uses C-- as an intermediate language. SPJ> But alas, GHC's code generator does not take advantage of C--'s native SPJ> calls or parameter passing. Instead, it pushes things on an auxiliary SPJ> stack etc. (Those Sp memory operations you see.) This isn't necessary. SPJ> We are planning to split GHC's code generator into two parts SPJ> A) Generate C-- with native calls, with an implicit C-- stack SPJ> B) Perform CPS conversion, to eliminate all calls in favour of SPJ> jumps SPJ> using an explicit stack SPJ> The current code gen is A followed by B. But A is a much more suitable SPJ> optimisation platform, and gives more flexibility. i propose to implement only A and leave B to the C[--] compiler itself. that will require to return to 2-stacks model, but in return we will get 1st-class optimization instead of making itself it's rather restricted subset. instead of permanently trying to catch up C compiler's optimization we can just jump to the common bandwagon :) SPJ> Chris Thompson, and undergrad at Cambridge, is doing (B) as his SPJ> undergrad project, although it remains to be seen whether he'll have SPJ> enough complete to be usable. SPJ> Another shortcoming is that the native code generator in GHC isn't SPJ> capable of dealing with backward jumps to labels (because GHC hasn't SPJ> needed that so far). But if you did C-- optimisation, you'd probably SPJ> generate such jumps. It'd be great to beef up the native code gen to SPJ> handle that. i propose to add if/for/while statements to the CmmStmt datatype to allow generation of higher-level code. as John Meacham wrote several months ago, gcc can't unroll loops written with gotos. so it's anyway better to generate explicit loops. it should be not too hard to generate asm code for these constructs? although anyway generation of high-level C code is not compatible with native asm path, and from some moment this paths will diverge SPJ> Many of the optimisations you describe (perhaps all) are readily SPJ> expressible in the C-- intermediate language, and by working at that SPJ> level you will be independent of with the back end is gcc, a native code SPJ> generator, or Quick C--, or some other C-- compiler. Much better. most of my proposals is of course compatible with C-- path. all proposed changes can be divided to the following parts: 1) on the "Haskell->STG" path, including more aggressive inlining/specialization of functional values, polymorphic code and boxed values 2) in the computation model, for example make quick check that box already contains WHNF 3) generation of high-level C[--] code instead of current low-level one 4) orientation to the gcc optimizations instead of waiting and especially implementing our own ones difference between C-- and gcc ways, imho, is that on the C-- way you should themselves implement optimizations and that range of optimizations you can implement in foreseeable future is much less than already implemented in gcc. on the C way you need only change the method of generating C code - and this should be much easier than implementing even small subset of gcc optimizations and will immediately give us, the ghc users, all the benefits of 1st class optimization techniques available in gcc -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com