
In the recently burried haskell-cafe thread "speed: ghc vs gcc", Bulat pointed out some of the optimizations that GHC doesn't do, such as loop unrolling. I suggested a way of experimenting with loop unrolling, using template haskell to bypass GHC's blindspot (it usually doesn't unfold recursive definitions http://www.haskell.org/pipermail/glasgow-haskell-users/2007-July/012936.html , but if we unfold a loop combinator at compile time, GHC's normal optimizations can take over from there): http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html While this is fine as far as it goes (it should really be handled within GHC), and does offer some initial speedup, Bulat pointed out that GCC does further optimizations after unrolling, such as reassociating sums to expose potential for constant folding: http://www.haskell.org/pipermail/haskell-cafe/2009-February/056367.html (since the ghc -ddump-simpl output doesn't show this optimization, I assume that gcc handles it, and the "*ghc*" in that message is a typo, but haven't checked - how would I do that, btw?). In this case, GHC optimizations following the loop unrolling leave a sum like (note the repeated variable interspersed with constants) (GHC.Prim.+# (GHC.Prim.+# ww_s1lN 3) (GHC.Prim.+# (GHC.Prim.+# ww_s1lN 2) (GHC.Prim.+# (GHC.Prim.+# ww_s1lN 1) (GHC.Prim.+# (GHC.Prim.+# ww_s1lN 0) ww_s1lR)))))))) which can be simplified (assuming associativity and commutativity of + here..) after sorting the variable references and constants into separate groups. We currently inherit such optimizations when using -fvia-C, even though GHC sometimes produces C code that GCC can't handle optimally. If I understand correctly, -fvia-C is on its way out - is that correct, and what plans are there for recovering the optimizations previously left to GCC? The next thing I was looking at was rewrite rules, the obvious GHC tool for implementing this kind of rule (var+const1)+(var+const2) ==> 2*var + const3 and I ran into more questions: - can RULES left-hand sides test for variables (I don't want to reassociate sums randomly, that wouldn't terminate; instead, I want to float out subterms that are non-variable, and group repeated variables)? - is there any way to control the rewrite strategy, similar to strategy combinators (if rules are applied all over the place, they introduce new structure not covered by rules; if I could limit the strategy to top-down, or bottom-up, I could at least cover some special cases)? - how would one handle this kind of optimization in GHC in full generality? wait for compiler plugins? are there features of rewrite rules that I'm missing? would it make sense to flag rewrite rules system improvements as a GHC GSoC project, given that GHC will have to pull its weight there when moving away from GCC? Claus