
Harendra Kumar
My earlier experiment was on GHC-7.10.3. I repeated it on GHC-8.0.1, and the traced assembly was exactly the same except for a marginal improvement: the 8.0.1 code generator removed the r14/r11 swap, but the rest of the register ring shift is unchanged. I have updated the GitHub gist with the 8.0.1 trace:
Have you tried compiling with -fregs-graph [1] (the graph-coloring allocator)? By default GHC uses a very naive linear register allocator, which I'd imagine may produce these sorts of results. At some point there was an effort to make -fregs-graph the default (see #2790), but unfortunately it is quite slow while having only a relatively small impact on the quality of the produced code in most cases. In your case, however, it may be worth enabling. Note that the graph-coloring allocator has a few quirks of its own (see #8657 and #7697).

While researching this, I also noticed that the -fregs-graph flag is currently silently ignored [2]. Unfortunately, this means you'll need to build a new compiler if you want to try it.

Simon Marlow: if we really want to disable this option, we should at the very least issue an error when the user requests it. However, it really seems to me that we shouldn't disable it at all. Why not just allow the user to use it and add a note to the documentation stating that the graph-coloring allocator may fail on some programs, and that if it breaks, the user gets to keep both pieces?

All in all, the graph-coloring allocator is in great need of some love. Harendra, perhaps you'd like to try dusting it off and looking into why it regresses compiler performance? It would be great if we could use it by default.

Cheers,

- Ben

[1] http://downloads.haskell.org/~ghc/master/users-guide//using-optimisation.htm...
[2] https://git.haskell.org/ghc.git/commitdiff/f0a7261a39bd1a8c5217fecba56c593c3...
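For context on the two allocators discussed above: a graph-coloring allocator builds an interference graph, with an edge between any two virtual registers that are live at the same time, and tries to color it with the k available machine registers, spilling whatever it cannot color. A linear allocator, roughly speaking, assigns registers in a single pass over the code without that global view. The Python sketch below is a hypothetical, simplified Chaitin/Briggs-style simplify/select pass over a hand-written toy graph. The function name, the graph and the register names are made up for illustration, and this is not GHC's implementation.

```python
from typing import Dict, List, Set, Tuple


def color_interference_graph(
    graph: Dict[str, Set[str]], registers: List[str]
) -> Tuple[Dict[str, str], Set[str]]:
    """Toy Chaitin/Briggs-style register allocation (hypothetical sketch).

    graph: maps each virtual register to the set of virtual registers it
           interferes with. Assumed symmetric, with every neighbour a key.
    registers: the k available physical registers.
    Returns (assignment, spilled).
    """
    k = len(registers)
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(ns) for v, ns in graph.items()}
    stack: List[str] = []

    # Simplify: repeatedly remove a node with fewer than k neighbours; such a
    # node is guaranteed a free register later. If no such node exists, push
    # the highest-degree node optimistically as a potential spill (Briggs).
    while remaining:
        easy = [v for v in remaining if len(remaining[v]) < k]
        if easy:
            node = min(easy)  # deterministic choice for reproducibility
        else:
            node = max(remaining, key=lambda v: (len(remaining[v]), v))
        stack.append(node)
        for neighbour in remaining.pop(node):
            remaining[neighbour].discard(node)

    # Select: pop nodes in reverse order and give each the first register not
    # already used by a colored neighbour in the original graph.
    assignment: Dict[str, str] = {}
    spilled: Set[str] = set()
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in registers if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.add(node)  # would be spilled to the stack in a real allocator
    return assignment, spilled


# Hypothetical toy interference graph: a, b and c are all live together,
# and d overlaps only with c.
TOY_GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    print(color_interference_graph(TOY_GRAPH, ["r1", "r2", "r3"]))
    print(color_interference_graph(TOY_GRAPH, ["r1", "r2"]))
```

With three registers the toy graph colors without spills. With only two, the triangle formed by a, b and c cannot be colored, so one of them ends up in the spilled set.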