
let f = ..f.. in f{n,m} -PEEL-> let f = ..f.. in ..f{n-1,m}..
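To read the schematic rule operationally: one level of peeling unfolds the first iteration of the recursive definition into the use site, leaving the remaining recursion to go through the shared definition. A hand-written sketch of the before/after (the function names are illustrative; the rule above is schematic notation, not GHC syntax):

```haskell
-- Original recursive definition:
power :: Int -> Int -> Int
power _ 0 = 1
power b e = b * power b (e - 1)

-- After one level of peeling at a use site: the first iteration is
-- unfolded in place, and the remaining calls still go through 'power'.
powerPeeled :: Int -> Int -> Int
powerPeeled _ 0 = 1
powerPeeled b e = b * power b (e - 1)

main :: IO ()
main = do
  print (power 2 10)        -- 1024
  print (powerPeeled 2 10)  -- 1024, same result, one iteration exposed
```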
Probably what you intend here is that you create one copy of the definition every round rather than one per call site, is that right?
I don't think so - ultimately, the point of both peeling and unrolling is to unfold a definition into a use site, to enable further optimizations, not just to move from a recursive to a non-recursive definition. We could try to do it in two steps, as you suggest, but that would expose us to the heuristics of GHC inlining again (will or won't it inline the new shared definition?), in the middle of a user-annotation-based unfolding.
Ah - I was thinking of something a bit different, where:
* PEEL / UNROLL pragmas duplicate the method body once per level of peeling / unrolling and fix up the recursive calls as appropriate
* The user optionally adds an INLINE pragma to the function if he additionally wants to be SURE that those duplicates get inlined at the use sites
Ok, I suspected as much. You'd need to make the 'INLINE f' apply to the generated 'fN', of course.
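For instance (hypothetical names; this is only a sketch of what such an expansion might generate, not actual GHC output), a 'PEEL 2' on 'sumTo' could produce generated definitions 'sumTo1' and 'sumTo2', each unfolding one more level before falling back to the original, and the user's 'INLINE sumTo' would have to be carried over to those generated names:

```haskell
-- Original recursive definition:
sumTo :: Int -> Int
sumTo 0 = 0
sumTo n = n + sumTo (n - 1)

-- Hypothetical output of 'PEEL 2': each generated definition unfolds
-- one layer, then defers to the previous one. The INLINE pragmas are
-- the user's original annotation, propagated onto the generated names.
{-# INLINE sumTo1 #-}
sumTo1 :: Int -> Int
sumTo1 0 = 0
sumTo1 n = n + sumTo (n - 1)

{-# INLINE sumTo2 #-}
sumTo2 :: Int -> Int
sumTo2 0 = 0
sumTo2 n = n + sumTo1 (n - 1)

main :: IO ()
main = print (sumTo2 10)  -- 55, same result as 'sumTo 10'
```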
This means that PEEL / UNROLL would represent a nice bit of functionality that is logically orthogonal to INLINE-ing.
Usually, I'm all for orthogonality, and for more knobs to allow hand-tuning of things that have no automatically reachable optimal solutions. In this case, however, I'm not sure anything would be gained. I recall that your hand-unrolled code was written in a similar style, and I assumed that it was a question of style, which GHC would inline into the same code. But if you annotate all your unrolled and peeled new definitions as NOINLINE, do you still get the optimizations you want? There are probably a few GHC optimizations that can "look through" non-recursive lets, but RULES are not among them.

For loop-style recursion, there'd be only one use per definition, so inlining would be the default and there'd be no difference; but for non-loop-style recursion, inlining might not happen, and so no further optimizations would be enabled. Off the top of my head, I can't think of a case where keeping the duplicates un-inlined would lead to improved code, but, as I'm discovering, I'm not very familiar with the details of what optimizations GHC is actually doing (though this is quite helpful: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/HscMain ), so I might be missing something?
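The RULES point can be illustrated with a small sketch (hypothetical rule and names): a rewrite rule matches only syntactic occurrences of the named function in the simplified program, so if one of the applications is hidden behind a NOINLINE binding, the rule cannot see the pattern and never fires. Both versions still compute the same result; only the opportunity for fusion differs.

```haskell
module Main where

-- A classic fusion-style rule: it matches only the literal pattern
-- 'mapD f (mapD g xs)' as it appears in the simplified program text.
{-# RULES "mapD/mapD" forall f g xs. mapD f (mapD g xs) = mapD (f . g) xs #-}

{-# NOINLINE mapD #-}  -- keep 'mapD' itself visible to the rule
mapD :: (a -> b) -> [a] -> [b]
mapD = map

-- Here the pattern is syntactically visible, so the rule can fire...
visible :: [Int] -> [Int]
visible xs = mapD (+ 1) (mapD (* 2) xs)

-- ...but here 'step' hides one of the applications behind a binding,
-- and RULES do not look through lets: unless 'step' is inlined first,
-- the two traversals are never fused into one.
hidden :: [Int] -> [Int]
hidden xs = mapD (+ 1) step
  where
    {-# NOINLINE step #-}
    step = mapD (* 2) xs

main :: IO ()
main = do
  print (visible [1, 2, 3])  -- [3,5,7]
  print (hidden  [1, 2, 3])  -- [3,5,7] as well, but without fusion
```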
Furthermore, I'm not too keen on duplicating method bodies at call sites willy-nilly because it may lead to increased allocations (of the function closures) in inner loops. At least if you bind the duplicated methods at the same level as the thing you are duplicating you only increase the dynamic number of closures created by a constant factor!
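The allocation concern can be sketched as follows (hypothetical names; whether GHC actually allocates per iteration depends on let-floating and other optimizations, so this only illustrates the shape of the worry, not guaranteed behaviour):

```haskell
-- Duplicate bound at the same level as the original definition:
-- the closure for 'step' is built once for the whole traversal.
sameLevel :: [Int] -> [Int]
sameLevel xs = map step xs
  where
    step :: Int -> Int
    step y = y * 2 + 1

-- Duplicate bound inside the loop body: naively, a fresh closure for
-- 'step' could be built on every iteration. (In practice GHC's
-- let-floating will often rescue this, but one shouldn't rely on it.)
perCall :: [Int] -> [Int]
perCall xs = [ let step y = y * 2 + 1 in step x | x <- xs ]

main :: IO ()
main = do
  print (sameLevel [1, 2, 3])  -- [3,5,7]
  print (perCall   [1, 2, 3])  -- [3,5,7]
```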
Yes, every form of INLINE has its limits. But if users say they want inlining (or peeling or unrolling or any other form of unfolding), that's what they should get, including those worrisome duplications. The idea is to create lots of added code (in order to remove abstractions that might hide optimization opportunities), which will then be simplified to something smaller (or at least better performing) than what we started out with. Providing the means to fine-tune the amount of duplication might be useful, but preventing it entirely is not an option.

Claus