
2009/3/1 Claus Reinke
What is the issue with concatMap?
ConcatMap doesn't usually fuse under stream fusion - see http://www.cse.unsw.edu.au/~dons/papers/stream-fusion.pdf for the gory details.
It sounds like your specialization is based on the recursion equivalent to loop peeling (unrolling the "non-recursive" calls, the entry points into the recursion), a close variant of loop unrolling (unrolling the recursive calls, inside the loop body).
This sounds right - to get concatMap specialised I "unpeel" the unstream loop (which has been slightly modified) 4 iterations, after which unstream recurses back into itself in a tight loop. This lets GHC specialise the first 3 iterations however it likes. This is achieved by the spec4 combinator in the code I posted.
If followed by the static argument transformation, that might cover the majority of hand-written worker-wrapper pairs (replacing manual by compiler optimization is always nice).
Right. Since GHC is so blind to recursion at the moment this could be a substantial win (though my gut tells me that SAT alone is a large part of the win here).
So, instead of splitting recursive 'f' into 'fWorker' and 'fWrapper', with an INLINE pragma for 'fWrapper', one might in future be able just to say '{-# INLINE f PEEL 1 UNROLL 0 #-}' or, if unrolling is also desirable '{-# INLINE f PEEL 1 UNROLL 8 #-}'? And GHC would do the work, by unfolding the entry points once (the inlining of the wrapper), unfolding the recursive calls 8 times (the loop unrolling), and taking the INLINE PEEL pragma also as a hint to try the static argument transformation.
Right, and INLINE PEEL might be a nice interface for the user. Of course, we'd probably want an automated system for working out when this is a good idea as well - in the same way that we have INLINE pragmas and a load of inlining heuristics.
It seems that strength reduction could be seen as loop restructuring in the small: a multiplication, if seen as a repeated addition, can be unrolled, or the implicit adding loop can be fused with the explicit loop in which multiplication is called on (eg figure 7 in the ACM survey paper I mentioned).
I hadn't thought about it in quite those terms before - cute :-)
That way, no separate framework would be needed for strength reduction. Btw, is that also what happened in the -fvia-C path of Don's initial example in this thread (I don't know how to read those leaqs, but the imulq is gone)?
I am no assembly guru and haven't seen that last form of leaq either, but I'm going to guess that: leaq (%rsi,%rsi,4), %rax Says that rax is rsi * ((1 + 1) * 2 ^ 4) = rsi * 32 leaq 0(,%rax,8), %rsi And that this finishes it off by adding the final 8* to the mix. So it makes the multiplication easier by breaking it into two multiplications by powers of two. Smart, but you don't need any loop unrolling tech to do it.
But all these follow-on optimizations enabled by unfolding recursive definitions seem to require further thought and design, whereas user-controlled recursion unfolding (both peel and unroll) seems to offer immediate benefits. Is that part of your current work?
I hadn't actually considered a mechanism user-controlled peel/unroll at all! I was totally focused on automatic transformations :-)
Do you forsee any problems with the implementation, or with the API I suggested above (adding PEEL and UNROLL options to INLINE pragmas, to make them effective on recursive definitions as well)?
The implementation I'm thinking of is basically trivial. You just add the information gathered from the pragmas onto the Ids, then have a dedicated core pass that looks at the pragmas and does it's worker/wrapper thing. The technology to do peeling/unrolling is trivial and there already examples in the codebase (in case liberation and SAT). If someone can spec out what they actually want and GHC HQ give it the thumbs up I would be happy to do the grunt work on implementing this feature. I'm not so sure about the user interface - for the purposes of compatibility with other compiler's notion of INLINE perhaps a dedicated PEEL / UNROLL pragma is a good idea. This would be less painful if we had a positional notation for pragmas - which has been mooted in the past wrt. the annotation system for compiler plugins (which IS in HEAD). AFAIK the only reason we don't have this is that we haven't had a discussion about how it should look. See "Future work" on the page http://hackage.haskell.org/trac/ghc/wiki/Annotations Incidentally, Simon PJ has just made GHC warn about INLINE pragmas on recursive things (not something I totally sure is a good idea, since the compiler can make things non-recursive behind your back) but which you can justify by saying that /normally/ GHC won't INLINE recursive things, so it's misleading to have INLINE pragmas on them accepted. This can be taken as an argument against adding PEEL / UNROLL to INLINE. Cheers, Max