
2009/3/19 Claus Reinke
Recursion unfolding spec, 2nd attempt.
....
If this is an improvement on the first version, and after correcting any obvious issues, I should put it on the ghc trac wiki somewhere, and create a feature request ticket.
I can't see any issues with this version of the spec. I think in the implementation it makes most sense to do this as a core2core pass at an early stage in the pipeline, probably via plugins (so will have to wait until I get those into HEAD). In the case of PEEL, we don't want to identify all call sites directly and do the substitution in the pass so we should just output some bindings which will certainly be inlined into call sites later on. So, the transformation should be bottom up on the Core syntax tree and when it meets a recursive group of bindings we should do something like this: {-# INLINE f g PEEL 3 UNROLL 2 #-} f = ... g ... f ... h ... g = ... g ... f ... h ... h = ... g ... f ... h ... =(my pass)=> -- Temporary copies of f and g - dead code f_old = ... g_old ... f_old ... h ... g_old = ... g_old ... f_old ... h ... -- H unchanged for now, might get PEELed stuff inlined later h = ... g .. f ... h ... -- Top level unrolled definiiton - if we weren't doing peeling, these would be the new f and g f_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ... g_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ... -- Unrolled iteration. Will get inlined into f_unrolled / g_unrolled soon {-# INLINE f_unrolled_1 g_unrolled_1 #-} f_unrolled_1 = ... g_unrolled ... f_unrolled ... h ... g_unrolled_1 = ... g_unrolled ... f_unrolled ... h ... -- One level of peeling {-# INLINE f_1 g_1 #-} f_1 = ... g_unrolled ... f_unrolled ... h ... g_1 = ... g_unrolled ... f_unrolled ... h ... -- Second level of peeling {-# INLINE f_2 g_2 #-} f_2 = ... g_1 ... f_1 ... h ... g_2 = ... g_1 ... f_1 ... h ... -- Final level of peeling and new definitions for f and g. Inline pragmas -- make sure all of this gets inlined at the call site {-# INLINE f g #-} f = ... g_2 ... f_2 ... h ... g = ... g_2 ... f_2 ... h ... =(after the simplifier has run - effectively - there are a few harmless lies here)=> -- NB: I haven't shown inlining of the new f and g here, but it /will/ happen h = ... g .. f ... h ... -- I've inlined the inner unrolled iteration at every /call site/ within the top level unrolled iteration, as per -- the pragmas. Noone actualy calls this unrolled thing directly though, since we used PEEL as well f_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ... g_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ... -- This huge chunk of code gets inlined at every call site, which in turn call through to the unrolled bodies {-# INLINE f g #-} f = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ...) ... h ... g = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (... g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ... f_unrolled ... h ...) ... h ...) ... h ... By ensuring that f and g are tagged INLINE we get the existing INLINE restrictions automatically in later Core passes. I think that this example transformation matches your spec - am I right? Cheers, Max