Re: Loop unrolling + fusion ?

19 Mar 2009

      2009/3/19 Claus Reinke :
...
Recursion unfolding spec, 2nd attempt.
....
If this is an improvement on the first version, and after correcting any
obvious issues, I should put it on the ghc trac wiki somewhere, and create a
feature request ticket.
I can't see any issues with this version of the spec.

I think in the implementation it makes most sense to do this as a
core2core pass at an early stage in the pipeline, probably via plugins
(so will have to wait until I get those into HEAD). In the case of
PEEL, we don't want to identify all call sites directly and do the
substitution in the pass so we should just output some bindings which
will certainly be inlined into call sites later on. So, the
transformation should be bottom up on the Core syntax tree and when it
meets a recursive group of bindings we should do something like this:

{-# INLINE f g PEEL 3 UNROLL 2 #-}
f = ... g ... f ... h ...
g = ... g ... f ... h ...
h = ... g ... f ... h ...

=(my pass)=>

-- Temporary copies of f and g - dead code
f_old = ... g_old ... f_old ... h ...
g_old = ... g_old ... f_old ... h ...
-- H unchanged for now, might get PEELed stuff inlined later
h = ... g .. f ... h ...

-- Top level unrolled definiiton - if we weren't doing peeling, these
would be the new f and g
f_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ...
g_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ...

-- Unrolled iteration. Will get inlined into f_unrolled / g_unrolled soon
{-# INLINE f_unrolled_1 g_unrolled_1 #-}
f_unrolled_1 = ... g_unrolled ... f_unrolled ... h ...
g_unrolled_1 = ... g_unrolled ... f_unrolled ... h ...

-- One level of peeling
{-# INLINE f_1 g_1 #-}
f_1 = ... g_unrolled ... f_unrolled ... h ...
g_1 = ... g_unrolled ... f_unrolled ... h ...

-- Second level of peeling
{-# INLINE f_2 g_2 #-}
f_2 = ... g_1 ... f_1 ... h ...
g_2 = ... g_1 ... f_1 ... h ...

-- Final level of peeling and new definitions for f and g. Inline pragmas
-- make sure all of this gets inlined at the call site
{-# INLINE f g #-}
f = ... g_2 ... f_2 ... h ...
g = ... g_2 ... f_2 ... h ...

=(after the simplifier has run - effectively - there are a few
harmless lies here)=>

-- NB: I haven't shown inlining of the new f and g here, but it /will/ happen
h = ... g .. f ... h ...

-- I've inlined the inner unrolled iteration at every /call site/
within the top level unrolled iteration, as per
-- the pragmas. Noone actualy calls this unrolled thing directly
though, since we used PEEL as well
f_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (...
g_unrolled ... f_unrolled ... h ...) ... h ...
g_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (...
g_unrolled ... f_unrolled ... h ...) ... h ...

-- This huge chunk of code gets inlined at every call site, which in
turn call through to the unrolled bodies
{-# INLINE f g #-}
f = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (...
g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (...
g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ...
f_unrolled ... h ...) ... h ...) ... h ...
g = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (...
g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (...
g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ...
f_unrolled ... h ...) ... h ...) ... h ...

By ensuring that f and g are tagged INLINE we get the existing INLINE
restrictions automatically in later Core passes.

I think that this example transformation matches your spec - am I right?

Cheers,
Max

Re: Loop unrolling + fusion ?

Max Bolingbroke