Re: More speed please!

1 May 2007

      I'm replying to a rather old thread here, about unboxing in functions. Duncan
had a continuation monad which passed around some data type that would be nice
to unbox. You discussed strictness annotations in function types as a potential
solution. I have a different tack on the problem which seems potentially
useful. I've experimented with doing local defunctionalization on the module.
This is a long mail as I will try to explain in some detail what it is that I
have done. Please be patient.

Normal defunctionalization is about replacing the primitive function type
"a -> b" with an algebraic data type which I'll call "Fun a b". Not all
functions will be eliminated as we will see but the program will be first
order after the transformation. The core of the transformation is that every
lambda in the program gives rise to a new constructor in the Fun data type and
whenever we apply a function we instead call a newly created "apply function"
with the following type "Fun a b -> a -> b". This is basically what JHC does.

Defunctionalization is normally a whole program transformation (which is why
JHC is a whole program compiler). But sometimes it can be done on a per module
basis. This is where *local* defunctionalization comes in. The key to local
defunctionalization is that we often can divide the data type Fun into several
disjoint data types. We can do this whenever there are several different
function spaces that never get mixed up. And sometimes we're even so lucky
that a function space is totally contained in one module. Then we can do
local defunctionalization of that particular function space only and
completely within that module without changing it's interface. This case often
comes up when using the continuation monad and Duncan's code is not an
exception.

So, I've manually done local defunctionalization on Duncan's code. It gives
rise to two types which I've called Fun1 and Fun2. They look like follows
(including the Put monad):

\begin{code}
newtype Put a = Put {
        runPut :: Fun2 a
    }

data Fun1 a where
  Bind     :: (a -> Put b) -> Fun1 b -> Fun1 a
  Then     :: Put b  -> Fun1 b -> Fun1 a
  Run      :: Fun1 ()
  FlushOld :: !(Fun1 ()) -> !Int -> !(ForeignPtr Word8) -> !Int -> !Int
            -> Fun1 ()

data Fun2 a where
  Return :: a -> Fun2 a
  Bind2  :: Put a -> (a -> Put b) -> Fun2 b
  Then2  :: Put a -> Put b -> Fun2 b
  Flush  :: Fun2 ()
  Write  :: !Int -> (Ptr Word8 -> IO ()) -> Fun2 ()
\end{code}
Intuitively every constructor corresponds to a closure. I've chosen the name
for the constructor based on which function the closure appears in.

The respective apply functions for these data types acts as interpreters and
executes the corresponding code for each constructor/closure. Their type look
as follow:

\begin{code}
apply1 :: Fun1 a -> a -> Buffer -> [B.ByteString]
apply2 :: Fun2 a -> Fun1 a -> Buffer -> [B.ByteString]
\end{code}

Now, the cool thing is that once GHC starts optimizing away on these apply
functions they will be unboxed and no Buffer will ever be created or passed
around. Here is the core type for apply1:
\begin{core}
$wapply1_r21p :: forall a_aQu.
		 PutMonad.Fun1 a_aQu
		 -> a_aQu
		 -> GHC.Prim.Addr#
		 -> GHC.ForeignPtr.ForeignPtrContents
		 -> GHC.Prim.Int#
		 -> GHC.Prim.Int#
		 -> GHC.Prim.Int#
		 -> [Data.ByteString.Base.ByteString]
\end{core}
This is exactly what Duncan wanted, right? I declare victory :-)

However, things are not all roses. There are some functions that will
not be unboxed as we hope for with this approach, for instance the function
flushOld (see Duncan's code). To achieve the best possible optimization I
think one would have to perform strictness analysis and the worker-wrapper
transformation twice, once before doing local defunctionalization and then
again on the apply functions generated by the defunctionalization process.
This should give the code that Duncan wants I believe.

I think it should be relatively straightforward to implement local
defunctionalization in GHC but it should not be turned on by default as the
number of modules where it is beneficial is rather few.

The complete defunctionalized version of Duncan's module is attached.

I'm sure there are a lot of things that are somewhat unclear in this message.
Feel free to ask and I'll do my best to clarify.

Cheers,

Josef

Re: More speed please!

Josef Svenningsson