More speed please!

16 Mar 2007

      All,

So here's the Put monad for the binary serialisation stuff:

newtype Put a = Put {
        runPut :: (a -> Buffer -> [B.ByteString])
                     -> Buffer -> [B.ByteString]
    }

data Buffer = Buffer {-# UNPACK #-} !(ForeignPtr Word8)
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- used bytes
                     {-# UNPACK #-} !Int                -- length left

This is all good, and pretty quick. Code like the following gets turned
into great low level code:

foo :: Word8 -> Put ()
foo !n = do
  word8 n
  word8 (n+1)
  word8 (n+17)

It does a straight line sequence of writes into memory (there are rules
that combine the bounds checking of adjacent writes). After that it
calls the continuation with a new buffer.

While almost everything is unboxed, the parameters to the continuation
are not unboxed. So we have to allocate a new Buffer structure and pass
that to the continuation:

let {
      sat_s1F8 =
          NO_CCS PutMonad.Buffer! [ww1_s1Es
                                   ww2_s1Et
                                   ww3_s1Ew
                                   sat_s1F4
                                   sat_s1F6];
    } in
      w_s1DY
          GHC.Base.()
          sat_s1F8;

w_s1DY being the continuation here.

However we know that really the parameters to this continuation could
really be unboxed since we construct all these continuations explicitly
in this module. We could re-write the monad type by manually explicitly
unboxing the Buffer argument:

newtype Put a = Put {
        runPut :: (a -> Addr# -> ForeignPtrContents -> Int# -> Int# -> Int# -> [B.ByteString])
                     -> Addr# -> ForeignPtrContents -> Int# -> Int# -> Int# -> [B.ByteString]
    }

Then we'd get no allocations and no heap checks in the fast path.

Note that we could still get a continuation in from the 'outside' with
the original type and convert it to one with the above type, though
obviously that involves unpacking the arguments. This unpacking is
basically what the wrapper of a worker/wrapper pair does of course.

Obviously this is ugly. We'd much rather write the original and with
just a couple annotations get the above representation. This is what I
would like to write:

newtype Put a = Put {
        runPut :: (a -> {-# UNPACK #-} !Buffer -> [B.ByteString])
                     -> {-# UNPACK #-} !Buffer -> [B.ByteString]
    }

So I'm declaring a type and a data constructor that contains a function
that is strict in one of it's arguments.

I do not wish to distinguish the strictness in the type however, that is
it should be perfectly ok to do this:

foo :: Foo -> Buffer -> [B.ByteString]

Put foo newBuffer :: Put Foo

Suppose that in this case we do not know locally that foo is strict and
can take it's args unboxed then upon applying the Put constructor we
just have to apply a wrapper function that boxes up the args and
supplies them to the wrapped function.

Obviously that'd be a pessimisation, having to re-box args, however we
can expect programmers to only use that UNPACK pragma when they know
that it is going to be a win overall.

So the ! on the arg is a semantic change and the pragma is a
representation change but not a semantic change. The ! means the obvious
thing, that the function is strict in that arg. So either the caller or
calle must ensure the arg is evaluated to WHNF.

Hmm, is the UNPACK actually needed then? Normally when ghc determines
that a func is strict in an arg it make the worker unbox that arg if
possible and that always seems to be a win. Mine you, in that case we
know what the function is going to do with each arg, where as here we do
not so perhaps a separate UNPACK makes sense.

This issues crops up quite a bit with Monads I think. For example GHC
defines it's IO monad to return an unboxed tuple:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

This is fine for low level internal code, but for everyday monads it'd
be nice to be able to write nice code and still get great performance.
This applies to monads particularly since they very often fully
encapsulate the function used as the internal representation, that is
all the sites where the representation function is constructed and taken
apart are in a single module. So we can 'see' that all uses are strict
in some arg. Or if they're not we might want them to be.

So instead of doing an analysis to figure out if we're always using it
strictly (which is something that jhc/grin might be able to do?) we can
just declare the data representation to be strict like we do with
ordinary algebraic data types, that way it propagates strictness to the
usage sites which is much more convenient than going around strictifying
all the usage sites in an attempt to make an analysis notice that it's
safe to do a representation change.

Ok, I'm done. :-)

Duncan

More speed please!

Duncan Coutts