behaviour of {-# NOINLINE #-} in where clauses

13 Aug 2006

This doesn't have the effect I expected:

loop xs =
  case blah of
    One thing -> ... loop
    The other -> ... realloc ...

  where
    {-# NOINLINE realloc #-}
    realloc = do
      something
      loop ...

My intention here was that the code for loop would not contain the code
for realloc, and that realloc would instead be reached by a call at the
cmm level. The aim is to take the slow and rarely taken realloc path out
of the code for the fast path.
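
As a rough, self-contained sketch of the pattern described above (not the
actual ByteString code): the names countFlushes and chunkSize are
hypothetical, and the "realloc" here only bumps a counter and resets the
offset. Whether the realloc body really ends up out of line depends on
the GHC version and optimisation level, so check the dumps rather than
trusting the pragma.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical chunk size, matching the 4096 seen in the cmm below.
chunkSize :: Int
chunkSize = 4096

-- | Count how many times the slow path runs while consuming the input.
countFlushes :: [Int] -> Int
countFlushes = loop 0 0
  where
    loop :: Int -> Int -> [Int] -> Int
    loop !flushes !offset xs
      | offset == chunkSize = realloc flushes xs       -- slow, rare path
      | otherwise = case xs of
          []         -> flushes
          (_ : rest) -> loop flushes (offset + 1) rest -- fast path

    -- Ask GHC not to inline realloc at its call site in loop.
    {-# NOINLINE realloc #-}
    realloc :: Int -> [Int] -> Int
    realloc !flushes xs = loop (flushes + 1) 0 xs

main :: IO ()
main = print (countFlushes (replicate 10000 0))  -- prints 2
```

Compiling this with -O and the dump flags mentioned below is one way to
see whether the realloc code appears inside the loop or as a separate
function.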

It seems the {-# NOINLINE realloc #-} pragma did not have the effect I
intended. Looking at the -ddump-simpl and -ddump-cmm output, the code for
realloc gets expanded in place in a branch of a case statement. In the
cmm code we end up with just what I didn't want:

loop_info:
if (offset != 4096) goto later;
...
... lots of realloc code taking up space
... in the instruction / trace cache
...
later:
.. do the fast bits, read a byte, write a byte
jump loop_info;

Not only does the slow path take up space but it's in the location
favoured by the hardware's static branch prediction.

Reversing the test doesn't help because either way ghc turns it into:

case thing of
  _DEFAULT ->
  4096 ->

and from that generates CMM:

if (thing != 4096) goto much_later;
...
much_later:
...

The reason I was looking at this is that I've been trying to figure
out why our lazy byte string fusion primitives are much slower than the
strict versions. It's improving though: it's now only half the speed
rather than a tenth of the speed. :-)

The ByteString.Lazy code is an interesting mixture of strict and lazy.
We must strictly read/write the chunks but lazily generate/consume the
list of chunks.
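
To illustrate that mixture, here is a minimal sketch using the public
Data.ByteString and Data.ByteString.Lazy API. It is not the fusion code
discussed in this post. The helper names mapChunks and sumBytes are made
up for the example. Each strict chunk is processed in one go, while the
list of chunks is only produced and consumed on demand.

```haskell
module Main (main) where

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')
import Data.Word (Word8)

-- | Apply a byte function chunk by chunk. S.map writes each output
-- chunk strictly; the surrounding list map stays lazy, so later chunks
-- are only built when something asks for them.
mapChunks :: (Word8 -> Word8) -> L.ByteString -> L.ByteString
mapChunks f = L.fromChunks . map (S.map f) . L.toChunks

-- | Sum all bytes: a strict fold inside each chunk, driven by a strict
-- left fold over the lazily produced list of chunks.
sumBytes :: L.ByteString -> Int
sumBytes = foldl' addChunk 0 . L.toChunks
  where
    addChunk acc chunk = S.foldl' (\a w -> a + fromIntegral w) acc chunk

main :: IO ()
main = do
  let input = L.fromChunks [S.pack [1, 2, 3], S.pack [4, 5]]
  print (sumBytes input)                 -- prints 15
  print (sumBytes (mapChunks (+ 1) input)) -- prints 20
```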

I just discovered that I should have been reading STG all along rather
than core from the simplifier or CMM. STG takes out all the type
annotations which tend to make things quite verbose. Mind you, seeing
the types can be handy too to see if/how things are unboxed.
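
For comparing the three forms side by side, one option is to request all
the dumps from a pragma at the top of a small test module. This is only a
sketch: the step function is a hypothetical stand-in, and flag names vary
between GHC releases. In particular, newer compilers treat -ddump-stg as
a deprecated alias and offer more specific STG dump flags, so check the
user guide for the version in use. -dsuppress-all trims most of the
annotations that make the Core dump verbose.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-cmm -dsuppress-all -ddump-to-file #-}
module Main (main) where

-- Hypothetical stand-in for the offset test in the loop: reset the
-- offset at 4096, otherwise advance by one.
step :: Int -> Int
step offset = if offset == 4096 then 0 else offset + 1

main :: IO ()
main = print (step 4095)  -- prints 4096
```

With -ddump-to-file the dumps are written to files next to the module
rather than to the terminal, which makes them easier to diff.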

Even so, I kind of wish there were a stage between STG and CMM that
showed the imperative model of STG with linear layout, control flow and
notes to indicate thunk/closure allocations. I expect most of my problem
is that I do not understand the STG evaluation model sufficiently well
to see how it maps to basic blocks, jumps/calls etc.

Duncan

Duncan Coutts