Re: FFI calls: is it possible to allocate a small memory block on a stack?

19 Apr 2010

      On 18/04/2010 10:28, Denys Rtveliashvili wrote:
...
...
While alloca is not as cheap as, say, C's alloca, you should find that
it is much quicker than C's malloc.  I'm sure there's room for
optimisation if it's critical for you.  There may well be low-hanging
fruit: take a look at the Core for alloca.
Thank you, Simon.
Indeed, there is low-hanging fruit.
"alloca"'s type is "Storable a => (Ptr a -> IO b) -> IO b" and it is not
inlined even though the function is small. And calls to functions of
such signature are expensive (I suppose that's because of look-up into
typeclass dictionary). However, when I added an "INLINE" pragma for the
function into Foreign.Marshal.Alloc the time of execution dropped from
40 to 20 nanoseconds. I guess the same effect will take place if other
similar functions get marked with "INLINE".
Is there a reason why we do not want small FFI-related functions with
typeclass arguments to be marked with an "INLINE" pragma and gain a
performance improvement?
The only reason that comes to my mind is the size of code, but actually
the resulting code looks very small and neat.
Adding an INLINE pragma is the right thing for alloca and similar functions.

alloca is a small overloaded wrapper around allocaBytesAligned, and 
without the INLINE pragma the body of allocaBytesAligned gets inlined 
into alloca itself, making it too big to be inlined at the call site 
(you can work around it with e.g. -funfolding-use-threshold=100).  This 
is really a case of manual worker/wrapper: we want to tell GHC that 
alloca is a wrapper, and the way to do that is with INLINE.  Ideally GHC 
would manage this itself - there's a lot of scope for doing some general 
code splitting; I don't think anyone has explored that yet.
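
A rough sketch of the manual worker/wrapper split described above, under stated assumptions: withScratch and withTemp are hypothetical names, and the NOINLINE on the worker is only there to make the split explicit in the Core; it is not something the real alloca needs.

```haskell
import Data.Int (Int32)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, alignment, peek, poke, sizeOf)

-- Worker (hypothetical): no class constraint, does the actual allocation.
-- NOINLINE is an assumption of this sketch, used to keep it out of the wrapper.
withScratch :: Int -> Int -> (Ptr a -> IO b) -> IO b
withScratch size align action = allocaBytesAligned size align action
{-# NOINLINE withScratch #-}

-- Wrapper (hypothetical): overloaded and small. INLINE asks GHC to unfold it
-- at call sites, so the Storable methods are resolved where the type is known.
withTemp :: Storable a => a -> (Ptr a -> IO b) -> IO b
withTemp x action =
  withScratch (sizeOf x) (alignment x) $ \p -> poke p x >> action p
{-# INLINE withTemp #-}

main :: IO ()
main = withTemp (7 :: Int32) peek >>= print
```

Without the INLINE pragma, the alternative mentioned above is to raise the threshold at the use site, e.g. compiling the calling module with -funfolding-use-threshold=100.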

Cheers,
	Simon

Simon Marlow