
That sounds like a worthy experiment!
I guess that would look like an inline, macro’d-up fast path that checks
whether it can get the job done and falls back to the general code otherwise?
Last I checked, the overhead for this sort of C call was on the order of
10 nanoseconds or less, which seems very unlikely to be a bottleneck, but
do you have any natural or artificial benchmark programs that would
showcase this?
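
For concreteness, the sort of artificial benchmark I have in mind is just a
tight loop of small newByteArray# allocations, so that the per-allocation
cost (ccall included) dominates the runtime. A rough, untested sketch (the
64-byte size and the iteration count are arbitrary picks of mine):

{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Clock (getMonotonicTime)
import GHC.Exts
import GHC.IO (IO (..))

-- Allocate one n-byte MutableByteArray#, then write and read one Int
-- so the allocation cannot be thrown away as dead code.
allocOne :: Int -> IO Int
allocOne (I# n) = IO $ \s0 ->
  case newByteArray# n s0 of
    (# s1, marr #) ->
      case writeIntArray# marr 0# 1# s1 of
        s2 -> case readIntArray# marr 0# s2 of
          (# s3, i #) -> (# s3, I# i #)

-- Allocate n small arrays, accumulating a checksum to keep things live.
loop :: Int -> Int -> Int -> IO Int
loop !i !acc n
  | i >= n = pure acc
  | otherwise = do
      x <- allocOne 64              -- a "small" object: 64-byte payload
      loop (i + 1) (acc + x) n

main :: IO ()
main = do
  let iters = 10000000 :: Int
  t0 <- getMonotonicTime
  s  <- loop 0 0 iters
  t1 <- getMonotonicTime
  putStrLn ("checksum: " ++ show s)
  putStrLn ("ns per allocation: "
            ++ show ((t1 - t0) * 1e9 / fromIntegral iters))

Running it with +RTS -s would help separate mutator time from GC time, and
comparing the ns-per-allocation figure before and after any fast-path change
would show whether the ccall is actually visible above the noise.
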
For this sort of code, extra branching for that optimization could easily
have a larger performance impact than the known function call on modern
hardware. (Though take my intuitions about these things with a grain of
salt.)
On Tue, Apr 4, 2023 at 9:50 PM Harendra Kumar wrote:
I was looking at the RTS code for allocating small objects via primops, e.g. newByteArray#. The code looks like:
stg_newByteArrayzh ( W_ n )
{
    MAYBE_GC_N(stg_newByteArrayzh, n);

    payload_words = ROUNDUP_BYTES_TO_WDS(n);
    words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
    ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
We are making a foreign call here (ccall). I am wondering how much overhead a ccall adds; I guess it may have to save and restore registers. Would it be better to handle the fast path of allocating small objects from the nursery in Cmm code, like in stg_gc_noregs?
-harendra