On Fri, Apr 7, 2023 at 7:35 AM Harendra Kumar <harendra.kumar@gmail.com> wrote:

On Fri, 7 Apr 2023 at 02:18, Carter Schonwald <carter.schonwald@gmail.com> wrote:
That sounds like a worthy experiment!

I guess that would look like an inlined, macro-based fast path that checks whether it can get the job done itself and otherwise falls back to the general code?

Last I checked, the overhead for this sort of C call was on the order of 10 nanoseconds or less, which seems very unlikely to be a bottleneck. Do you have any natural or artificial benchmark programs that would showcase this?

I converted my example code into a loop and ran it a million times with a 1-byte array size (8 bytes after alignment), so roughly 3 words are allocated per array, including the header and length. It took 5 ms with the statically known size optimization, which inlines the allocation completely, and 10 ms with an unknown size (taken from a program argument), which makes a call to newByteArray#. That works out to about 5 ns more per allocation (5 ms extra spread over a million allocations). It does not sound like a big deal.
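
For readers who want to try something similar, here is a minimal sketch of such a loop. It is an illustration only, not the code that was actually measured: the helper name allocBytes, the iteration constant, and the argument handling are all assumptions. With no argument the size is the literal 1, so GHC can see it at compile time; with an argument the size is only known at run time. Whether the known-size case is really inlined will depend on the GHC version and optimization flags, so compile with -O2 and check the timings yourself.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Control.Monad (replicateM_)
import GHC.Exts (Int (I#), newByteArray#)
import GHC.IO (IO (..))
import System.Environment (getArgs)

-- Hypothetical helper: allocate a mutable byte array of n bytes
-- and discard it immediately.
allocBytes :: Int -> IO ()
allocBytes (I# n) = IO $ \s ->
  case newByteArray# n s of
    (# s', _ #) -> (# s', () #)
{-# INLINE allocBytes #-}

-- Number of allocations per run (one million, as in the message above).
iterations :: Int
iterations = 1000000

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- No argument: the size is a literal, known at compile time.
    []      -> replicateM_ iterations (allocBytes 1)
    -- With an argument: the size is only known at run time.
    (a : _) -> replicateM_ iterations (allocBytes (read a))
```

Running the binary once without arguments and once with the argument 1, for example under the RTS option +RTS -s, lets you compare the two paths on the same allocation size.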

-harendra