Should we always inline newByteArray#?

Hi all,

After some refactoring of StgCmmPrim, it's now possible to have both an inline and an out-of-line (in PrimOps.cmm) version of the same primop. Very soon (#8876) we'll have both an inline and an out-of-line version of newByteArray#. The inline version is used when the array size is statically known and commons up the allocation with the normal heap check.

The reason to have both versions is that we don't want to increase code size too much (by inlining a primop whose implementation is large) unless we know that there's a benefit in doing so. However, the newByteArray# implementation is one function call (to allocate) followed by three stores (to the closure header). Perhaps that's small enough to always inline? It would save one function call for each call to newByteArray#.

Does anyone have any thoughts on whether always inlining would be a good idea?

-- Johan
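For concreteness, the two call shapes under discussion look roughly like this at the Haskell level. This is only a sketch: `MBA`, `newFixed16`, `newDynamic` and `sizeInBytes` are names made up for illustration, and which lowering a given call site actually gets is decided by the code generator, not by anything visible in this source.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical boxed wrapper so the unlifted array can be returned from IO.
data MBA = MBA (MutableByteArray# RealWorld)

-- The size is a literal, so it is statically known at the call site.
newFixed16 :: IO MBA
newFixed16 = IO (\s -> case newByteArray# 16# s of
  (# s', mba #) -> (# s', MBA mba #))

-- The size is only known at run time.
newDynamic :: Int -> IO MBA
newDynamic (I# n) = IO (\s -> case newByteArray# n s of
  (# s', mba #) -> (# s', MBA mba #))

-- Read back the size in bytes of a mutable byte array.
sizeInBytes :: MBA -> IO Int
sizeInBytes (MBA mba) = IO (\s -> case getSizeofMutableByteArray# mba s of
  (# s', n #) -> (# s', I# n #))

main :: IO ()
main = do
  a <- newFixed16
  b <- newDynamic 4096
  sizeInBytes a >>= print
  sizeInBytes b >>= print
```

Only the first call site is a candidate for the inline version described above, since only there is the size a compile-time constant.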

On 13/03/14 20:39, Johan Tibell wrote:
> Hi all,
> After some refactoring of StgCmmPrim, it's now possible to have both an inline and an out-of-line (in PrimOps.cmm) version of the same primop. Very soon (#8876) we'll have both an inline and an out-of-line version of newByteArray#. The inline version is used when the array size is statically known and commons up the allocation with the normal heap check.
> The reason to have both versions is that we don't want to increase code size too much (by inlining a primop whose implementation is large) unless we know that there's a benefit in doing so. However, the newByteArray# implementation is one function call (to allocate) followed by three stores (to the closure header). Perhaps that's small enough to always inline? It would save one function call for each call to newByteArray#.
> Does anyone have any thoughts on whether always inlining would be a good idea?

It's a bad idea for large arrays (>= 3k), because when allocated via allocate() these arrays get a block marked with BF_LARGE that doesn't get copied during GC. It might be a good idea for arrays less than this size (including the header). It's a bad idea if the size isn't statically known, though.

Cheers, Simon

On Thu, Mar 13, 2014 at 10:24 PM, Simon Marlow wrote:
> It's a bad idea for large arrays (>= 3k), because when allocated via allocate() these arrays get a block marked with BF_LARGE that doesn't get copied during GC.
> It might be a good idea for arrays less than this size (including the header). It's a bad idea if the size isn't statically known, though.

Sorry for not being clear. We will only do the inline *allocation* if the size is <= 128 bytes. I'm talking about always inlining the *code* (whether it contains a call to allocate or not). In one case we will inline a definition reusing the heap check. In the other case we will inline a definition that calls allocate.

-- Johan

On 13/03/14 21:36, Johan Tibell wrote:
> On Thu, Mar 13, 2014 at 10:24 PM, Simon Marlow wrote:
>> It's a bad idea for large arrays (>= 3k), because when allocated via allocate() these arrays get a block marked with BF_LARGE that doesn't get copied during GC.
>> It might be a good idea for arrays less than this size (including the header). It's a bad idea if the size isn't statically known, though.
> Sorry for not being clear. We will only do the inline *allocation* if the size is <= 128 bytes. I'm talking about always inlining the *code* (whether it contains a call to allocate or not). In one case we will inline a definition reusing the heap check. In the other case we will inline a definition that calls allocate.

Oh I see, sorry for misunderstanding. In that case it's a straightforward code size / speed tradeoff. When a similar case came up recently (PAP optimisations) I turned it on for -O2 only. Someday I'd really like to have a -Os that would allow us a bit more control over decisions like this.

Cheers, Simon

On Thu, Mar 13, 2014 at 10:42 PM, Simon Marlow wrote:
> Oh I see, sorry for misunderstanding. In that case it's a straightforward code size / speed tradeoff. When a similar case came up recently (PAP optimisations) I turned it on for -O2 only. Someday I'd really like to have a -Os that would allow us a bit more control over decisions like this.

I will do the same and inline with -O2 only.
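
The outcome of the thread can be summarised as a small decision function. The sketch below is a toy model of the policy as discussed here, not GHC's actual code generator: `OptLevel`, `Lowering`, `maxInlineAllocBytes` and `chooseLowering` are hypothetical names, and the model assumes the 128-byte inline-allocation limit applies regardless of optimisation level, which the thread does not state explicitly.

```haskell
-- Hypothetical model of the policy discussed in this thread.
data OptLevel = O0 | O1 | O2
  deriving (Eq, Ord, Show)

data Lowering
  = InlineHeapCheck     -- allocate inline, sharing the normal heap check
  | InlineAllocateCall  -- inline the primop's code, which calls allocate()
  | OutOfLineCall       -- call the out-of-line version in PrimOps.cmm
  deriving (Eq, Show)

-- Limit for inline allocation, taken from the thread (size <= 128 bytes).
maxInlineAllocBytes :: Int
maxInlineAllocBytes = 128

-- The second argument is the array size in bytes when it is statically
-- known at the call site, and Nothing otherwise.
chooseLowering :: OptLevel -> Maybe Int -> Lowering
chooseLowering opt staticSize
  | Just n <- staticSize, n <= maxInlineAllocBytes = InlineHeapCheck
  | opt >= O2 = InlineAllocateCall
  | otherwise = OutOfLineCall

main :: IO ()
main = mapM_ print
  [ chooseLowering O0 (Just 16)
  , chooseLowering O2 (Just 4096)
  , chooseLowering O2 Nothing
  , chooseLowering O1 Nothing
  ]
```

Gating the middle case on -O2 mirrors the code size / speed tradeoff Simon describes: the call to allocate() is still made, but the surrounding function call is saved at the cost of larger code at each call site.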
Participants (2): Johan Tibell, Simon Marlow