optimizing StgPtr allocate (Capability *cap, W_ n)

Hello, i'm looking a the https://github.com/ghc/ghc/blob/23bb90460d7c963ee617d250fa0a33c6ac7bbc53/rts... if i correctly understand, it's speed-critical routine? i think that it may be improved in this way: StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p; TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n); /// here starts new improved code: bd = cap->r.rCurrentAlloc; if (bd == NULL || bd->free + n > bd->end) { if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) { .... } if (bd->free + n <= bd->start + BLOCK_SIZE_W) bd->end = min (bd->start + BLOCK_SIZE_W, bd->free + LARGE_OBJECT_THRESHOLD) goto usual_alloc; } .... } /// and here it stops usual_alloc: p = bd->free; bd->free += n; IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; } i think it's obvious - we consolidate two if's on the crirical path into the single one plus avoid one ADD by keeping highly-useful bd->end pointer further improvements may include removing bd==NULL check by initializing bd->free=bd->end=NULL and moving entire "if" body into separate slow_allocate() procedure marked "noinline" with allocate() probably marked as forceinline: StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p; TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n); bd = cap->r.rCurrentAlloc; if (bd->free + n > bd->end) return slow_allocate(cap,n); p = bd->free; bd->free += n; IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; } this change will greatly simplify optimizer's work. according to my experience current C++ compilers are weak on optimizing large functions with complex execution paths and such transformations really improve the generated code -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi Bulat!
Is this exactly the same email you posted to
glasgow-haskell-users@ previously (I'm asking to know
if I need to reread)? It was already forwarded here by none
other but SPJ. :)
Welcome,
Mikolaj
On Wed, Oct 15, 2014 at 12:01 AM, Bulat Ziganshin
Hello,
i'm looking a the https://github.com/ghc/ghc/blob/23bb90460d7c963ee617d250fa0a33c6ac7bbc53/rts...
if i correctly understand, it's speed-critical routine?
i think that it may be improved in this way:
StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p;
TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n);
/// here starts new improved code:
bd = cap->r.rCurrentAlloc; if (bd == NULL || bd->free + n > bd->end) { if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) { .... } if (bd->free + n <= bd->start + BLOCK_SIZE_W) bd->end = min (bd->start + BLOCK_SIZE_W, bd->free + LARGE_OBJECT_THRESHOLD) goto usual_alloc; } .... }
/// and here it stops
usual_alloc: p = bd->free; bd->free += n;
IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; }
i think it's obvious - we consolidate two if's on the crirical path into the single one plus avoid one ADD by keeping highly-useful bd->end pointer
further improvements may include removing bd==NULL check by initializing bd->free=bd->end=NULL and moving entire "if" body into separate slow_allocate() procedure marked "noinline" with allocate() probably marked as forceinline:
StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p;
TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n);
bd = cap->r.rCurrentAlloc; if (bd->free + n > bd->end) return slow_allocate(cap,n);
p = bd->free; bd->free += n;
IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; }
this change will greatly simplify optimizer's work. according to my experience current C++ compilers are weak on optimizing large functions with complex execution paths and such transformations really improve the generated code
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Hello Mikolaj, Wednesday, October 15, 2014, 2:07:35 AM, you wrote: i'm sorry, it's the same
Hi Bulat!
Is this exactly the same email you posted to glasgow-haskell-users@ previously (I'm asking to know if I need to reread)? It was already forwarded here by none other but SPJ. :)
Welcome, Mikolaj
On Wed, Oct 15, 2014 at 12:01 AM, Bulat Ziganshin
wrote: Hello,
i'm looking a the https://github.com/ghc/ghc/blob/23bb90460d7c963ee617d250fa0a33c6ac7bbc53/rts...
if i correctly understand, it's speed-critical routine?
i think that it may be improved in this way:
StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p;
TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n);
/// here starts new improved code:
bd = cap->r.rCurrentAlloc; if (bd == NULL || bd->free + n > bd->end) { if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) { .... } if (bd->free + n <= bd->start + BLOCK_SIZE_W) bd->end = min (bd->start + BLOCK_SIZE_W, bd->free + LARGE_OBJECT_THRESHOLD) goto usual_alloc; } .... }
/// and here it stops
usual_alloc: p = bd->free; bd->free += n;
IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; }
i think it's obvious - we consolidate two if's on the crirical path into the single one plus avoid one ADD by keeping highly-useful bd->end pointer
further improvements may include removing bd==NULL check by initializing bd->free=bd->end=NULL and moving entire "if" body into separate slow_allocate() procedure marked "noinline" with allocate() probably marked as forceinline:
StgPtr allocate (Capability *cap, W_ n) { bdescr *bd; StgPtr p;
TICK_ALLOC_HEAP_NOCTR(WDS(n)); CCS_ALLOC(cap->r.rCCCS,n);
bd = cap->r.rCurrentAlloc; if (bd->free + n > bd->end) return slow_allocate(cap,n);
p = bd->free; bd->free += n;
IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa)); return p; }
this change will greatly simplify optimizer's work. according to my experience current C++ compilers are weak on optimizing large functions with complex execution paths and such transformations really improve the generated code
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (2)
-
Bulat Ziganshin
-
Mikolaj Konarski