
#8885: Add inline versions of clone array primops
-------------------------------------+------------------------------------
        Reporter:  tibbe             |            Owner:  simonmar
            Type:  feature request   |           Status:  patch
        Priority:  normal            |        Milestone:
       Component:  Compiler          |          Version:  7.9
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Unknown
       Test Case:                    |       Blocked By:
        Blocking:                    |  Related Tickets:
-------------------------------------+------------------------------------

Comment (by tibbe):

 There's not much point in comparing the new inline version against the
 old, incorrect version. Fixing the old version just to get benchmark
 numbers doesn't seem worth it, as we'd have to replicate `MAYBE_GC` in
 `StgCmmPrim`.

 Instead I compare the new inline version against the new out-of-line
 version (which calls `allocate`). The inline version is 69% faster
 (0.08s vs. 0.26s total time).

 Here are the `+RTS -s` numbers for the new out-of-line version:

 {{{
    1,600,041,120 bytes allocated in the heap
           57,992 bytes copied during GC
           35,992 bytes maximum residency (1 sample(s))
           21,352 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
   Gen  0      3173 colls,     0 par    0.01s    0.01s     0.0000s    0.0000s
   Gen  1         1 colls,     0 par    0.00s    0.00s     0.0002s    0.0002s

   INIT    time    0.00s  (  0.00s elapsed)
   MUT     time    0.25s  (  0.25s elapsed)
   GC      time    0.01s  (  0.01s elapsed)
   EXIT    time    0.00s  (  0.00s elapsed)
   Total   time    0.26s  (  0.26s elapsed)

   %GC     time       2.1%  (2.9% elapsed)

   Alloc rate    6,417,285,798 bytes per MUT second

   Productivity  97.9% of total user, 95.6% of total elapsed
 }}}

 And for the inline version:

 {{{
    1,600,041,120 bytes allocated in the heap
           57,224 bytes copied during GC
           35,992 bytes maximum residency (1 sample(s))
           21,352 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
   Gen  0      3125 colls,     0 par    0.00s    0.01s     0.0000s    0.0000s
   Gen  1         1 colls,     0 par    0.00s    0.00s     0.0002s    0.0002s

   INIT    time    0.00s  (  0.00s elapsed)
   MUT     time    0.08s  (  0.08s elapsed)
   GC      time    0.00s  (  0.01s elapsed)
   EXIT    time    0.00s  (  0.00s elapsed)
   Total   time    0.08s  (  0.09s elapsed)

   %GC     time       6.1%  (8.2% elapsed)

   Alloc rate    20,999,017,271 bytes per MUT second

   Productivity  93.9% of total user, 89.2% of total elapsed
 }}}

 You can see that the GC issue has been fixed.

 I've attached updated versions of my patches that address the `MAYBE_GC`
 issue, which was also present in my new out-of-line implementation.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8885#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
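
 As a sanity check on the 69% figure in the comment above, here is a minimal sketch that recomputes it from the two `+RTS -s` reports. It assumes "faster" means the relative reduction in total run time; the helper names `timeReduction` and `speedup` are made up for this illustration and are not part of GHC or the patches.

```haskell
module Main (main) where

-- | Fractional reduction in run time when going from `old` to `new`.
-- (Hypothetical helper, introduced only for this illustration.)
timeReduction :: Double -> Double -> Double
timeReduction old new = (old - new) / old

-- | Speedup factor: how many times faster `new` is than `old`.
-- (Hypothetical helper, introduced only for this illustration.)
speedup :: Double -> Double -> Double
speedup old new = old / new

-- | Render a fraction as a whole-number percentage.
percent :: Double -> String
percent x = show (round (100 * x) :: Int) ++ "%"

main :: IO ()
main = do
  -- Times in seconds, copied from the two reports in the comment.
  let outOfLineTotal = 0.26
      inlineTotal    = 0.08
      outOfLineMut   = 0.25
      inlineMut      = 0.08
  putStrLn ("Total time reduction: " ++ percent (timeReduction outOfLineTotal inlineTotal))
  putStrLn ("MUT time reduction:   " ++ percent (timeReduction outOfLineMut inlineMut))
  putStrLn ("Total time speedup:   " ++ show (speedup outOfLineTotal inlineTotal) ++ "x")
```

 Under that reading, the total-time reduction comes out at 69% and the mutator-only reduction at 68%, which corresponds to a speedup of roughly 3.25x. The reported times are rounded to 0.01s, so these percentages are only approximate.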