
#8900: Strictness analysis regression
--------------------------------------------+------------------------------
        Reporter:  tibbe                    |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:
       Component:  Compiler                 |          Version:  7.8.1-rc2
      Resolution:                           |         Keywords:
Operating System:  MacOS X                  |     Architecture:  x86_64
 Type of failure:  Runtime performance bug  |  (amd64)
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------
Description changed by tibbe:

Old description:
Edit: There were two issues discussed here. One is solved. I left the ticket open for the strictness analysis regression part.
I ran a simple benchmark that exercises [https://github.com/tibbe/unordered-containers/blob/master/Data/HashMap/Base.hs#L303 Data.HashMap.Lazy.insert]. It's 16% slower using HEAD compared to using 7.6.3. The generated Core is a bit different and the generated Cmm is quite a bit different.
'''Steps to reproduce'''
 1. Download the attached `HashMapInsert.hs` benchmark.
 2. Install unordered-containers with both 7.6.3 and HEAD:

{{{
$ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
$ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
}}}

 3. Compile the benchmark with both compilers:

{{{
$ ghc-7.6.3 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertOld
$ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertNew
}}}
'''Results (best of 3 runs)'''
7.6.3:

{{{
$ ./HashMapInsertOld +RTS -s
   1,191,223,528 bytes allocated in the heap
     141,978,520 bytes copied during GC
      37,811,840 bytes maximum residency (8 sample(s))
      22,378,432 bytes maximum slop
              99 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2277 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
  Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.24s  (  0.24s elapsed)
  GC      time    0.13s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.37s  (  0.41s elapsed)

  %GC     time      34.8%  (40.3% elapsed)

  Alloc rate    4,923,204,681 bytes per MUT second

  Productivity  65.2% of total user, 59.0% of total elapsed
}}}
HEAD:

{{{
$ ./HashMapInsertNew +RTS -s
   1,191,223,128 bytes allocated in the heap
     231,158,688 bytes copied during GC
      55,533,064 bytes maximum residency (13 sample(s))
      22,378,488 bytes maximum slop
             144 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2268 colls,     0 par    0.06s    0.07s     0.0000s    0.0003s
  Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    0.18s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.43s  (  0.49s elapsed)

  %GC     time      41.6%  (47.5% elapsed)

  Alloc rate    4,738,791,249 bytes per MUT second

  Productivity  58.3% of total user, 51.9% of total elapsed
}}}
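As a rough cross-check of the 16% figure, the relative increases can be recomputed from the two `+RTS -s` summaries above. This is only a sketch: `pct_increase` is a hypothetical helper name, not part of GHC or the benchmark, and it assumes a POSIX shell with `awk` available.

```shell
# pct_increase OLD NEW
# Hypothetical helper: prints by how many percent NEW exceeds OLD,
# rounded to one decimal place.
pct_increase() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Figures copied from the 7.6.3 and HEAD runs reported in this ticket.
pct_increase 0.37 0.43            # Total time (user)       -> 16.2
pct_increase 0.24 0.25            # MUT time (user)         -> 4.2
pct_increase 0.13 0.18            # GC time (user)          -> 38.5
pct_increase 141978520 231158688  # bytes copied during GC  -> 62.8
```

On these numbers the mutator time barely moves, while GC time and bytes copied during GC grow much more, which suggests that most of the measured slowdown in this run shows up in the collector rather than in the mutator.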
(Note that this is without the patches in #8885, so they're not the cause.)
An interesting difference is that we spend more time in GC in HEAD. I don't know if that's related.
New description:

Edit: There were two issues discussed here. One is solved. I left the ticket open for the strictness analysis regression part. Analysis of strictness regression starts in comment 7 below.

I ran a simple benchmark that exercises [https://github.com/tibbe/unordered-containers/blob/master/Data/HashMap/Base.hs#L303 Data.HashMap.Lazy.insert]. It's 16% slower using HEAD compared to using 7.6.3. The generated Core is a bit different and the generated Cmm is quite a bit different.

'''Steps to reproduce'''

 1. Download the attached `HashMapInsert.hs` benchmark.
 2. Install unordered-containers with both 7.6.3 and HEAD:

{{{
$ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
$ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
}}}

 3. Compile the benchmark with both compilers:

{{{
$ ghc-7.6.3 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertOld
$ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertNew
}}}

'''Results (best of 3 runs)'''

7.6.3:

{{{
$ ./HashMapInsertOld +RTS -s
   1,191,223,528 bytes allocated in the heap
     141,978,520 bytes copied during GC
      37,811,840 bytes maximum residency (8 sample(s))
      22,378,432 bytes maximum slop
              99 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2277 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
  Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.24s  (  0.24s elapsed)
  GC      time    0.13s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.37s  (  0.41s elapsed)

  %GC     time      34.8%  (40.3% elapsed)

  Alloc rate    4,923,204,681 bytes per MUT second

  Productivity  65.2% of total user, 59.0% of total elapsed
}}}

HEAD:

{{{
$ ./HashMapInsertNew +RTS -s
   1,191,223,128 bytes allocated in the heap
     231,158,688 bytes copied during GC
      55,533,064 bytes maximum residency (13 sample(s))
      22,378,488 bytes maximum slop
             144 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2268 colls,     0 par    0.06s    0.07s     0.0000s    0.0003s
  Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    0.18s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.43s  (  0.49s elapsed)

  %GC     time      41.6%  (47.5% elapsed)

  Alloc rate    4,738,791,249 bytes per MUT second

  Productivity  58.3% of total user, 51.9% of total elapsed
}}}

(Note that this is without the patches in #8885, so they're not the cause.)

An interesting difference is that we spend more time in GC in HEAD. I don't know if that's related.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8900#comment:12
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler