Re: [GHC] #8900: Strictness analysis regression

15 Mar 2014

      #8900: Strictness analysis regression
--------------------------------------------+------------------------------
        Reporter:  tibbe                    |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:
       Component:  Compiler                 |          Version:  7.8.1-rc2
      Resolution:                           |         Keywords:
Operating System:  MacOS X                  |     Architecture:  x86_64
 Type of failure:  Runtime performance bug  |  (amd64)
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------
Description changed by tibbe:

Old description:
...
Edit: There were two issues discussed here. One is solved. I left the
ticket open for the strictness analysis regression part.
I ran a simple benchmark that exercises
[https://github.com/tibbe/unordered-containers/blob/master/Data/HashMap/Base.hs#L303 Data.HashMap.Lazy.insert].
It's 16% slower using HEAD compared to using 7.6.3. The generated Core
is a bit different and the generated Cmm is quite a bit different.
'''Steps to reproduce'''
1. Download the attached `HashMapInsert.hs` benchmark.
2. Install unordered-containers with both 7.6.3 and HEAD:
{{{
$ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
$ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
}}}
3. Compile the benchmark with both compilers:
{{{
$ ghc-7.6.3 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertOld
$ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertNew
}}}
'''Results (best of 3 runs)'''
7.6.3:
{{{
$ ./HashMapInsertOld +RTS -s
   1,191,223,528 bytes allocated in the heap
     141,978,520 bytes copied during GC
      37,811,840 bytes maximum residency (8 sample(s))
      22,378,432 bytes maximum slop
              99 MB total memory in use (0 MB lost due to fragmentation)
                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2277 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
  Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s
  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.24s  (  0.24s elapsed)
  GC      time    0.13s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.37s  (  0.41s elapsed)
  %GC     time      34.8%  (40.3% elapsed)
  Alloc rate    4,923,204,681 bytes per MUT second
  Productivity  65.2% of total user, 59.0% of total elapsed
}}}
HEAD:
{{{
$ ./HashMapInsertNew +RTS -s
   1,191,223,128 bytes allocated in the heap
     231,158,688 bytes copied during GC
      55,533,064 bytes maximum residency (13 sample(s))
      22,378,488 bytes maximum slop
             144 MB total memory in use (0 MB lost due to fragmentation)
                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2268 colls,     0 par    0.06s    0.07s     0.0000s    0.0003s
  Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s
  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    0.18s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.43s  (  0.49s elapsed)
  %GC     time      41.6%  (47.5% elapsed)
  Alloc rate    4,738,791,249 bytes per MUT second
  Productivity  58.3% of total user, 51.9% of total elapsed
}}}
(Note that this is without the patches in #8885, so they're not the
cause.)
An interesting difference is that we spend more time in GC in HEAD. I
don't know if that's related.
New description:

 Edit: There were two issues discussed here. One is solved. I left the
 ticket open for the strictness analysis regression part. Analysis of
 strictness regression starts in comment 7 below.

 I ran a simple benchmark that exercises
 [https://github.com/tibbe/unordered-containers/blob/master/Data/HashMap/Base.hs#L303 Data.HashMap.Lazy.insert].
 It's 16% slower using HEAD compared to using 7.6.3. The generated Core
 is a bit different and the generated Cmm is quite a bit different.

 '''Steps to reproduce'''

 1. Download the attached `HashMapInsert.hs` benchmark.
 2. Install unordered-containers with both 7.6.3 and HEAD:

 {{{
 $ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
 $ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
 }}}

 3. Compile the benchmark with both compilers:

 {{{
 $ ghc-7.6.3 -O2 HashMapInsert.hs
 $ mv HashMapInsert HashMapInsertOld
 $ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
 $ mv HashMapInsert HashMapInsertNew
 }}}

 '''Results (best of 3 runs)'''

 7.6.3:

 {{{
 $ ./HashMapInsertOld +RTS -s
    1,191,223,528 bytes allocated in the heap
      141,978,520 bytes copied during GC
       37,811,840 bytes maximum residency (8 sample(s))
       22,378,432 bytes maximum slop
               99 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
   Gen  0      2277 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
   Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s

   INIT    time    0.00s  (  0.00s elapsed)
   MUT     time    0.24s  (  0.24s elapsed)
   GC      time    0.13s  (  0.17s elapsed)
   EXIT    time    0.00s  (  0.01s elapsed)
   Total   time    0.37s  (  0.41s elapsed)

   %GC     time      34.8%  (40.3% elapsed)

   Alloc rate    4,923,204,681 bytes per MUT second

   Productivity  65.2% of total user, 59.0% of total elapsed
 }}}

 HEAD:

 {{{
 $ ./HashMapInsertNew +RTS -s
    1,191,223,128 bytes allocated in the heap
      231,158,688 bytes copied during GC
       55,533,064 bytes maximum residency (13 sample(s))
       22,378,488 bytes maximum slop
              144 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
   Gen  0      2268 colls,     0 par    0.06s    0.07s     0.0000s    0.0003s
   Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s

   INIT    time    0.00s  (  0.00s elapsed)
   MUT     time    0.25s  (  0.25s elapsed)
   GC      time    0.18s  (  0.23s elapsed)
   EXIT    time    0.00s  (  0.01s elapsed)
   Total   time    0.43s  (  0.49s elapsed)

   %GC     time      41.6%  (47.5% elapsed)

   Alloc rate    4,738,791,249 bytes per MUT second

   Productivity  58.3% of total user, 51.9% of total elapsed
 }}}

 (Note that this is without the patches in #8885, so they're not the
 cause.)

 An interesting difference is that we spend more time in GC in HEAD. I
 don't know if that's related.
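
 The relative changes can be read directly off the two `+RTS -s` outputs
 above. The following is a minimal Haskell sketch that computes them; the
 `pctChange` helper and the `stats` table are illustrative names and are not
 part of the attached benchmark. The figures are copied from the runs above.

```haskell
module Main where

import Text.Printf (printf)

-- | Percentage change going from the old value to the new one.
-- (Hypothetical helper, introduced only for this illustration.)
pctChange :: Double -> Double -> Double
pctChange old new = (new - old) / old * 100

-- | (label, 7.6.3 value, HEAD value), copied from the +RTS -s output above.
stats :: [(String, Double, Double)]
stats =
  [ ("Total time (s)",         0.37,       0.43)
  , ("MUT time (s)",           0.24,       0.25)
  , ("GC time (s)",            0.13,       0.18)
  , ("Bytes allocated",        1191223528, 1191223128)
  , ("Bytes copied during GC", 141978520,  231158688)
  , ("Maximum residency",      37811840,   55533064)
  ]

main :: IO ()
main = mapM_ row stats
  where
    -- Print one line per statistic with its signed percentage change.
    row :: (String, Double, Double) -> IO ()
    row (label, old, new) =
      printf "%-24s %+6.1f%%\n" label (pctChange old new)
```

 On these figures the total time rises by roughly 16%, while mutator time and
 total allocation are nearly unchanged. Most of the difference therefore shows
 up in GC time, bytes copied during GC, and maximum residency.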

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8900#comment:12
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler