Re: [Haskell-cafe] GHC, odd concurrency space leak

17 Apr 2010

      On Sat, Apr 17, 2010 at 12:00 AM, Jason Dagit  wrote:
...
Myself and others posted "simpler" programs that had similar bad behavior,
including the space leak (depending on optimizations flags).  I realize it's
tedious to retest all those versions, but do you think you could check with
one of the other versions that doesn't need mtl?
You got me curious enough that I decided to attack it systematically.
Here is a test-run script:

\begin{code}
#!/bin/bash

GHC68=/usr/local/stow/ghc-6.8.3/bin/ghc
GHC610=/usr/local/stow/ghc-6.10.4/bin/ghc
GHC612=ghc
GHC6HEAD=/usr/local/stow/ghc-6.13-20100416/bin/ghc

run_round () {
    EXE=$(basename ${1} .hs)
    echo ----------------------------------------------
    echo GHC68
    ${GHC68} --make $2 $1
    ./${EXE} +RTS -tstderr $3
    echo ----------------------------------------------
    echo GHC610
    ${GHC610} --make $2 $1
    ./${EXE} +RTS -tstderr $3
    echo ----------------------------------------------
    echo GHC612
    ${GHC612} --make $2 $1
    ./${EXE} +RTS -tstderr $3
    echo ----------------------------------------------
    echo GHC HEAD
    ${GHC6HEAD} --make -rtsopts $2 $1
    ./${EXE} +RTS -tstderr $3
}

run_round $1 $2 $3
\end{code}

With this script down, we can run your "Good" version:

jlouis@illithid:~$ sh runner.sh JD-Good.hs
----------------------------------------------
GHC68
[1 of 1] Compiling Main             ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<>
----------------------------------------------
GHC610
[1 of 1] Compiling Main             ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<>
----------------------------------------------
GHC612
[1 of 1] Compiling Main             ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<>
----------------------------------------------
GHC HEAD
[1 of 1] Compiling Main             ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<>

Your "Bad" one doesn't terminate consistently on all versions of GHC.
Adding -threaded does not help. Neil Browns version is also consistent
over all versions of GHC and doesn't terminate. It does not matter if
I add -threaded.
...
Well, I think Bulat correctly characterized the non-termination aspect.  I
didn't think the cooperative aspect of threading applied with the threaded
RTS, so I'm not 100% sure I believe his characterization, but otherwise it
seems like a reasonable explanation.
It is certainly a valid explanation, and the most plausible at the
moment I think.
...
The space leakiness is a different
issue and likely worth a bug report in its own right.  Do you think you
could try checking for the speak leaking using the compacting garbage
collector?  I think that one is enabled with +RTS -c -RTS.
Oh, that gives some interesting progress:

Here is the run without -c:

jlouis@illithid:~$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.1
jlouis@illithid:~$ ghc --make -threaded Post.hs
jlouis@illithid:~$ ./Post +RTS -s
./Post +RTS -s
Main thread starting
Delaying
     840,429,800 bytes allocated in the heap
     336,183,280 bytes copied during GC
      86,294,808 bytes maximum residency (8 sample(s))
       2,648,600 bytes maximum slop
             171 MB total memory in use (3 MB lost due to fragmentation)

  Generation 0:  1596 collections,     0 parallel,  0.35s,  0.33s elapsed
  Generation 1:     8 collections,     0 parallel,  0.27s,  0.35s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.37s    (  0.32s)       0.62s    (  0.68s)
  Task  2 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.28s  (  0.32s elapsed)
  GC    time    0.62s  (  0.68s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.90s  (  1.00s elapsed)

  %GC time      68.9%  (67.6% elapsed)

  Alloc rate    3,001,331,338 bytes per MUT second

  Productivity  31.1% of total user, 27.9% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0

And here with the compacting GC:

jlouis@illithid:~$ ./Post +RTS -s -c
./Post +RTS -s -c
Main thread starting
Delaying
      12,642,360 bytes allocated in the heap
       2,522,160 bytes copied during GC
       2,522,584 bytes maximum residency (3 sample(s))
          59,232 bytes maximum slop
               4 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    22 collections,     0 parallel,  0.02s,  0.01s elapsed
  Generation 1:     3 collections,     0 parallel,  5.08s,  5.09s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.01s)       5.10s    (  5.10s)
  Task  2 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.01s  (  0.01s elapsed)
  GC    time    5.10s  (  5.10s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.10s  (  5.11s elapsed)

  %GC time      99.9%  (99.8% elapsed)

  Alloc rate    3,159,800,049 bytes per MUT second

  Productivity   0.0% of total user, 0.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0

So it looks like it eliminates the space leak, but note how the
mutator doesn't get to do any work since we are using up all the time
in the GC. We only get to run 22 Gen0 collections and 3 Gen1
collections. In other words, I don't think it does anything to help
with the leak. According to heap profiling, two things take memory:
PAPs (which are partial applications to the RTS, that is functions
which are not yet fully applied), and a closure. It would make sense
that it is a PAP when one looks at the core. State monads are s -> (s,
a) and StateT with IO as the underlying monad gets translated into s
-> ioS -> (ioS, (s, a)), so I am not too confused about the PAP
appearing. Thinking more about this might reveal why the PAP appears
however.

Also, if you need me to run any kind of test against other RTS options
or programs, I'll be happy to do it. Just bump me :)

-- 
J.

Re: [Haskell-cafe] GHC, odd concurrency space leak

Jesper Louis Andersen