
On Sat, Apr 17, 2010 at 12:00 AM, Jason Dagit wrote:
> Others and I posted "simpler" programs that showed similar bad behavior, including the space leak (depending on optimization flags). I realize it's tedious to retest all those versions, but do you think you could check with one of the other versions that doesn't need mtl?
You got me curious enough that I decided to attack it systematically.
Here is a test-run script:
\begin{code}
#!/bin/bash
# Build and run one Haskell program with several GHC versions, printing
# the one-line RTS summary (+RTS -tstderr) after each run.
# Usage: runner.sh Program.hs ["extra ghc flags"] ["extra RTS flags"]
GHC68=/usr/local/stow/ghc-6.8.3/bin/ghc
GHC610=/usr/local/stow/ghc-6.10.4/bin/ghc
GHC612=ghc
GHC6HEAD=/usr/local/stow/ghc-6.13-20100416/bin/ghc

SRC=$1
GHC_FLAGS=$2    # expanded unquoted below so several flags split into words
RTS_FLAGS=$3
EXE=$(basename "${SRC}" .hs)

# build_and_run LABEL COMPILER [EXTRA_LINK_FLAGS...]
build_and_run () {
    label=$1
    compiler=$2
    shift 2
    echo ----------------------------------------------
    echo "${label}"
    "${compiler}" --make "$@" ${GHC_FLAGS} "${SRC}"
    ./"${EXE}" +RTS -tstderr ${RTS_FLAGS}
}

build_and_run GHC68 "${GHC68}"
build_and_run GHC610 "${GHC610}"
build_and_run GHC612 "${GHC612}"
# The 6.13 development snapshot rejects most +RTS flags unless linked with -rtsopts.
build_and_run "GHC HEAD" "${GHC6HEAD}" -rtsopts
\end{code}
With this script in place, we can run your "Good" version:
jlouis@illithid:~$ sh runner.sh JD-Good.hs
----------------------------------------------
GHC68
[1 of 1] Compiling Main ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<snip>
> Well, I think Bulat correctly characterized the non-termination aspect. I didn't think the cooperative aspect of threading applied with the threaded RTS, so I'm not 100% sure I believe his characterization, but otherwise it seems like a reasonable explanation.
It is certainly a valid explanation, and I think it is the most plausible one at the moment.
> The space leakiness is a different issue and likely worth a bug report in its own right. Do you think you could try checking for the space leak using the compacting garbage collector? I think that one is enabled with +RTS -c -RTS.
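A quick way to run the comparison suggested above is to keep the +RTS -s summary of each run in a file and pull out the few lines that matter. This is a minimal sketch, assuming the GHC 6.12-style -s layout shown in this thread; the function names and the .stat file names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Compare the default copying collector with the compacting collector
# (+RTS -c) for one program, using the +RTS -s summary of each run.
# Assumes the GHC 6.12-style "-s" layout; names below are hypothetical.

# gc_summary FILE: print allocation, peak residency and GC share from a saved log
gc_summary () {
    awk '
        /bytes allocated in the heap/ { gsub(",", "", $1); print "allocated:", $1 }
        /bytes maximum residency/     { gsub(",", "", $1); print "residency:", $1 }
        /%GC time/                    { print "gc-share:", $3 }
    ' "$1"
}

# compare_gc PROGRAM: run PROGRAM twice, writing each -s summary to its own file
compare_gc () {
    prog=$1
    "${prog}" +RTS -s"${prog}.copying.stat" -RTS
    "${prog}" +RTS -c -s"${prog}.compacting.stat" -RTS
    for mode in copying compacting; do
        echo "== ${mode}"
        gc_summary "${prog}.${mode}.stat"
    done
}
```

For instance, `compare_gc ./Post` after `ghc --make -threaded Post.hs` would print allocation, peak residency and GC share for both collectors. Keeping the logs in files makes it easy to repeat the comparison with other RTS options.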
Oh, that gives some interesting progress. Here is the run without -c:

jlouis@illithid:~$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.1
jlouis@illithid:~$ ghc --make -threaded Post.hs
jlouis@illithid:~$ ./Post +RTS -s
./Post +RTS -s
Main thread starting
Delaying
     840,429,800 bytes allocated in the heap
     336,183,280 bytes copied during GC
      86,294,808 bytes maximum residency (8 sample(s))
       2,648,600 bytes maximum slop
             171 MB total memory in use (3 MB lost due to fragmentation)

  Generation 0:  1596 collections,     0 parallel,  0.35s,  0.33s elapsed
  Generation 1:     8 collections,     0 parallel,  0.27s,  0.35s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.37s    (  0.32s)       0.62s    (  0.68s)
  Task  2 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.00s    (  0.32s)       0.00s    (  0.00s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.28s  (  0.32s elapsed)
  GC    time    0.62s  (  0.68s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.90s  (  1.00s elapsed)

  %GC time      68.9%  (67.6% elapsed)

  Alloc rate    3,001,331,338 bytes per MUT second

  Productivity  31.1% of total user, 27.9% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0

And here it is with the compacting GC:

jlouis@illithid:~$ ./Post +RTS -s -c
./Post +RTS -s -c
Main thread starting
Delaying
      12,642,360 bytes allocated in the heap
       2,522,160 bytes copied during GC
       2,522,584 bytes maximum residency (3 sample(s))
          59,232 bytes maximum slop
               4 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    22 collections,     0 parallel,  0.02s,  0.01s elapsed
  Generation 1:     3 collections,     0 parallel,  5.08s,  5.09s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.01s)       5.10s    (  5.10s)
  Task  2 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.00s    (  0.01s)       0.00s    (  0.00s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.01s  (  0.01s elapsed)
  GC    time    5.10s  (  5.10s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.10s  (  5.11s elapsed)

  %GC time      99.9%  (99.8% elapsed)

  Alloc rate    3,159,800,049 bytes per MUT second

  Productivity   0.0% of total user, 0.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0

So at first glance it looks like -c eliminates the space leak (4 MB in use instead of 171 MB), but note that the mutator hardly gets to do any work, since 99.9% of the time goes to the GC. The program only allocates about 12 MB and gets through just 22 Gen0 and 3 Gen1 collections, so the low residency mostly reflects how little the mutator ran. In other words, I don't think it does anything to help with the leak.

According to heap profiling, two things take up the memory: PAPs (partial applications in RTS terms, that is, functions which have not yet received all their arguments) and a closure. Looking at the Core, it makes sense that a PAP shows up. A State computation is a function s -> (a, s), and StateT with IO as the underlying monad unfolds to roughly s -> ioS -> (ioS, (a, s)), where ioS is the IO state token, so I am not too surprised to see a PAP. Thinking more about this might still reveal exactly where the PAP comes from, though.

Also, if you need me to run any kind of test against other RTS options or programs, I'll be happy to do it. Just bump me :)

-- J.
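The heap-profiling observation above (PAPs plus one other closure holding the memory) can be checked directly against the .hp file that a heap-profiled run writes. The sketch below prints the breakdown of the largest sample, biggest entries first. It assumes the plain-text .hp format (BEGIN_SAMPLE / END_SAMPLE blocks with one "label<TAB>bytes" line per entry); the function name peak_breakdown and the file name Post.hp are hypothetical.

```shell
#!/bin/sh
# peak_breakdown FILE.hp: find the sample with the largest total and print
# its entries as "bytes<TAB>label", largest first.
# Assumes the plain-text .hp layout: BEGIN_SAMPLE t ... END_SAMPLE t, with
# one tab-separated "label<TAB>bytes" line per entry in between.
peak_breakdown () {
    awk -F '\t' '
        /^BEGIN_SAMPLE/ { n = 0; total = 0; next }   # start a fresh sample
        /^END_SAMPLE/ {
            if (total > best) {                      # keep the largest sample seen
                best = total
                m = n
                for (i = 1; i <= n; i++) { bl[i] = lab[i]; bv[i] = val[i] }
            }
            next
        }
        NF == 2 { n++; lab[n] = $1; val[n] = $2; total += $2 }  # one entry
        END { for (i = 1; i <= m; i++) print bv[i] "\t" bl[i] }
    ' "$1" | sort -rn
}
```

In a profiling build (for example compiled with -prof and run with +RTS -hd), `peak_breakdown Post.hp` should show which closures account for most of the heap at its worst point, which is a quicker check than reading the whole hp2ps graph.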