
The parallel GC currently doesn't behave well with concurrent programs that
use multiple capabilities (aka OS threads), and the behaviour you see is
the known symptom of this. I believe that Simon Marlow has some fixes in
hand that may go into 6.12.2.
Are you saying that you see two different classes of undesirable
performance, one with -qg and one without? How are your threads in your real
program communicating with each other? We've seen problems there when
there's a lot of contention for e.g. IORefs among thousands of threads.
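To make the IORef contention point concrete, here is a minimal sketch (not taken from the original program, whose communication pattern is unknown). It assumes a recent base that provides atomicModifyIORef'; the names forkAndWait, contended and batched are hypothetical helpers invented for illustration. The first variant has every thread hammer one shared IORef, the second has each thread accumulate locally and touch the shared IORef only once.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: fork n copies of a worker and wait until all finish.
forkAndWait :: Int -> IO () -> IO ()
forkAndWait n worker = do
  done <- newEmptyMVar
  replicateM_ n (forkIO (worker >> putMVar done ()))
  replicateM_ n (takeMVar done)

-- Contended: every single increment goes through the same shared IORef.
contended :: Int -> Int -> IO Int
contended threads perThread = do
  ref <- newIORef 0
  forkAndWait threads $
    replicateM_ perThread (atomicModifyIORef' ref (\x -> (x + 1, ())))
  readIORef ref

-- Batched: each thread counts locally and publishes its total once.
batched :: Int -> Int -> IO Int
batched threads perThread = do
  ref <- newIORef 0
  forkAndWait threads $ do
    let local = sum (replicate perThread 1) -- stand-in for thread-local work
    atomicModifyIORef' ref (\x -> (x + local, ()))
  readIORef ref

main :: IO ()
main = do
  a <- contended 1000 100
  b <- batched 1000 100
  print (a, b)
```

Both variants compute the same total; the difference is only in how often the threads touch the shared reference, which is the kind of thing worth checking in the real program.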
On Mon, Mar 1, 2010 at 7:59 AM, Michael Lesniak wrote:
Hello haskell-cafe,
Sorry for this long post, but I can't think of a way to describe and explain the problem in a shorter way.
I'm (again) seeing very strange behaviour with the parallel GC and would be glad if someone could either reproduce (and explain) it or provide a solution. A similar but unrelated problem has been described in [1].
EXAMPLE CODE
The following demonstration program, a much smaller, single-threaded version of my real problem, behaves like my real program. It does some number crunching by calculating pi to a definable precision:
-- File Pi.hs
-- you need the numbers package from hackage.
module Main where

import Data.Number.CReal
import System.Environment
import GHC.Conc

main = do
  digits <- (read . head) `fmap` getArgs :: IO Int
  calcPi digits

calcPi digits = showCReal (fromEnum digits) pi `pseq` return ()
Compile it with
ghc --make -threaded -O2 Pi.hs -o pi
BENCHMARKS
On my two-core machine I get the following quite strange and unpredictable results:
* Using one thread:
$ for i in `seq 1 5`;do time pi 5000 +RTS -N1;done
real 0m1.441s user 0m1.390s sys 0m0.020s
real 0m1.449s user 0m1.390s sys 0m0.000s
real 0m1.399s user 0m1.370s sys 0m0.010s
real 0m1.401s user 0m1.380s sys 0m0.000s
real 0m1.404s user 0m1.380s sys 0m0.000s
* Using two threads, hence the parallel GC is used:
$ for i in `seq 1 5`;do time pi 5000 +RTS -N2;done
real 0m2.540s user 0m2.490s sys 0m0.010s
real 0m1.527s user 0m1.530s sys 0m0.010s
real 0m1.966s user 0m1.900s sys 0m0.010s
real 0m5.670s user 0m5.620s sys 0m0.010s
real 0m2.966s user 0m2.910s sys 0m0.020s
* Using two threads, but disabling the parallel GC:
$ for i in `seq 1 5`;do time pi 5000 +RTS -N2 -qg;done
real 0m1.383s user 0m1.380s sys 0m0.010s
real 0m1.420s user 0m1.360s sys 0m0.010s
real 0m1.406s user 0m1.360s sys 0m0.010s
real 0m1.421s user 0m1.380s sys 0m0.000s
real 0m1.360s user 0m1.360s sys 0m0.000s
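One way to check whether the extra time in the -N2 runs is spent in the collector is to ask the RTS from inside the program. The following is only a sketch under stated assumptions: it uses GHC.Stats.getRTSStats, which exists in recent versions of base but not in the GHC 6.12 series discussed here, and workload is a hypothetical stand-in for calcPi so that the example does not depend on the numbers package.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Conc (numCapabilities)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical stand-in for calcPi: builds many Integers so the GC has work.
workload :: Int -> Integer
workload n =
  foldl' (+) 0 (map (\k -> product [1 .. fromIntegral (k `mod` 50 + 1)]) [1 .. n])

main :: IO ()
main = do
  putStrLn ("capabilities: " ++ show numCapabilities)
  _ <- evaluate (workload 20000)
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are off; rerun with +RTS -T"
    else do
      s <- getRTSStats
      -- The *_ns fields are nanoseconds; convert to milliseconds for display.
      putStrLn ("collections:     " ++ show (gcs s))
      putStrLn ("GC elapsed (ms): " ++ show (gc_elapsed_ns s `div` 1000000))
      putStrLn ("mutator (ms):    " ++ show (mutator_elapsed_ns s `div` 1000000))
```

With a recent GHC this would be compiled with -threaded -rtsopts and run with, for example, +RTS -N2 -T, then again with -qg added. If GC elapsed time varies between runs while mutator time stays flat, that points at the collector rather than the program.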
THREADSCOPE
I've additionally attached the threadscope profile of a really bad run, started with
$ time pi 5000 +RTS -N2 -ls
real 0m15.594s user 0m15.490s sys 0m0.010s
as file pi.pdf
FURTHER INFORMATION/QUESTION
Just disabling the parallel GC leads to very bad performance in my original code, which forks threads with forkIO and does a lot of communication. Hence, using -qg is not a real option for me.
Have I overlooked some crucial aspect of this problem? If you've read this far, thank you for reading ... this far ;-)
Cheers, Michael
[1] http://osdir.com/ml/haskell-cafe@haskell.org/2010-02/msg00850.html
--
Dipl.-Inf. Michael C. Lesniak
University of Kassel
Programming Languages / Methodologies Research Group
Department of Computer Science and Electrical Engineering
Wilhelmshöher Allee 73
34121 Kassel
Phone: +49-(0)561-804-6269