
Dear all,

I was wondering what the current status of the GHC RTS is with respect to threading. Is it true that the allocator and deallocator (garbage collector) are still single-threaded?

I made this example:

    import Control.Concurrent
    import Control.Concurrent.QSemN

    primes1 = sieve [ 2 .. ]
    primes2 = 2 : sieve [ 3, 5 .. ]

    sieve (x : ys) = x : sieve ( filter ( \ y -> 0 < y `mod` x ) ys )

    main = do
        s <- newQSemN 0
        forkIO $ do print $ sum $ take 10000 primes1 ; signalQSemN s 1
        forkIO $ do print $ sum $ take 10000 primes2 ; signalQSemN s 1
        waitQSemN s 2

Between the two computations, I see absolutely no data dependencies. So the naive expectation is that using two threads (in the RTS, on a multicore machine) gives half the execution time.

I compiled with

    ghc-6.10.2 -o Par -O2 Par.hs -threaded

and when I run with +RTS -N2 (instead of -N1), my CPU load goes to 140 % (instead of 100 %), but the total run time (wall clock) stays nearly constant, i.e. total CPU time goes up.

For reference, I'm pretty sure a similar thing happens with the current Mono implementation (they use the Boehm GC, and that does not know about C#/CIL threads?), and this is preventing multi-core speedups in "straightforward" programs (ones that have no data dependencies but lots of object creation, e.g. via (hidden) Enumerators and such).

Well, then, if the two Haskell threads are (nearly) completely independent like the ones above, it would be better to compile and run two separate executables and have them communicate via the OS (pipe or port). But that shouldn't be! (The OS being better than Haskell, that is.)

Is there a way of partitioning the memory (managed by the GHC RTS) into totally independent parts that each have their own stand-alone memory management? Of course, all communication would then have to go via some Control.Concurrent.Chan, but that should be fine if there is little of it.

Well, just a thought. This idea can't be new? Tell me why it couldn't possibly work ...

J.W.
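P.S. To make the "communicate only via a Chan" idea concrete, here is a minimal sketch of the same program in which each thread sends its result over a Control.Concurrent.Chan instead of printing and signalling a semaphore. The helper name `worker`, the explicit Integer type signatures, and the extra `sieve []` case are my own additions for illustration. Note that this only shows the shape of the communication: both threads still allocate from the one heap managed by the RTS, so it does not by itself give independent memory management.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (replicateM)

primes1, primes2 :: [Integer]
primes1 = sieve [2 ..]
primes2 = 2 : sieve [3, 5 ..]

sieve :: [Integer] -> [Integer]
sieve (x : ys) = x : sieve (filter (\y -> 0 < y `mod` x) ys)
sieve []       = []  -- unreachable for the infinite inputs above; added for totality

-- Hypothetical helper: sum the first n elements inside the worker thread
-- and send only the finished result over the channel.
worker :: Chan Integer -> Int -> [Integer] -> IO ()
worker out n ps = do
  let total = sum (take n ps)
  -- seq forces the sum here, so the work happens in this thread
  -- rather than in whichever thread later reads the channel.
  total `seq` writeChan out total

main :: IO ()
main = do
  results <- newChan
  _ <- forkIO (worker results 10000 primes1)
  _ <- forkIO (worker results 10000 primes2)
  -- Block until both workers have reported; arrival order is not fixed.
  totals <- replicateM 2 (readChan results)
  mapM_ print totals
```

Compiled and run the same way as above (with -threaded and +RTS -N2), the only values crossing between the threads are the two final sums, which is the kind of traffic a partitioned-heap design would have to copy.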