
Thanks for your comments.
> Check whether it is GC-bound by using +RTS -sstderr.
Well yes, it does a lot of GC (there's no way for the compiler to optimize away the list of primes), because that was the point of the example: to confirm (or disprove) that GC hurts parallelism (at the moment).

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.23s  (  7.98s elapsed)
  GC    time   14.12s  ( 14.11s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   27.35s  ( 22.09s elapsed)

  %GC time      51.6%  (63.9% elapsed)
> Try a recent HEAD snapshot if you can, or wait for 6.12.1.
I did, with 6.11.20090425, and it dumps core with +RTS -N2 (on x86_64):

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x41001950 (LWP 1007)]
  0x000000000044e831 in yieldCapability ()
  Current language:  auto; currently asm
  (gdb) where
  #0  0x000000000044e831 in yieldCapability ()
  #1  0x000000000042d8d3 in schedule ()
  #2  0x000000000042e485 in workerStart ()
  #3  0x00002b680cdcdfc7 in start_thread () from /lib/libpthread.so.0
  #4  0x00002b680d0b25ad in clone () from /lib/libc.so.6
  #5  0x0000000000000000 in ?? ()

It does run with +RTS -N2 -q1 (I don't know exactly what this does), but the numbers did not change much (down from 22 sec to 21 sec, maybe).

PS: yes, I confirmed that the OS can run the two "primes" enumerations (as separately compiled executables) in parallel in 6 sec wall time.

Best regards,
J.W.