
thomas.dubuisson:
dons:
Simon Marlow sez:
The thread-ring benchmark needs careful scheduling to get a speedup on multiple CPUs. I was only able to get a speedup by explicitly locking half of the ring onto each CPU. You can do this using GHC.Conc.forkOnIO in GHC 6.8.x, and you'll also need +RTS -qm -qw.
Also make sure that you're not using the main thread for any part of the main computation, because the main thread is a bound thread and runs in its own OS thread, so communication between the main thread and any other thread is slow.
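For anyone who wants to try this, here is a minimal sketch of the pinning idea, not the actual benchmark entry. It assumes a recent GHC, where `forkOnIO` has been renamed `forkOn` (exported from Control.Concurrent). The names `runRing`, `capFor` and `worker` are made up for illustration. The first half of the ring is pinned to capability 0 and the second half to capability 1, and the main thread only sets things up and waits for the result, so it takes no part in passing the token.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- | Pass a token around a ring of 'ringSize' threads (ringSize >= 1),
-- decrementing it on every hop. Returns the 1-based index of the
-- thread that receives the token once it has reached 0.
-- (Hypothetical helper, written for this sketch only.)
runRing :: Int -> Int -> IO Int
runRing ringSize hops = do
  boxes <- replicateM ringSize newEmptyMVar   -- one inbox per thread
  done  <- newEmptyMVar                       -- result channel back to main
  let outboxes = drop 1 boxes ++ take 1 boxes -- thread i writes to thread i+1
      half     = ringSize `div` 2

      -- Pin the first half of the ring to capability 0, the rest to 1.
      -- forkOn interprets the number modulo the number of capabilities,
      -- so this still runs (without any speedup) under -N1.
      capFor :: Int -> Int
      capFor i = if i <= half then 0 else 1

      worker :: Int -> MVar Int -> MVar Int -> IO ()
      worker i inbox outbox = loop
        where
          loop = do
            n <- takeMVar inbox
            if n == 0
              then putMVar done i                  -- report and stop
              else putMVar outbox (n - 1) >> loop  -- pass the token on

  forM_ (zip3 [1 ..] boxes outboxes) $ \(i, inbox, outbox) ->
    forkOn (capFor i) (worker i inbox outbox)

  -- The main thread only injects the token and waits for the answer.
  forM_ (take 1 boxes) $ \firstBox -> putMVar firstBox hops
  takeMVar done

main :: IO ()
main = runRing 503 1000 >>= print
```

Build with `ghc -O2 -threaded` and run with `+RTS -N2` to get two capabilities. Whether the `-qm -qw` flags mentioned above are still accepted depends on the GHC version, so check the RTS documentation for the release you are using.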
I had to see the results for myself :-)
old RTS:             0m54.296s
threaded RTS (-N1):  0m56.839s
threaded RTS (-N2):  0m52.623s
Wow! 3x the performance for a simple change. Frustrating that there isn't a portable/standard way to express this. Also frustrating that the threaded version doesn't improve on the situation (utilization is back at 50%).
Anyway, that was a fun micro-benchmark to play with.
Did we gain any insights for submitting to the multicore shootout, http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all ? (I note that GHC is currently in second place there, though we've not submitted any parallel programs yet.) Also CC'd Isaac, Mr. Shootout. Isaac, is the quad-core shootout open for business? Should we rally the troops?
-- Don