
dons:
Simon Marlow sez:
The thread-ring benchmark needs careful scheduling to get a speedup on multiple CPUs. I was only able to get a speedup by explicitly locking half of the ring onto each CPU. You can do this using GHC.Conc.forkOnIO in GHC 6.8.x, and you'll also need +RTS -qm -qw.
Also make sure that you're not using the main thread for any part of the main computation, because the main thread is a bound thread and runs in its own OS thread, so communication between the main thread and any other thread is slow.
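For anyone who wants to try this, here is a minimal sketch of the two points above: pin half of the ring to each CPU, and keep the main thread out of the ring. It is not Simon's actual code. It assumes a newer GHC where Control.Concurrent.forkOn has replaced GHC.Conc.forkOnIO, and the names node and runRing, the ring size of 503, and the hop count are all made up for illustration.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- One ring member (hypothetical helper): take the token from the inbox,
-- then either report our id when the count hits zero or pass it on.
node :: MVar Int -> Int -> MVar Int -> MVar Int -> IO ()
node done ident inbox outbox = loop
  where
    loop = do
      n <- takeMVar inbox
      if n == 0
        then putMVar done ident
        else putMVar outbox (n - 1) >> loop

-- Build a ring of `size` threads and pass a token `hops` times.
-- Returns the id (1-based) of the thread holding the token at the end.
runRing :: Int -> Int -> IO Int
runRing size hops = do
  boxes <- replicateM size newEmptyMVar
  done  <- newEmptyMVar
  case boxes of
    [] -> error "runRing: ring size must be at least 1"
    first : rest -> do
      let half = size `div` 2
          -- Assumption: two capabilities (+RTS -N2). forkOn interprets the
          -- number modulo the capability count, so -N1 still works.
          capFor i = if i <= half then 0 else 1
      forM_ (zip3 [1 ..] boxes (rest ++ [first])) $ \(i, inbox, outbox) ->
        forkOn (capFor i) (node done i inbox outbox)
      -- The main thread only injects the token and waits for the result;
      -- it is never a member of the ring.
      putMVar first hops
      takeMVar done

main :: IO ()
main = runRing 503 1000 >>= print
```

Build with ghc -threaded and run with +RTS -N2 to get two capabilities. The -qm and -qw flags mentioned above are for GHC 6.8.x; check the RTS documentation for your GHC version before relying on them.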
I had to see the results for myself :-)
old RTS: 0m54.296s
threaded RTS (-N1): 0m56.839s
threaded RTS (-N2): 0m52.623s
Wow! 3x the performance for a simple change. Frustrating that there isn't a portable/standard way to express this. Also frustrating that the threaded version doesn't improve on the situation (utilization is back at 50%). Anyway, that was a fun micro-benchmark to play with.
Tom