Re: [Haskell-cafe] Haskell Speed Myth

25 Aug 2008


      jed:
...
On Sun 2008-08-24 11:03, Thomas M. DuBuisson wrote:
...
Yay, the multicore version pays off when the workload is non-trivial.
CPU utilization is still rather low for the -N2 case (70%).  I think the
Haskell threads have an affinity for certain OS threads (and thus a
CPU).  Perhaps it results in a CPU having both tokens of work and the
other having none?
This must be obvious to everyone, but the original thread-ring cannot
possibly be faster with multiple OS threads, since a thread can only
run while it holds the token; otherwise it is just blocked waiting for
it.  If threads are executing on different CPUs, the token must at least
be written to the shared cache, if not to main memory.  With the
single-threaded runtime, the token may never leave L1.  The difference
between -threaded -N1 and the non-threaded runtime may be influenced by
the effectiveness of prefetching the next thread (since presumably not
all 503 threads can reside in L1).
Simon Marlow sez:

    The thread-ring benchmark needs careful scheduling to get a speedup
    on multiple CPUs. I was only able to get a speedup by explicitly
    locking half of the ring onto each CPU. You can do this using
    GHC.Conc.forkOnIO in GHC 6.8.x, and you'll also need +RTS -qm -qw.

    Also make sure that you're not using the main thread for any part of
    the main computation, because the main thread is a bound thread and
    runs in its own OS thread, so communication between the main thread
    and any other thread is slow.
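
For anyone who wants to try this, here is a minimal sketch of the pinning
Simon describes. It is not his code. It assumes a current GHC, where
forkOnIO is available as Control.Concurrent.forkOn, and the names
threadRing, worker and capFor are made up for illustration. The first half
of the ring is pinned to capability 0 and the second half to capability 1;
the main thread only builds the ring, injects the token and waits for the
result.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- One ring member: wait for the token, then either report our index
-- (token is 0) or pass the decremented token to the next thread.
worker :: MVar Int -> Int -> MVar Int -> MVar Int -> IO ()
worker done i inbox outbox = loop
  where
    loop = do
      t <- takeMVar inbox
      if t == 0
        then putMVar done i
        else putMVar outbox (t - 1) >> loop

-- Build a ring of n threads (n >= 1), inject a token with value hops at
-- thread 1, and return the 1-based index of the thread that sees 0.
threadRing :: Int -> Int -> IO Int
threadRing n hops = do
  first <- newEmptyMVar
  rest  <- replicateM (n - 1) newEmptyMVar
  done  <- newEmptyMVar
  let boxes = first : rest
      -- Each thread's outbox is the next thread's inbox (wrapping around).
      nexts = drop 1 boxes ++ take 1 boxes
      -- Assumption of this sketch: two capabilities, half the ring on each.
      -- forkOn takes the capability number modulo the number available.
      capFor i = if i <= n `div` 2 then 0 else 1
  forM_ (zip3 [1 ..] boxes nexts) $ \(i, inbox, outbox) ->
    forkOn (capFor i) (worker done i inbox outbox)
  putMVar first hops   -- the main thread only injects the token...
  takeMVar done        -- ...and waits for the answer

main :: IO ()
main = threadRing 503 1000 >>= print   -- prints 498
```

Build with -threaded and run with +RTS -N2 to get two capabilities. With a
single capability both halves land on the same one, so the program still
runs, just without any pinning effect. Whether this gives a speedup on a
given machine is something to measure, not assume.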

Don Stewart