
Donald Bruce Stewart wrote:
Hmm, any change with -O2? Is the optimiser changing the code such that the scheduler doesn't get to switch threads as often? If you change the thread scheduler switching rate does that change anything?
The behavior appears only with -O or a higher optimization level. It
does appear that thread switching isn't happening as often as I wanted
in the Prime number example.
On 7/10/07, Simon Marlow wrote:
(a) the child threads aren't doing any work, just accumulating a large thunk which gets evaluated by the main thread sequentially.
This is unlikely, as it's using the IO monad, which forces evaluation for things like array updates.
(b) you have a sequential dependency somewhere
Also unlikely, because without -O it would use two OS threads.
(c) tight loops that don't allocate don't give the scheduler a chance to run and load-balance.
I doubt that (c) is a problem for you: it normally occurs when you try to use par/seq and strategies, and are playing with parallel fibonacci. Here you are using forkIO which definitely allocates, so that shouldn't be a problem.
It's possible that the thread doesn't allocate much after optimization. In the Prime number example, every thread spawns a new thread before doing its own work, and the work it does (the function remove') does not spawn new threads. It is very likely that remove' is optimized into a simple loop, as it's tail recursive. So a thread has little chance to let other threads run if it doesn't switch during thread spawning.

Compare this to the QSort example, where each thread spawns a new thread to sort one half of the array after splitting, and continues to sort the other half in the original thread. This could explain the difference.

Indeed, after I inserted a yield after "spawnRemover (i + 1)", it now happily crunches numbers on both CPUs.

Thank you both for the suggestions!

Regards,
Paul L
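
For the archive, here is a minimal sketch of the pattern described above. It is not the actual Prime number program: spawnWorker, busyLoop and runWorkers are hypothetical stand-ins (busyLoop plays the role of remove'), and the loop sizes are arbitrary. It only shows where the yield goes: right after the thread forks its successor and before it enters the tight loop.

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM, when)

-- Hypothetical stand-in for remove': a strict, tail-recursive loop that
-- the optimiser may well compile to code that allocates very little.
busyLoop :: Int -> Int -> Int
busyLoop acc n
  | n <= 0    = acc
  | otherwise =
      let acc' = acc + n `mod` 7
      in acc' `seq` busyLoop acc' (n - 1)

-- Each worker forks its successor first, then yields so the new thread
-- gets a chance to be scheduled, and only then does its own work.
spawnWorker :: MVar Int -> Int -> Int -> IO ()
spawnWorker results limit i = when (i <= limit) $ do
  _ <- forkIO (spawnWorker results limit (i + 1))
  yield  -- the inserted yield, placed right after spawning
  let r = busyLoop 0 (i * 100000)
  r `seq` putMVar results r

-- Start the chain in its own thread and collect one result per worker.
runWorkers :: Int -> IO [Int]
runWorkers limit = do
  results <- newEmptyMVar
  _ <- forkIO (spawnWorker results limit 1)
  replicateM limit (takeMVar results)

main :: IO ()
main = do
  rs <- runWorkers 8
  putStrLn ("workers finished: " ++ show (length rs))
```

To see any effect on a multi-core machine, this would need to be built with something like "ghc -O2 -threaded" and run with "+RTS -N2"; whether the yield makes a visible difference will depend on the GHC version and on the workload.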