
Andrew Coppin wrote:
> The results of this little benchmark utterly defy comprehension. Allow me to enumerate:
>
> Weird thing #1: The first time you sort the data, it takes a few seconds. The other 7 times, it takes a split second - roughly 100x faster. Wuh?
The test data was a CAF. I changed it to a regular function, and this behaviour vanished. Now *all* runs of a given test take approximately the same amount of time (to within a few ms, anyway). That explains it: a CAF is evaluated once and the result retained, so the first sort paid for actually generating the lazy test data, and the other 7 runs just re-sorted the already-evaluated list. A function rebuilds the data on every call, so every run pays the full cost.
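
For concreteness, here's a minimal sketch of the two set-ups (the generator and the names are hypothetical, not my actual benchmark code):

    -- As a CAF: 'testData' is a top-level thunk. The first sort forces
    -- it, GHC retains the evaluated list, and later sorts reuse it.
    testData :: [Int]
    testData = take 1000000 (iterate step 42)
      where step x = (x * 1103515245 + 12345) `mod` 2147483648

    -- As a regular function: the list depends on the argument, so each
    -- call rebuilds it from scratch and every run pays the same cost.
    mkTestData :: Int -> [Int]
    mkTestData seed = take 1000000 (iterate step seed)
      where step x = (x * 1103515245 + 12345) `mod` 2147483648

With the CAF version, timing "sort testData" eight times really measures data generation once plus eight sorts.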
> Weird thing #2: The parallel version runs *faster* than the sequential one in all cases - even with SMP disabled! (We're only talking a few percent faster, but still.)
Now the parallel version is only faster with 1 worker thread. With more (even just 2) it becomes *slower* than the sequential version. Interestingly, it *does* now seem to be using more than 50% CPU, so it actually seems to be doing more work, just less efficiently. My first guess would be that it's creating data in a different order and thus stressing the GC more. Or maybe it's just that the algorithm sparks millions of really *tiny* items, and the per-spark overhead swamps the useful work. (It's a very simple implementation, so I was somewhat expecting this.) I'll try tuning further - the obvious knob is a granularity cutoff, sketched below...
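
Something like this is what I have in mind - spark only the top few levels of the recursion and fall back to sequential code below that. It's just a sketch of the technique (not my actual sort), assuming only 'par' and 'pseq' from Control.Parallel:

    import Control.Parallel (par, pseq)

    -- Mergesort that sparks the two halves in parallel, but only down
    -- to a given depth. With depth 4 you get at most 2^4 = 16 decent-
    -- sized sparks instead of one tiny spark per recursive call.
    parSort :: Ord a => Int -> [a] -> [a]
    parSort _ []  = []
    parSort _ [x] = [x]
    parSort d xs
      | d <= 0    = seqSort xs
      | otherwise = forceList right `par`
                    (forceList left `pseq` merge left right)
      where
        (as, bs) = splitAt (length xs `div` 2) xs
        left     = parSort (d - 1) as
        right    = parSort (d - 1) bs

    -- 'par' only evaluates its argument to WHNF, so force the whole
    -- list inside the spark; otherwise the sparked work is one cons.
    forceList :: [a] -> ()
    forceList = foldr seq ()

    -- Plain sequential mergesort for the bottom of the recursion.
    seqSort :: Ord a => [a] -> [a]
    seqSort []  = []
    seqSort [x] = [x]
    seqSort xs  = merge (seqSort as) (seqSort bs)
      where (as, bs) = splitAt (length xs `div` 2) xs

    merge :: Ord a => [a] -> [a] -> [a]
    merge [] ys = ys
    merge xs [] = xs
    merge xxs@(x:xs) yys@(y:ys)
      | x <= y    = x : merge xs yys
      | otherwise = y : merge xxs ys

Whether that actually wins on 2 cores depends on how the spark overhead compares to the merge cost, but it should at least get rid of the millions-of-tiny-sparks failure mode.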
> Weird thing #3: Adding the "-threaded" compiler option makes *everything* run a few percent faster. Even with only 1 OS thread.
Still true.
> Weird thing #4: Adding "-N2" makes *everything* slow down a few percent. In particular, Task Manager shows only one CPU core in use.
Adding -N2 does still slow everything down, but not by very much. (Except the truly parallel stuff - that shows quite a big slowdown.) Task Manager does now show about 60% CPU usage instead of 50%. (I have exactly 2 physical cores.)
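
One way to check where the extra time is going (file name hypothetical; this is just the standard RTS statistics output, nothing benchmark-specific):

    $ ghc -O2 -threaded --make ParSort.hs
    $ ParSort +RTS -N2 -sstderr

The -sstderr summary splits elapsed time into mutator time and GC time; if the GC share jumps noticeably between -N1 and -N2, that would support the stressing-the-GC theory.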