I should have double-checked my work before I sent the last message; I accidentally benchmarked the wrong program. It turns out that the modifications I last described do not improve the scaling of the program to more cores when used with IOArray. And there was a bug: the line "startIx = numixs * threadNum" should have been "startIx = numixs * (threadNum - 1)".
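
For reference, here is a minimal sketch of the corrected partitioning, assuming thread numbers start at 1 (which is what the "threadNum - 1" fix implies). The values chosen for arraySize and numThreads are small placeholders for illustration, not the ones from the benchmark.

```haskell
-- Sketch only: these parameter values are assumed for illustration.
arraySize, numThreads :: Int
arraySize  = 12
numThreads = 3

highestIndex :: Int
highestIndex = arraySize - 1

-- Each thread gets a contiguous, disjoint slice of the index space.
-- Thread numbers are assumed to be 1-based, hence (threadNum - 1).
indices :: Int -> Int -> [Int]
indices nThreads threadNum =
  let numixs     = arraySize `div` nThreads
      startIx    = numixs * (threadNum - 1)
      allIndices = [0 .. highestIndex]
  in take numixs $ drop startIx allIndices

main :: IO ()
main = mapM_ (print . indices numThreads) [1 .. numThreads]
```

With these placeholder values the three threads get [0,1,2,3], [4,5,6,7] and [8,9,10,11]. With the uncorrected "numixs * threadNum" and 1-based thread numbers, the last thread would get an empty list.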
One more observation... I tried a third variation in which the test program still uses a single shared IOArray but each thread writes to different indices in the array. In this case I get good scaling with performance similar to the use of IOUArray. In detail, I made the following two changes to give each thread a disjoint set of indices to write to:

    bunchOfKeys threadNum = take numElems $ zip (cycle $ indices numThreads threadNum) $ drop threadNum cyclicChars

and

    indices :: Int -> Int -> [Int]
    indices numThreads threadNum =
      let numixs = arraySize `div` numThreads
          startIx = numixs * threadNum
          allIndices = [0..highestIndex]
      in take numixs $ drop startIx allIndices

--Andreas

On Tue, Aug 23, 2011 at 5:07 PM, Andreas Voellmy <andreas.voellmy@gmail.com> wrote:
Thanks for the suggestions. I tried to add strictness in the following ways:

(1) Changing "insertDAT a j c" to "insertDAT a j $! c"
(2) Changing "insertDAT a j c" to "deepseq c (insertDAT a j c)"

I also used Int instead of Int32 throughout and changed the DAT data type to a newtype definition. These changes improved the performance slightly, but still, the multithreaded runs perform significantly worse than the single-threaded runs, by about the same amount (i.e. 0.5 seconds more for the 2 core run than for the 1 core run).

I used ghc 7.0.3 for the performance measurements I gave in my message. I've also tried under 7.2.1, and I get basically the same behavior there.

--Andreas

On Tue, Aug 23, 2011 at 4:38 PM, Johan Tibell <johan.tibell@gmail.com> wrote:

On Tue, Aug 23, 2011 at 10:04 PM, Andreas Voellmy
<andreas.voellmy@gmail.com> wrote:
> data DAT = DAT (IOArray Int32 Char)

Try to make this a newtype instead. The data type adds a level of indirection.

> do let p j c = insertDAT a j c >> lookupDAT a j >>= \v -> v `pseq` return
> ()

You most likely want (insertDAT a j $! c) to make sure that the
element is forced, to avoid thunks building up in the array.
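
A minimal sketch of both suggestions together, in case it helps. The original definitions of insertDAT and lookupDAT are not shown in this thread, so the ones below are hypothetical stand-ins that just write to and read from the array; the array bounds and the element data are also made up for illustration.

```haskell
import Data.Array.IO (IOArray, newArray, readArray, writeArray)

-- newtype instead of data: the wrapper has no runtime representation,
-- so there is no extra pointer to follow to reach the array.
newtype DAT = DAT (IOArray Int Char)

-- Hypothetical stand-ins for the functions used in the thread.
insertDAT :: DAT -> Int -> Char -> IO ()
insertDAT (DAT arr) j c = writeArray arr j c

lookupDAT :: DAT -> Int -> IO Char
lookupDAT (DAT arr) j = readArray arr j

main :: IO ()
main = do
  a <- DAT <$> newArray (0, 9) ' '
  -- ($!) evaluates c to weak head normal form before the write,
  -- so the boxed array stores an evaluated Char rather than a thunk.
  mapM_ (\(j, c) -> insertDAT a j $! c) (zip [0 .. 9] (cycle "abc"))
  v <- lookupDAT a 4
  print v
```

For Char, weak head normal form is already fully evaluated, so ($!) should be enough here; deepseq would only matter for elements with nested structure.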
> -- Parameters
> arraySize :: Int32
Int might work better than Int32. While they should behave the same on
32-bit machines, Int might have a few more rewrite rules that make it
optimize better.
-- Johan