
duncan.coutts:
On Fri, 2009-05-22 at 05:30 -0700, Don Stewart wrote:
Answer recorded at:
I have to complain: this answer doesn't explain anything. This isn't like straight-line performance; as far as I can see, there's no reason that inlining should change the operational behaviour of parallel evaluation, unless there's some mistake in the original, such as accidentally relying on an unspecified evaluation order.
Now, I tried the example using two versions of ghc, and I get different behaviour from what other people are seeing. With the original code (i.e. the parallelize function in the same module), with ghc-6.10.1 I get no speedup at all from -N2, and with 6.11 I get a very good speedup (though single-threaded performance is slightly lower in 6.11).
Original code

ghc-6.10.1    -N1          -N2
real          0m9.435s     0m9.328s
user          0m9.369s     0m9.249s

ghc-6.11      -N1          -N2
real          0m10.262s    0m6.117s
user          0m10.161s    0m11.093s
With the parallelize function moved into another module I get no change whatsoever. Indeed even when I force it *not* to be inlined with {-# NOINLINE parallelize #-} then I still get no change in behaviour (as indeed I expected).
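For anyone wanting to repeat this kind of experiment, here is a minimal sketch of what such a test might look like. The original code from the thread is not reproduced here, so this `parallelize` is only a hypothetical stand-in with an assumed type, and `fib` is an arbitrary placeholder workload. It uses `par` and `pseq` from Control.Parallel (the parallel package), with `pseq` rather than `seq` so that the evaluation order is actually specified.

```haskell
module Main (main) where

import Control.Parallel (par, pseq)

-- Hypothetical stand-in for the thread's `parallelize`; the real
-- definition is not shown in this message, so the type is an assumption.
-- Spark `a` for parallel evaluation, evaluate `b` on the current thread,
-- then return both. `pseq` guarantees that `b` is evaluated before the
-- pair is returned, which plain `seq` does not promise.
parallelize :: a -> b -> (a, b)
parallelize a b = a `par` (b `pseq` (a, b))
{-# NOINLINE parallelize #-}  -- remove, or change to INLINE, to compare

-- Placeholder workload: deliberately naive so each call takes a while.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  let (x, y) = parallelize (fib 30) (fib 31)
  print (x + y)
```

Build it with -threaded and compare runs under +RTS -N1 and +RTS -N2, once with the pragma and once without. If the timings differ only between compiler versions and not between the pragma variants, that points to the runtime rather than to inlining.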
So I view this advice to force inlining with great suspicion (at worst it encourages people not to think and to look at it as magic). That said, why it does not get any speedup with ghc-6.10 is also a mystery to me (there's very little GC going on).
Don: can we change the advice on the wiki please? It currently makes it look like a known and understood issue. If anything we should suggest using a later ghc version.
Please do so. Especially if GHC HEAD *does the right thing*. Then the advice should be first: upgrade to GHC HEAD.