
On Wed, Aug 22, 2007 at 04:07:22AM +0100, Hugh Perkins wrote:
> On 8/21/07, Andrew Coppin wrote:
> > I highly doubt that automatic threading will happen any time this
> > decade - but I just learned something worthwhile from reading this
> > email. ;-)
>
> That's an interesting observation. I can't say I don't believe it, but
> I'm interested to know why (though it could be just a feeling, or an
> observation about time-to-market lead times). Are you saying this
> because multicores aren't sufficiently widespread or powerful enough
> yet (4 cores don't really even make up for the overhead of using
> automatic threading, at least in initial implementations)? Or are you
> saying this because you think the technical implementations are not
> sufficiently advanced?
Automatic threading is inherently limited by data dependencies. You can't run a function that branches on an argument in parallel with the computation producing that argument. Even with arbitrarily many processors and no communication overhead, there is a limit to how much parallelism you can squeeze out of any given program. You should read

"Feedback Directed Implicit Parallelism"
http://research.microsoft.com/~tharris/papers/2007-fdip.pdf

and perhaps

"Limits to Implicit Parallelism in Functional Applications"
http://www.detreville.org/papers/Limits.pdf

In short, with zero overhead and an oracle for scheduling you can get a factor of at most 2x to 32x by implicitly parallelizing existing Haskell code. In practice, with execution overhead, it's a gain of perhaps 10% to 80% on a 4-core system. The experiments in the first paper are based on a fairly sophisticated implementation that reduces overhead by using profiling results at compile time to decide which thunks might be worth evaluating in parallel. For a fixed benchmark there's probably not much lost by using canned profiling results instead of adapting at runtime, and in any case the hard bounds from data dependencies still apply.

You can do a lot better if you expect people to rewrite code, but "automatic threading" suggests something completely effortless. I think you can get much better results if you work on the programming style in connection with a better runtime. You can think of Data Parallel Haskell as a new programming style with more implicit parallelism, plus the runtime support to exploit it.
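To make the data-dependency point concrete, here is a small sketch of my own (not from the thread), using par and pseq from Control.Parallel; the names nfib and consumer are invented for the illustration:

    import Control.Parallel (par, pseq)

    -- The two recursive calls do not depend on each other, so one can
    -- be sparked off to run in parallel while the other is evaluated.
    nfib :: Int -> Int
    nfib n
      | n < 2     = 1
      | otherwise = nf1 `par` (nf2 `pseq` nf1 + nf2)
      where
        nf1 = nfib (n - 1)
        nf2 = nfib (n - 2)

    -- By contrast, this consumer branches on the producer's result, so
    -- it cannot start until nfib m has finished, no matter how many
    -- cores are idle.
    consumer :: Int -> String
    consumer m = case nfib m of
                   k | even k    -> "even"
                     | otherwise -> "odd"

No implicit-parallelism scheme, however clever, can overlap consumer with nfib here; that is the hard bound the papers above quantify.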
> I kind of think automatic threading is like 3D graphics: as soon as
> the hardware became sufficiently powerful, 3D graphics became trivial.
> Enough money was thrown at the problem in a very short time by a few
> powerful companies that it was a non-issue.
If you have cores to waste, you might try rewrites like

    f x
      =>
    case x of
      C1 a1 a2 -> f (C1 a1 a2)
      C2 b     -> f (C2 b)
      C3       -> f C3

and then speculatively execute several of the case branches. If you don't throw away too much type information you should even be able to do it at runtime.

Brandon
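At the source level, a rough analogue of that kind of speculation can be sketched with par, at least for branches whose results don't depend on the scrutinee's fields. This is my own illustration, not something proposed in the thread, and specBranch is a hypothetical helper:

    import Control.Parallel (par)

    -- Spark both candidate results while the (possibly expensive)
    -- scrutinee is still being evaluated; whichever branch loses is
    -- wasted work, which is only acceptable if cores are otherwise
    -- idle.  'specBranch' is an invented name for this sketch.
    specBranch :: Bool -> a -> a -> a
    specBranch scrutinee whenTrue whenFalse =
      whenTrue `par` whenFalse `par`
        (if scrutinee then whenTrue else whenFalse)

The compiler-level rewrite above is more general, since it can speculate on the shape of x before x is evaluated at all, which user code can't express directly.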