
On 20 October 2010 04:05, wren ng thornton
On 10/19/10 5:47 AM, Ryan Newton wrote:
That sounds good to me. In any case the parallel map/fold operations by themselves shouldn't compromise the abstraction.
Perhaps an eventual solution would be to start including parallel maps/folds right inside the standard libraries. I haven't began testing this yet but it would seem that all the balanced tree implementations are good candidates for a little `par` treatment. Has this been tried somewhere already? Alas, having par/nonpar versions of library functions compounds the already present monadic/non-monadic schism...
Another problem is that the desirable level of parallelization isn't fixed. For example, let's consider a function f being mapped over a collection C.
With non-parallel map this has cost O(0*k) + O(C)*O(f) where k is the cost of spawning/reaping threads, O(C) is the size of the collection, and O(f) is the cost of running f once.
Now, consider mapPar2 which uses two threads. The cost of this is O(1*k) + 2 `par` O(C)/2*O(f), where `par` is a kind of multiplication based on the level of parallelism actually achieved. With perfect paralellism x`par`y = y; with no parallelism x`par`y = x*y.
You probably mean x + y.
We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n threads. But the problem is, depending on how big O(f) is relative to O(k), there's going to be a different n which gives the optimal tradeoff. If O(f) is small enough, then the overhead of sparking threads is wasted; if O(f) is middling, then we'd want a few threads but not "too many"; if O(f) is huge, then we want n = O(C) since the overhead disappears in the asymptotics. Thus, what we'd need to offer in the API is a set of evaluators in order to alter the level of parallelization.
Actually, it doesn't work that way anymore. Threads are not (or only rarely) created to run sparks. Instead, idle threads look into the spark pool and grab and execute some sparks. Nowadays, the main overhead you get is from stealing sparks from other CPUs which will incur a lot of cache misses. If the other CPU is busy it may still be worth it, but it's quite hard to predict. Sparks are also run in chunks which makes it less expensive to have small sparks. -- Push the envelope. Watch it bend.