Why isn't there a cheaper "split-in-two" operation for Data.Set?

Would there be anything wrong with a Data.Set simply chopping off half its (balanced) tree and returning two approximately balanced partitions -- i.e. Data.Set.split without the pivot argument? Even though it can't quite be constant-time (it still needs rebalancing), it could still be cheaper than Data.Set.split, right? In the context of Set.hs, I'm thinking along the lines of something like the following:

    cleave :: Set a -> (Set a, Set a)
    cleave Tip = (Tip, Tip)
    cleave (Bin _ x l r)
      | size l > size r = (l, insertMin x r)
      | otherwise       = (insertMax x l, r)

For one thing, it seems like the proposed function would be useful for consuming a set in parallel, similar to Guy Steele's arguments in recent talks (see "Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful", http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf). That is, one splits a tree and uses "par" to process the left and right halves in parallel.

Alternatively, it would be simple to provide a parallel version of mapMonotonic that does this internally, i.e.:

    mapMonotonic _ Tip = Tip
    mapMonotonic f (Bin sz x l r) = Bin sz (f x) (mapMonotonic f l) (mapMonotonic f r)

becomes something like:

    parMapMonotonic _ Tip = Tip
    parMapMonotonic f (Bin sz x l r) = par left $ seq right $ Bin sz (f x) left right
      where left  = mapMonotonic f l
            right = mapMonotonic f r

Cheers,
-Ryan
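As a rough illustration of how cleave could feed a parallel consumer, here is a minimal sketch. The name parFoldSet and its depth cutoff are invented for this sketch; it assumes the proposed cleave is in scope, that the combining function is associative, and that z is its identity.

    import Control.Parallel (par, pseq)
    import qualified Data.Set as S

    -- Sketch only: split the set with the proposed cleave, spark the left
    -- half, evaluate the right half, and combine. The depth parameter caps
    -- how many sparks are created.
    parFoldSet :: Int -> (b -> b -> b) -> b -> (a -> b) -> S.Set a -> b
    parFoldSet depth combine z f s
      | depth <= 0 || S.size s <= 1 = S.foldr (combine . f) z s
      | otherwise = leftRes `par` (rightRes `pseq` combine leftRes rightRes)
      where
        (l, r)   = cleave s   -- the proposed split-in-two (not in Data.Set)
        leftRes  = parFoldSet (depth - 1) combine z f l
        rightRes = parFoldSet (depth - 1) combine z f r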


Ryan Newton wrote:
Would there be anything wrong with a Data.Set simply chopping off half its (balanced) tree and returning two approximately balanced partitions ...

    cleave :: Set a -> (Set a, Set a)
    cleave Tip = (Tip, Tip)
    cleave (Bin _ x l r)
      | size l > size r = (l, insertMin x r)
      | otherwise       = (insertMax x l, r)
This function would expose some of the internal structure of Set - i.e. there could be equal sets s1 == s2 with cleave s1 /= cleave s2. Maybe a better idea than exposing such a function would be to split Data.Set into Data.Set.Internal and Data.Set, where Data.Set.Internal would export the actual Tip and Bin constructors. Then people who want to break the abstraction, for example to experiment with parallel folds, could do that easily.

regards,
Bertram
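To make the point about exposing internal structure concrete, here is an illustrative sketch; whether the shapes actually differ depends on the balancing details, but nothing in the Set abstraction forbids it.

    import qualified Data.Set as S

    -- Two sets with the same elements, built in different orders. They are
    -- equal under (==), but the underlying balanced trees may be shaped
    -- differently, so a structure-exposing cleave could return different
    -- pairs for s1 and s2.
    s1, s2 :: S.Set Int
    s1 = S.fromList [1 .. 6]
    s2 = foldr S.insert S.empty [6, 5 .. 1]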

That sounds good to me. In any case, the parallel map/fold operations by themselves shouldn't compromise the abstraction.

Perhaps an eventual solution would be to start including parallel maps/folds right inside the standard libraries. I haven't begun testing this yet, but it would seem that all the balanced tree implementations are good candidates for a little `par` treatment. Has this been tried somewhere already? Alas, having par/nonpar versions of library functions compounds the already present monadic/non-monadic schism...

Anyway, right this second I'm primarily interested in speeding up difference and intersection -- that would be really useful for a simple utility I've been using that compares files as Maps of word-tuples and runs rather slowly (http://hackage.haskell.org/package/wordsetdiff).

Cheers,
-Ryan
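As a rough idea of how the proposed split-in-two could help with intersection (and, analogously, difference), here is a hypothetical sketch. parIntersection and its cutoff are invented names, and cleave is the not-yet-existing function from the start of the thread; the recombination with union is valid because cleave's halves partition the first set by ordering.

    import Control.Parallel (par, pseq)
    import qualified Data.Set as S

    -- Hypothetical sketch: split s1 with the proposed cleave, intersect each
    -- half with s2 in parallel, and union the ordered, disjoint results.
    parIntersection :: Ord a => S.Set a -> S.Set a -> S.Set a
    parIntersection s1 s2
      | S.size s1 < cutoff = S.intersection s1 s2
      | otherwise          = left `par` (right `pseq` S.union left right)
      where
        (l, r) = cleave s1            -- proposed split-in-two (not in Data.Set)
        left   = parIntersection l s2
        right  = parIntersection r s2
        cutoff = 1000                 -- arbitrary, purely illustrative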

On 10/19/10 5:47 AM, Ryan Newton wrote:
That sounds good to me. In any case the parallel map/fold operations by themselves shouldn't compromise the abstraction.
Perhaps an eventual solution would be to start including parallel maps/folds right inside the standard libraries. I haven't begun testing this yet, but it would seem that all the balanced tree implementations are good candidates for a little `par` treatment. Has this been tried somewhere already? Alas, having par/nonpar versions of library functions compounds the already present monadic/non-monadic schism...
Another problem is that the desirable level of parallelization isn't fixed. For example, let's consider a function f being mapped over a collection C. With non-parallel map this has cost O(0*k) + O(C)*O(f), where k is the cost of spawning/reaping threads, O(C) is the size of the collection, and O(f) is the cost of running f once.

Now consider mapPar2, which uses two threads. The cost of this is O(1*k) + 2 `par` O(C)/2*O(f), where `par` is a kind of multiplication based on the level of parallelism actually achieved. With perfect parallelism x`par`y = y; with no parallelism x`par`y = x*y.

We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n threads. But the problem is that, depending on how big O(f) is relative to O(k), there's going to be a different n which gives the optimal tradeoff. If O(f) is small enough, then the overhead of sparking threads is wasted; if O(f) is middling, then we'd want a few threads but not "too many"; if O(f) is huge, then we want n = O(C) since the overhead disappears in the asymptotics. Thus, what we'd need to offer in the API is a set of evaluators in order to alter the level of parallelization.

--
Live well,
~wren
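One way to picture such a knob is a map whose granularity is chosen by the caller. This is a hypothetical sketch using Control.Parallel.Strategies, not an existing Data.Set API; the name mapParChunked is invented.

    import Control.Parallel.Strategies (parListChunk, rseq, using)

    -- Hypothetical: the caller controls n indirectly via the chunk size.
    -- A chunk size equal to the list length degenerates to a sequential map;
    -- a chunk size of 1 sparks every element.
    mapParChunked :: Int -> (a -> b) -> [a] -> [b]
    mapParChunked chunk f xs = map f xs `using` parListChunk chunk rseq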

On 20 October 2010 04:05, wren ng thornton wrote:
Another problem is that the desirable level of parallelization isn't fixed. For example, let's consider a function f being mapped over a collection C.
With non-parallel map this has cost O(0*k) + O(C)*O(f) where k is the cost of spawning/reaping threads, O(C) is the size of the collection, and O(f) is the cost of running f once.
Now, consider mapPar2 which uses two threads. The cost of this is O(1*k) + 2 `par` O(C)/2*O(f), where `par` is a kind of multiplication based on the level of parallelism actually achieved. With perfect parallelism x`par`y = y; with no parallelism x`par`y = x*y.
You probably mean x + y.
We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n threads. But the problem is, depending on how big O(f) is relative to O(k), there's going to be a different n which gives the optimal tradeoff. If O(f) is small enough, then the overhead of sparking threads is wasted; if O(f) is middling, then we'd want a few threads but not "too many"; if O(f) is huge, then we want n = O(C) since the overhead disappears in the asymptotics. Thus, what we'd need to offer in the API is a set of evaluators in order to alter the level of parallelization.
Actually, it doesn't work that way anymore. Threads are not (or only rarely) created to run sparks. Instead, idle threads look into the spark pool and grab and execute some sparks. Nowadays, the main overhead you get is from stealing sparks from other CPUs, which incurs a lot of cache misses. If the other CPU is busy it may still be worth it, but it's quite hard to predict. Sparks are also run in chunks, which makes it less expensive to have small sparks.

--
Push the envelope. Watch it bend.
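A tiny example of the mechanism described above (standard `par` usage, nothing specific to Data.Set): each `par` merely records a spark in the spark pool; idle capabilities (e.g. when run with +RTS -N4) steal and evaluate them, so creating many small sparks is comparatively cheap even when not all of them end up running in parallel.

    import Control.Parallel (par, pseq)

    -- `rest `par` ...` only adds a spark for `rest` to the spark pool; it
    -- does not create a thread. An idle capability may pick it up while this
    -- one continues with `x + rest`.
    psum :: [Int] -> Int
    psum []     = 0
    psum (x:xs) = rest `par` (x `pseq` (x + rest))
      where rest = psum xs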
participants (5)
- Bertram Felgenhauer
- Ryan Newton
- Ryan Newton
- Thomas Schilling
- wren ng thornton