Seen on reddit: or, foldl and foldr considered slightly harmful

In a presentation of Guy Steele for ICFP 2009 in Edinburgh: http://www.vimeo.com/6624203 he "considers foldl and foldr" harmful as they hinder parallelism because of "Process first element, then the rest". Instead he proposes a divide-and-merge approach, especially in the light of going parallel. The slides at http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.sun.com%2Fprojects%2... [Beware: Google docs] are somewhat geared towards Fortress, but I wonder what Haskellers have to say about his position. Greetings, Johann

Johann Höchtl writes:

> In a presentation of Guy Steele for ICFP 2009 in Edinburgh ... he "considers foldl and foldr" harmful as they hinder parallelism because of "Process first element, then the rest" ...
In Haskell foldl/foldr apply to linked lists (or lazy streams, if you prefer), which are already inherently sequential, and they get a rather harsh treatment. I notice he points to finger trees, which I thought were implemented in Data.Sequence.
> The slides are somewhat geared towards Fortress, but I wonder what Haskellers have to say about his position.
Can we (easily) parallelize operations on Data.Sequence? Often, the devil is in the details, and there's a lot of ground to cover between 'trees are easier to parallelize' and an efficient and effective high-level interface. (E.g. non-strict semantics allowing speculative evaluation - you still need to insert manual `par`s, right?)

-k
-- If I haven't seen further, it is by standing in the footprints of giants
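To make the "manual `par`s" point concrete, here is a minimal sketch of a divide-and-conquer fold over Data.Sequence, using `par`/`pseq` from Control.Parallel and assuming the combining function is associative with identity z. It is a sketch of the shape of the computation, not a tuned implementation:

    import Control.Parallel (par, pseq)
    import qualified Data.Sequence as Seq
    import Data.Sequence (Seq)

    -- Split the sequence in half, reduce both halves in parallel,
    -- and combine with an associative operation (z its identity).
    parFold :: (a -> a -> a) -> a -> Seq a -> a
    parFold f z s = case Seq.length s of
      0 -> z
      1 -> Seq.index s 0
      n -> let (l, r) = Seq.splitAt (n `div` 2) s
               lv     = parFold f z l
               rv     = parFold f z r
           in rv `par` (lv `pseq` f lv rv)

A real version would stop sparking below some grain size; the point here is only that the tree-shaped split makes the two recursive calls independent.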

Isn't this the kind of thing Data Parallel Haskell is achieving? I'm in no way an expert in the field, but from what I've read on the subject it looks like this: I have a list of N elements and I want to map a function F over it. Technically, I could spawn N processes and build the result from that, but it would be highly inefficient. So the really hard part is to guess how I should split my data to get the best performance.

Well, I guess it's pretty easy for a flat structure if you have access to its length, but for a recursive one it is complicated, as you don't know whether a branch of the tree will lead to a leaf or a huge subtree ... the "evil detail"!

Tom
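For the easy flat case, the mechanical split looks something like this hypothetical helper (a sketch only; DPH's actual machinery is considerably more involved):

    -- Split a flat list into k roughly equal chunks, one per worker.
    -- Easy precisely because the length is available up front.
    chunk :: Int -> [a] -> [[a]]
    chunk k xs = go xs
      where
        n = max 1 ((length xs + k - 1) `div` k)
        go [] = []
        go ys = let (c, rest) = splitAt n ys in c : go rest

For a recursive structure there is no such one-liner: the split points have to come from the shape of the data itself, which is exactly where the cached sizes discussed below earn their keep.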

On Thu, Feb 11, 2010 at 11:00:51AM +0100, Ketil Malde wrote:

> In Haskell foldl/foldr apply to linked lists ... which are already inherently sequential ... I notice he points to finger trees, which I thought were implemented in Data.Sequence.
Direct URL for the slides: http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf

As he says, associativity is the key to parallelism -- an old observation, but still underappreciated. Even without parallelism, associativity also gives us greater freedom in structuring our solutions. The moral is that our datatypes need associative binary operations more than asymmetric ones. We use lists too much (because they're so convenient), and apart from the loss of parallelism it constrains our thinking to the sequential style criticised by Backus back in 1978.

Regarding finger trees, he's just referring to the idea of caching the results of monoidal folds in nodes of a tree. That's crucial to the applications of finger trees, but it can be applied to any kind of tree. As he mentions, it's related to the Ladner-Fischer parallel prefix algorithm, which has an upward pass accumulating sums for each subtree followed by a downward accumulation passing each sum into the subtree to the right. But it's not just for parallelism: when you have these cached values in a balanced tree, you can compute the sum of any prefix in O(log n) steps.
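A minimal sketch of the caching idea (my own illustration, not Data.Sequence's actual representation): a binary tree whose nodes cache the monoidal sum and leaf count of their subtree, so any prefix sum explores a single root-to-leaf path.

    import Data.Monoid

    -- Each Node caches the monoidal sum and leaf count of its subtree.
    data Tree a = Leaf a | Node a Int (Tree a) (Tree a)

    size :: Tree a -> Int
    size (Leaf _)       = 1
    size (Node _ n _ _) = n

    measure :: Tree a -> a
    measure (Leaf x)       = x
    measure (Node m _ _ _) = m

    -- Smart constructor maintaining both caches.
    node :: Monoid a => Tree a -> Tree a -> Tree a
    node l r = Node (measure l `mappend` measure r) (size l + size r) l r

    -- Sum of the first k leaves: only one path from the root is
    -- explored, so this is O(log n) when the tree is balanced.
    prefix :: Monoid a => Int -> Tree a -> a
    prefix k t
      | k <= 0      = mempty
      | k >= size t = measure t
    prefix k (Node _ _ l r)
      | k <= size l = prefix k l
      | otherwise   = measure l `mappend` prefix (k - size l) r

Data.Sequence's finger trees do the same thing with size as the monoid, which is how indexing and splitting stay logarithmic.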

On Feb 11, 2010, at 3:41 AM, Johann Höchtl wrote:

> In a presentation of Guy Steele for ICFP 2009 in Edinburgh ... he proposes a divide-and-merge approach, especially in the light of going parallel.
> The slides at http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.sun.com%2Fprojects%2... [Beware: Google docs]
There's no need to use Google docs. A direct URL for the pdf: http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf

I recently gave a followup talk at Portland State, arguing that notation matters, and that even with better notation programmer mindset is also going to be hard to change: http://research.sun.com/projects/plrg/Publications/PSUJan2010-Maessen.pdf

The key thing here isn't *just* the handedness of lists, but the handedness of foldl/foldr *irrespective of the underlying data structure*. So switching to tree-structured data a la finger trees is a necessary step, but not a sufficient one. The use of monoidal reductions has always been an important part of parallel programming.
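A small illustration of that handedness point: even over a tree, foldr nests every application inside the next, while a monoidal fold mirrors the tree's shape and can be split.

    import Data.Monoid

    data Tree a = Leaf a | Branch (Tree a) (Tree a)

    -- foldr over a tree still builds the purely sequential chain
    -- f x1 (f x2 (f x3 ...)), regardless of the tree's shape.
    treeFoldr :: (a -> b -> b) -> b -> Tree a -> b
    treeFoldr f z (Leaf x)     = f x z
    treeFoldr f z (Branch l r) = treeFoldr f (treeFoldr f z r) l

    -- A monoidal reduction keeps the tree's shape: the two
    -- recursive calls are independent and can run in parallel.
    treeFoldMap :: Monoid m => (a -> m) -> Tree a -> m
    treeFoldMap f (Leaf x)     = f x
    treeFoldMap f (Branch l r) = treeFoldMap f l `mappend` treeFoldMap f r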
> The slides are somewhat geared towards Fortress, but I wonder what Haskellers have to say about his position.
Now, what if list comprehensions were really shorthand for construction of Applicative or Monoid structures by traversing a mixture of data types with a common interface? Something like this one:

    class Generator t e | t -> e where
      mapReduce :: Monoid m => t -> (e -> m) -> m

-Jan-Willem Maessen
 Another Fortress/Haskell crossover
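For illustration, here are two possible instances of such a class (my guesses at what was intended, not Fortress's actual library; the class is repeated so the sketch stands alone):

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances #-}
    import Data.Monoid

    class Generator t e | t -> e where
      mapReduce :: Monoid m => t -> (e -> m) -> m

    -- Lists reduce sequentially...
    instance Generator [a] a where
      mapReduce xs f = mconcat (map f xs)

    -- ...while a tree-shaped generator exposes the split structure.
    data Tree a = Leaf a | Branch (Tree a) (Tree a)

    instance Generator (Tree a) a where
      mapReduce (Leaf x)     f = f x
      mapReduce (Branch l r) f = mapReduce l f `mappend` mapReduce r f

For example, getSum (mapReduce [1..10 :: Int] Sum) is 55, and the same call works unchanged over a Tree, where the two branches can be reduced in parallel.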

On Feb 11, 2010, at 9:41 PM, Johann Höchtl wrote:

> In a presentation of Guy Steele for ICFP 2009 in Edinburgh ... he "considers foldl and foldr" harmful as they hinder parallelism because of "Process first element, then the rest" ...
I think I first heard about that issue in the 80s. Let me just take an Erlang perspective on this for a few minutes. Ignoring types, the classic foldl in Erlang (which is strict) is

    foldl(F, A, [X|Xs]) -> foldl(F, F(X, A), Xs);
    foldl(_, A, []) -> A.

In a strict language, you have to wait for F to finish before you can go on to the next element. In a non-strict language, you don't. The interesting question is how far F can get just given X, before it has to look at A.

Suppose we can factor F(X, A) into G(H(X), A) where H is rather time-consuming, but G is cheap (maybe it's an add). So we write

    foldl_map(G, H, A, [X|Xs]) -> foldl_map(G, H, G(H(X), A), Xs);
    foldl_map(_, _, A, []) -> A.

Now we can parallelise it. We'll spark off a separate thread for each call of H.

    pfoldl_map(G, H, A, Xs) ->
        Outer = self(),
        Pids = [spawn(fun () -> Outer ! {self(), H(X)} end) || X <- Xs],
        foldl(G, A, [receive {Pid, Val} -> Val end || Pid <- Pids]).

If N is the length of the list and G is O(1), we have O(N) time to traverse the list and O(N) time to collect and fold the results. The H calculations can take place in parallel. Provided each H calculation is expensive enough, we may not _care_ about the O(N) bits. In fact, if they aren't, we probably shouldn't be worrying about parallelism here.

The problem exists when we *can't* factor F(X, A) into G(H(X), A) with a cheap G and costly H. Then divide-and-parallel-conquer using k-way trees gives us a computation depth of $\log_k N$ calls of G waiting for results to come back.

As I say, I met this stuff in the 80s. It's hardly news. Indeed, if I remember correctly, back when Occam was hot stuff, people were worried about

    PAR i = 0 FOR n
      ...

which forks n processes. Doing that one at a time in the parent process takes O(n) time. I believe something was done about making this work by recursive bisection.
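The same G/H factoring reads naturally in Haskell; here is a minimal sketch using Control.Parallel.Strategies from the parallel package (rseq evaluates each H result only to weak head normal form; rdeepseq would force it fully):

    import Control.Parallel.Strategies (parList, rseq, using)

    -- Spark every expensive h in parallel, then run the cheap
    -- sequential fold with g over the results.
    foldlMapPar :: (a -> b -> a) -> (c -> b) -> a -> [c] -> a
    foldlMapPar g h z xs = foldl g z (map h xs `using` parList rseq)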
participants (6)

- Jan-Willem Maessen
- Johann Höchtl
- Ketil Malde
- Richard O'Keefe
- Ross Paterson
- Thomas Girod