[Haskell-cafe] efficient parallel foldMap for lists/sequences

1 Apr 2011

      Hello Haskellers,

in parallel programs it is a common pattern to accumulate a list of
results in parallel using an associative operator. This can be seen as
a simple form of the map-reduce pattern where each element of the list
is mapped into a monoid before combining the results using `mconcat`.
This pattern can be abstracted by the `foldMap` function of the
`Foldable` class and we can give a simple parallel implementation
using `par` and `pseq`:

~~~
import Data.Monoid
import Control.Parallel

foldMap :: Monoid m => (a -> m) -> [a] -> m
foldMap f []  = mempty
foldMap f [x] = f x
foldMap f xs  =
  let (ys,zs) = splitAt (length xs `div` 2) xs
      ys'     = foldMap f ys
      zs'     = foldMap f zs
   in zs' `par` (ys' `pseq` (ys' `mappend` zs'))
~~~

How can this pattern be implemented in Haskell efficiently?

I plan to investigate the following options and am interested in previous work.

1. Data.Sequence (from containers)

As finger trees are already balanced they look like a good candidate
for parallel consumption. However, the current implementation of the
`Foldable` instance is sequential. I wonder what would be the overhead
and how it could be reduced if it is is too large. I think, I would
first go for an implementation outside of `Foldable` and later
consider the overhead of overloading.

2. Data.Array.Repa (from repa)

provides a `map` and (sequential) `fold` functions. I think that the
main advantage of repa is fusion of successive traversals which is not
necessary for the pattern in question. Hence, I'm not sure whether
it's a good candidate for the job. Also I don't know how to implement
the pattern using repa. Can it be done using `slice` or should it be
done by changing repa itself?

3. Data.Monoid.Reducer (from monoids)

`foldMapReduce` looks promising. I wonder whether it could be used for
parallel reduction and how efficient it would be.

Which approach do you think is most promising? What other options are there?

Best,
Sebastian

[Haskell-cafe] efficient parallel foldMap for lists/sequences

Sebastian Fischer