Re: Parallel Haskell: 2-year project to push real world use

3 May 2010

      * Roman Leshchinskiy  [04.05.2010 02:32]:
...
On 04/05/2010, at 09:21, Christian Höner zu Siederdissen wrote:
...
Hi,
on that topic, consider this (rather trivial) array:
a = array (1,10) [ (i,f i) | i <-[1..10]] where
 f 1 = 1
 f 2 = 1
 f i = a!(i-1) + a!(i-2)
(aah, school ;)
Right now, I am abusing vector in ST by doing this:
a <- new
a' <- freeze a
forM_ [3..10] $ \i -> do
 write a (a'!(i-1) + a!(i-2))
Let's say I wanted to do something like this in dph (or repa), does that
work? We are actually using this for RNA folding algorithms that are at
least O(n^3) time. For some of the more advanced stuff, it would be
really nice if we could "just" parallelize.
Do you really just need a prefix sum? These are easily parallelisable if the operator is associative. For instance, you could implement the Fibonacci sequence as:
mapP fst $ scanP (\(a,b) _ -> (a+b,a)) (1,0) $ replicateP n (0,0)
and DPH would parallelise it. That's how I would write the above with vector as well.
That is, kind of, the fun part: you have

(1) a number of vectors whose values depend on each other (bad!)
(2) in-place update (bad!)
(3) rather trivial calculations for each element (mostly:
sum, minimum, fold, map, backpermute) (good!), we have simple semiring
calculations here
...
...
To summarise: I need arrays that allow in-place updates.
In-place updates + parallelism = bad! That's oversimplifying, of course. But the transformations underlying DPH, for instance, simply don't work in the presence of side effects.
The thing is, you can write the algorithm in a way such that each
operation on index "k" (whatever dimension k has) only requires access
to values <=k and those values will never change again. The problem is
that more than one vector is involved making it less fun to write code
like your fibonacci example.
...
...
Otherwise, most libraries that do heavy stuff (O(n^3) or worse) are
using vector right now. On a single core, it performs really great --
even compared to C-code that has been optimized a lot.
That's great to know! Do you (or anyone else) by any chance have any benchmarks you could share? At the moment, I'm only benchmarking vector with a couple of rather simplistic algorithms which is a bit of a problem.
I can make my libraries available under GPLv3, they just need a bit of
love. This gives you a moderately complex algorithm for which there is,
too, a highly optimized C version (RNAfold -d2, in the vienna rna
package).
I am giving a talk wednesday, after that I'll prepare the libraries for
hackage.

Gruss,
Christian

Christian Höner zu Siederdissen

tags

participants (1)