Re: Parallel Haskell: 2-year project to push real world use

* Roman Leshchinskiy
On 04/05/2010, at 09:21, Christian Höner zu Siederdissen wrote:
Hi,
on that topic, consider this (rather trivial) array:
a = array (1,10) [ (i,f i) | i <-[1..10]] where f 1 = 1 f 2 = 1 f i = a!(i-1) + a!(i-2)
(aah, school ;)
Right now, I am abusing vector in ST by doing this:
a <- new a' <- freeze a forM_ [3..10] $ \i -> do write a (a'!(i-1) + a!(i-2))
Let's say I wanted to do something like this in dph (or repa), does that work? We are actually using this for RNA folding algorithms that are at least O(n^3) time. For some of the more advanced stuff, it would be really nice if we could "just" parallelize.
Do you really just need a prefix sum? These are easily parallelisable if the operator is associative. For instance, you could implement the Fibonacci sequence as:
mapP fst $ scanP (\(a,b) _ -> (a+b,a)) (1,0) $ replicateP n (0,0)
and DPH would parallelise it. That's how I would write the above with vector as well.
That is, kind of, the fun part: you have (1) a number of vectors whose values depend on each other (bad!) (2) in-place update (bad!) (3) rather trivial calculations for each element (mostly: sum, minimum, fold, map, backpermute) (good!), we have simple semiring calculations here
To summarise: I need arrays that allow in-place updates.
In-place updates + parallelism = bad! That's oversimplifying, of course. But the transformations underlying DPH, for instance, simply don't work in the presence of side effects.
The thing is, you can write the algorithm in a way such that each operation on index "k" (whatever dimension k has) only requires access to values <=k and those values will never change again. The problem is that more than one vector is involved making it less fun to write code like your fibonacci example.
Otherwise, most libraries that do heavy stuff (O(n^3) or worse) are using vector right now. On a single core, it performs really great -- even compared to C-code that has been optimized a lot.
That's great to know! Do you (or anyone else) by any chance have any benchmarks you could share? At the moment, I'm only benchmarking vector with a couple of rather simplistic algorithms which is a bit of a problem.
I can make my libraries available under GPLv3, they just need a bit of love. This gives you a moderately complex algorithm for which there is, too, a highly optimized C version (RNAfold -d2, in the vienna rna package). I am giving a talk wednesday, after that I'll prepare the libraries for hackage. Gruss, Christian
participants (1)
-
Christian Höner zu Siederdissen