
On Tue, 2007-09-18 at 18:13 +0200, Thomas Girod wrote:
Hi there. Being rather new to the realm of Haskell and functional programming, I've been reading about "how much easier it is to parallelize code in a purely functional language" (am I right in saying that?).
My knowledge of parallelization is also very weak, but I've been thinking about this and I have a question. Let's say we take a small piece of code, like the quicksort algorithm.
qsort [] = []
qsort (x:xs) = qsort lesser ++ [x] ++ qsort greater
  where lesser  = filter (< x) xs
        greater = filter (>= x) xs

(actually I didn't test this code, this is just for the example)
Here, the computations of "lesser" and "greater" are two disjoint actions: we don't need the result of one to compute the other, and Haskell does not allow one to alter data in a way that would change the behaviour of the other. So I guess those could be run in parallel.
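For what it's worth, GHC does let you express exactly this by hand, using `par` and `pseq` (exported from GHC.Conc in base, and also via the parallel package's Control.Parallel). A minimal sketch, purely as an illustration; in practice sparks this fine-grained are far too small to give a real speedup:

```haskell
import GHC.Conc (par, pseq)

-- Parallel quicksort sketch: spark the sort of `lesser` on another
-- capability while the current thread evaluates `greater`, then
-- combine. Illustrative only; the per-spark work here is too small
-- to pay for the overhead.
pqsort :: Ord a => [a] -> [a]
pqsort []     = []
pqsort (x:xs) = sl `par` (sg `pseq` (sl ++ [x] ++ sg))
  where
    sl = pqsort (filter (< x) xs)
    sg = pqsort (filter (>= x) xs)

main :: IO ()
main = print (pqsort [3, 1, 4, 1, 5, 9, 2, 6])
```

To actually run the sparks in parallel you would compile with -threaded and run with +RTS -N; otherwise they are evaluated sequentially.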
Would it be fairly simple to automatically determine which parts of code can be run in parallel and which cannot (at least in non-monadic code)?
So the final question is: if it is possible to automatically identify segments of code that can be run in parallel, does [insert compiler name here] compile this code as a single-threaded program, or as a multithreaded version?
I guess on single-processor machines it is more efficient to do it as a single thread, but with those "many-core" processors that are coming, that probably won't be true anymore.
Sorry if this post is blowing off open doors (I guess this doesn't mean anything in english but it sounds cool) ...
Detecting parallelism is possible, but generally rather fragile. Coarse-grained parallelism in the form of threads (or processes) is only efficient if enough data can be processed in parallel. This in turn is determined by the data dependencies, which are hard to detect automatically. To preserve program semantics the analysis has to be conservative, i.e., assume that two parts of a program depend on each other unless it can prove otherwise. (OpenMP relies on the programmer to explicitly declare what to parallelize.)

A better way is to let the user specify the algorithm at a higher level. One very promising technique for doing this is explored in Data Parallel Haskell (DPH) [1]. In DPH you specify your algorithm as functional programs operating on vectors, and DPH even allows nested parallelism, i.e., you can call parallel functions inside parallel functions. If implemented naïvely, this could easily lead to inefficiencies due to too little workload per thread. This is where GHC's rewriting capabilities kick in and transform nested parallel programs into flat parallel programs.

I really recommend reading the paper(s) (see [1]).

/ Thomas

[1] http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell
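P.S. For a flavour of what DPH code looks like, here is the dot-product example in the spirit of the DPH papers. Note that `[: :]` is DPH's parallel-array syntax, which requires GHC's vectoriser and the dph packages; this is a sketch of the style, not plain Haskell you can compile with a stock GHC (and the exact module names may differ between versions):

```haskell
-- DPH sketch: a data-parallel dot product. The comprehension over
-- parallel arrays [:Double:] is vectorised by GHC into flat,
-- chunked parallel operations.
dotp :: [:Double:] -> [:Double:] -> Double
dotp xs ys = sumP [: x * y | x <- xs | y <- ys :]
```

The point is that you write the nested, high-level specification and the compiler flattens it, rather than managing threads and workload sizes yourself.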