Automatic parallelism in Haskell, similar to "make -j4"?

I was surprised when I read the multi-core section of Real World Haskell, which explains the use of par, seq, and force to achieve parallelism. While it's obvious a programmer could provide useful parallelism hints to the compiler, given the nature of the language I would have thought Haskell could do a significant amount of parallelization itself without any hints or code changes at all. I am thinking of our troglodytic friend 'make', which will run (for example) 4 parallel jobs when given the option "make -j4". Even 'rake', the Ruby version of make, now has a branch (called drake) which supports the parallel -j option. Since so much Haskell code is free of side effects, it would seem that it could be analyzed in a similar manner to Makefile dependencies in order to find sections which can be run in parallel. What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning? Thanks, TW

Hello T, Monday, November 3, 2008, 2:28:08 AM, you wrote:
What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough". -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
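(A minimal sketch of the kind of annotation Bulat describes. It assumes 'par' and 'pseq' from GHC.Conc in base and GHC's -threaded runtime; each sparked computation is deliberately chunky so it pays for its own bookkeeping:)

```haskell
import GHC.Conc (par, pseq)

-- A deliberately large unit of work: 'par' only pays off when the
-- sparked computation outweighs the cost of creating the spark.
sumTo :: Int -> Int
sumTo n = sum [1 .. n]

-- Spark the evaluation of 'a' while the current thread forces 'b'.
parSum :: Int -> Int
parSum n =
  let a = sumTo n
      b = sumTo n
  in a `par` (b `pseq` a + b)

main :: IO ()
main = print (parSum 1000000)  -- compile with -threaded, run with +RTS -N
</imports>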

On Sun, Nov 2, 2008 at 6:44 PM, Bulat Ziganshin
What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough"
Right, but couldn't the Haskell compiler+runtime discover "rather large pieces of work"?

On Sun, Nov 2, 2008 at 12:42 PM, T Willingham
On Sun, Nov 2, 2008 at 6:44 PM, Bulat Ziganshin
wrote: What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough"
Right, but couldn't the Haskell compiler+runtime discover "rather large pieces of work"?
Perhaps, but not easily, especially if it is to be done statically; there is plenty of research to be done in this area. Haskell is rather different from make. The graph of a lambda-calculus program is not nearly as transparent as the graph of a makefile -- *especially* considering lazy evaluation. For example, you might think map (+1) [1..10000000] is a "rather large piece of work", but if the result is then passed to "take 4", that is no longer true. We don't want to futilely spin our processor computing the rest of it. So my guess is that there are some cases, as with strictness analysis, where you can identify these, but there are many cases which are much more subtle and hard to identify automatically. But I am no expert in the area. Luke
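(Luke's example, spelled out: the map looks like a large piece of work, but under lazy evaluation only four elements are ever forced, so speculatively evaluating the whole list in parallel would be almost entirely wasted effort:)

```haskell
-- Apparently a "rather large piece of work"...
bigList :: [Int]
bigList = map (+1) [1 .. 10000000]

-- ...but laziness means only the first four elements are ever
-- demanded; the other 9999996 are never computed at all.
firstFour :: [Int]
firstFour = take 4 bigList

main :: IO ()
main = print firstFour  -- [2,3,4,5]
```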

t.r.willingham:
On Sun, Nov 2, 2008 at 6:44 PM, Bulat Ziganshin
wrote: What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough"
Right, but couldn't the Haskell compiler+runtime discover "rather large pieces of work"?
Requires runtime profiling to work out the costs. See this paper, which implements this idea: http://research.microsoft.com/~tharris/papers/2007-fdip.pdf Note that for subsets of Haskell, where we have more information statically about the costs involved, we can do the parallelism automatically. Data Parallel Haskell is the prime example. -- Don

Hello T, Monday, November 3, 2008, 3:42:49 AM, you wrote:
On Sun, Nov 2, 2008 at 6:44 PM, Bulat Ziganshin
wrote: What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough"
Right, but couldn't the Haskell compiler+runtime discover "rather large pieces of work"?
Have you ever heard of the "halting problem"? It's impossible in the general case, and I doubt how far it can be taken in practice. In general, estimating the cost looks close to actually computing the function (and you still need to know its actual input parameters!). Anyway, it hasn't been done, and I haven't heard about research in this direction. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin
Hello T,
Monday, November 3, 2008, 2:28:08 AM, you wrote:
What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
The problem is that make has rather large pieces of work which it can run in parallel. If GHC tried to parallelize every machine operation, it would spend more time managing these jobs than doing the work itself. 'par' is just the way to tell GHC "this part of the job is large enough"
..and also that this piece of work will actually need to be evaluated. With laziness, a program can have subexpressions that are bottom as long as they are not evaluated; any kind of speculative execution must take care to handle this properly. -k -- If I haven't seen further, it is by standing in the footprints of giants
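(Ketil's point in miniature: a program can contain bottom and still terminate, because laziness never forces the diverging subexpression. A naive speculative evaluator that eagerly evaluated both components of the pair would crash where the sequential program succeeds:)

```haskell
-- The second component is bottom, but it is never demanded.
pair :: (Int, String)
pair = (42, error "never evaluated")

result :: Int
result = fst pair

main :: IO ()
main = print result  -- prints 42; forcing 'snd pair' would abort
```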

Excerpts from t.r.willingham's message of Sun Nov 02 17:28:08 -0600 2008:
What would it take to implement a -j equivalent for, say, GHC? Or if this is not possible, what is wrong with my reasoning?
Thanks, TW
Hi, The main issue has to do with the decisions the compiler needs to make in order to generate adequate code for the general case. In particular, the compiler has to make decisions about the *granularity* of the threading: the generated code may waste a lot of time sparking off threads using 'par' for computations that take less time than the thread creation itself, so that spawning the thread is a net loss. In that case your code would need the threading to be more coarse-grained. On the other hand, the compiler may not sprinkle enough par's throughout the program, and then we miss opportunities where the parallelism could give us a speed-up; in that case we need more fine-grained threading. So the problem is, in the general case: given an arbitrary input program, how do you adequately decide what should and should not be parallelized in that program? Even then, how do you decide the granularity at which the threads should operate? It's a pretty tough problem, and I don't think we're even close to full-blown automagically-parallelizing compilers (although with things like parallel strategies and DPH we can get close.) But there is work in this area using profiling information to guide these optimizations; see "Feedback directed implicit parallelism" here: http://research.microsoft.com/~tharris/papers/2007-fdip.pdf Austin
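(The granularity trade-off Austin describes can be sketched by hand, using only 'par' from GHC.Conc: spark one unit of work per *chunk* of elements rather than per element. The chunk size of 1000 is an illustrative choice, not a tuned one:)

```haskell
import GHC.Conc (par)
import Data.List (foldl')

-- Spark one computation per chunk of k elements, so each spark does
-- enough work to pay for its own creation.  Per-element sparks would
-- drown in overhead; one giant chunk would leave the other cores idle.
chunkedSum :: Int -> [Int] -> Int
chunkedSum k = go . chunksOf k
  where
    chunksOf _ [] = []
    chunksOf n ys = let (h, t) = splitAt n ys in h : chunksOf n t
    go []       = 0
    go (c : cs) = let s = foldl' (+) 0 c
                  in s `par` (go cs + s)

main :: IO ()
main = print (chunkedSum 1000 (map (* 2) [1 .. 100000]))
```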

T Willingham
I am thinking of our troglodytic friend 'make', which will run (for example) 4 parallel jobs when given the option "make -j4". Even 'rake', the ruby version of make, now has a branch (called drake) which does the parallel -j option.
From the replies I've seen about this, I think it's been interpreted as asking whether ghc could compile a given program so that it will execute in parallel. In general that's a hard problem.
On the other hand, it should be really straightforward (in principle, I mean) to get something going like ghc --make -j4 Foo.hs similar to your make example, so that compile time could be reduced, while the execution could either be sequential or parallel. I don't think there's anything like this yet (is there?). Does anyone have any thought what it would take to get this going? Chad

On Tue, Nov 4, 2008 at 7:34 PM, Chad Scherrer
T Willingham
writes: I am thinking of our troglodytic friend 'make', which will run (for example) 4 parallel jobs when given the option "make -j4". Even 'rake', the ruby version of make, now has a branch (called drake) which does the parallel -j option.
From the replies I've seen about this, I think it's been interpreted as asking whether ghc could compile a given program so that it will execute in parallel. In general that's a hard problem.
On the other hand, it should be really straightforward (in principle, I mean) to get something going like ghc --make -j4 Foo.hs similar to your make example, so that compile time could be reduced, while the execution could either be sequential or parallel. I don't think there's anything like this yet (is there?).
Does anyone have any thought what it would take to get this going?
Chad
I believe that's one of the points of the hbuild project (see http://hackage.haskell.org/trac/hackage/wiki/HBuild). Alex

Bulat Ziganshin wrote:
Hello Chad,
Wednesday, November 5, 2008, 6:34:01 AM, you wrote:
ghc --make -j4 Foo.hs
AFAIR, it was implemented and didn't show speed improvements. Ask Simon
We did get speed improvements, it was the main case study for the initial implementation of shared-memory parallelism in GHC. See http://www.haskell.org/~simonmar/bib/multiproc05_abstract.html However, due to the way GHC is structured internally (it has some global caches updated using unsafePerformIO!) the parallel make implementation was quite fragile, and wasn't suitable for incorporating into GHC proper. It's certainly possible, but we need to think carefully about how to make these caches multi-thread-safe. Cheers, Simon

Excerpts from Chad Scherrer's message of Tue Nov 04 21:34:01 -0600 2008:
Does anyone have any thought what it would take to get this going?
Chad
Currently, franchise supports building in parallel with a -j flag, but the code could definitely be optimized (in my experience, running with something like -j3 on my dual core currently reduces compile times with franchise on arbitrary projects by about 20%.) During the 2008 SoC there was also work on adding this support to Cabal, which eventually ended up as the hbuild project. http://hackage.haskell.org/trac/hackage/wiki/HBuild darcs get http://darcs.net/repos/franchise Austin
participants (9)
-
Alexander Dunlap
-
Austin Seipp
-
Bulat Ziganshin
-
Chad Scherrer
-
Don Stewart
-
Ketil Malde
-
Luke Palmer
-
Simon Marlow
-
T Willingham