
Hello;
I have a piece of code in which I employ the `par` construct to add some implicit parallelism to a theorem prover. However, when I run the *same* code with each of
+RTS -N1
+RTS -N5
+RTS -N10
I see a huge slowdown as the number grows (a factor of 50 with 5 processes and a factor of 100 with 10, on an 8-core machine).
Very little time is being spent in the garbage collector. Any suggestions?
Thanks, -Jamie

Hi Jamie,
First question: what version of GHC are you using? There are
significant performance improvements to parallel code in GHC 6.12, so
it's worth an upgrade. Once you've upgraded, you might want to try out
ThreadScope, which is designed to help track down these sorts of
problems.
If you are using 6.10, I recommend turning off parallel garbage
collection with the RTS flags (see the manual), as parallel GC can
cause slowdowns.
Thanks, Neil
2010/1/4 Jamie Morgenstern
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
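
For readers following the thread: one thing worth ruling out with `par` is spark granularity, since sparking very small computations can cost more than it saves. The sketch below is not Jamie's prover, and nothing in the thread says granularity is the cause here. It is a minimal illustration, assuming a hypothetical recursive workload (`nfib`) and a hypothetical cutoff parameter, of sparking only the larger subproblems. `par` and `pseq` are imported from `GHC.Conc` in base; `Control.Parallel` from the parallel package exports the same names.

```haskell
-- Build and run (separate runs, one -N value each):
--   ghc -O2 -threaded Main.hs
--   ./Main +RTS -N1 -s
--   ./Main +RTS -N5 -s
-- -s prints summary statistics, including time spent in GC.
-- Assumption: newer GHC releases also need -rtsopts at link time
-- before they accept -N on the command line.
module Main (main) where

import GHC.Conc (par, pseq)

-- Hypothetical stand-in for an expensive recursive search.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = nfib (n - 1) + nfib (n - 2)

-- Same function, but subproblems of size >= cutoff are sparked with `par`.
-- Below the cutoff we fall back to the sequential version, so that tiny
-- computations never become sparks.
pfib :: Int -> Int -> Integer
pfib cutoff n
  | n < max 2 cutoff = nfib n
  | otherwise        = x `par` (y `pseq` (x + y))
  where
    x = pfib cutoff (n - 1)  -- may be evaluated by another capability
    y = pfib cutoff (n - 2)  -- evaluated here before the sum is forced

main :: IO ()
main = print (pfib 20 30)
```

Comparing the `-s` output of the `-N1` run against the `-N5` run, and trying a few cutoff values, is a cheap first experiment before reaching for a profiler.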

I am using 6.12... are there any good pointers as to how one uses
threadscope?
On Mon, Jan 4, 2010 at 3:14 PM, Neil Mitchell

There was a paper at the Haskell Symposium 2009, and the video is
online: http://www.vimeo.com/6680185
Thanks, Neil
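
As a rough sketch of the ThreadScope workflow asked about above: the program is built with event logging enabled, run so that it writes an eventlog file, and that file is then opened in ThreadScope. The flag spellings in the comments are what I understand them to be for GHC 6.12 (`-eventlog` at link time, `+RTS -ls` at run time) and should be checked against the user's guide for your version. The workload (`chunkSum`) is hypothetical and only there so the log has something to show.

```haskell
-- Assumed workflow (verify the flags for your GHC version):
--   ghc -O2 -threaded -eventlog Main.hs
--   ./Main +RTS -N5 -ls          -- should write Main.eventlog
--   threadscope Main.eventlog
module Main (main) where

import GHC.Conc (numCapabilities, par, pseq)

-- Hypothetical workload: sum of squares over an inclusive range.
chunkSum :: Int -> Int -> Integer
chunkSum lo hi = sum [fromIntegral i * fromIntegral i | i <- [lo .. hi]]

main :: IO ()
main = do
  -- Confirms that the -N setting actually reached the runtime.
  putStrLn ("capabilities: " ++ show numCapabilities)
  let a = chunkSum 1 1000000
      b = chunkSum 1000001 2000000
  -- Spark a, evaluate b on this thread, then combine the two halves.
  print (a `par` (b `pseq` (a + b)))
```

In the resulting timeline, long stretches where only one capability is busy, or where all of them are mostly idle, are the places to look at first.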
On Tue, Jan 5, 2010 at 2:23 AM, Jamie Morgenstern