Concurrency performance problem

Hello Cafe!

I have a problem with the following code: http://hpaste.org/83460. It is a simple Monte Carlo integration. The problem is that when I run my program with +RTS -N1 I get:

Multi 693204.039020917 8.620632s
Single 693204.039020917 8.574839s
End

And with +RTS -N4 (I have four CPU cores):

Multi 693204.0390209169 11.877143s
Single 693204.039020917 11.399888s
End

I have two questions:
1) Why does performance decrease when I add more cores to my program?
2) Why does the performance of the single-threaded integration also change with the number of cores?

Thanks for all answers,
Łukasz Dąbek

Depends on your code...
(In reply to Łukasz Dąbek, Mar 4, 2013, 6:10 PM.)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

What exactly do you mean? I have included a link to the full source listing:
http://hpaste.org/83460.
--
Łukasz Dąbek
(In reply to Don Stewart, 2013/3/4.)

Apologies, didn't see the link on my phone :)
As the comment on the link shows, you're accidentally migrating unevaluated
work to the main thread, hence no speedup.
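A minimal sketch of the problem described above, assuming a workload shaped like the one in the thread: each worker computes a Monte Carlo estimate and hands it back through an MVar. The integrand `f`, the LCG `nextSeed`, and the names `lazyWorker`, `strictWorker` and `integrateWith` are hypothetical stand-ins, not the code from http://hpaste.org/83460. Build with `ghc -O2 -threaded -rtsopts` and run with `+RTS -N4`.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch only: the integrand, the random-number generator and the sample
-- counts are hypothetical stand-ins, not the code at http://hpaste.org/83460.
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)
import Data.Word (Word64)

-- Hypothetical integrand on [0, 1]; its exact integral is 1/3.
f :: Double -> Double
f x = x * x

-- Hypothetical 64-bit linear congruential generator (Knuth's MMIX
-- constants), used so that the sketch depends only on base.
nextSeed :: Word64 -> Word64
nextSeed s = s * 6364136223846793005 + 1442695040888963407

-- Map the top 53 bits of the state to a Double in [0, 1).
toUnit :: Word64 -> Double
toUnit s = fromIntegral (s `div` 2048) / 9007199254740992

-- Strict loop: the accumulator is forced at every step.
monteCarlo :: Word64 -> Int -> Double
monteCarlo seed n = go seed n 0
  where
    go _ 0 !acc = acc / fromIntegral n
    go s k !acc = let s' = nextSeed s in go s' (k - 1) (acc + f (toUnit s'))

-- BUG: putMVar stores the unevaluated thunk `monteCarlo seed n`. The worker
-- finishes at once and the main thread does all the sampling when it sums
-- the results, so adding cores gives no speedup.
lazyWorker :: Word64 -> Int -> IO (IO Double)
lazyWorker seed n = do
  mv <- newEmptyMVar
  _ <- forkIO (putMVar mv (monteCarlo seed n))
  pure (takeMVar mv)

-- FIX: force the result to weak head normal form inside the worker (for a
-- Double that is the whole value) before handing it over.
-- `putMVar mv $! monteCarlo seed n` would work as well.
strictWorker :: Word64 -> Int -> IO (IO Double)
strictWorker seed n = do
  mv <- newEmptyMVar
  _ <- forkIO (evaluate (monteCarlo seed n) >>= putMVar mv)
  pure (takeMVar mv)

-- Start one worker per seed, then average their estimates.
integrateWith :: (Word64 -> Int -> IO (IO Double)) -> Int -> Int -> IO Double
integrateWith spawn workers samples = do
  waits <- forM [1 .. workers] (\i -> spawn (fromIntegral i) samples)
  parts <- sequence waits
  pure (sum parts / fromIntegral workers)

main :: IO ()
main = do
  caps <- getNumCapabilities
  estimate <- integrateWith strictWorker caps 1000000
  print estimate
```

Passing `lazyWorker` instead of `strictWorker` to `integrateWith` yields the same number, but the `sum` on the main thread then performs all of the sampling sequentially.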
Be very careful with evaluation strategies (esp. lazy expressions) around
MVar and TVar points. It's too easy to put a thunk in one.
The strict-concurrency package is one attempt to invert the conventional
lazy box (MVar and Chan variants that evaluate what is stored in them), to
better match the most common case.
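The idea of a strict box can be sketched as a small wrapper around `MVar`. This is a hypothetical illustration in the same spirit, not the strict-concurrency package's API: `StrictMVar`, `newEmptyStrictMVar`, `putStrictMVar` and `takeStrictMVar` are made-up names, and the wrapper forces values only to weak head normal form.

```haskell
-- Hypothetical strict box in the spirit of strict-concurrency. The names
-- below are made up for illustration and are not that package's API.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- In a real module the constructor would be hidden, so every write
-- has to go through putStrictMVar.
newtype StrictMVar a = StrictMVar (MVar a)

newEmptyStrictMVar :: IO (StrictMVar a)
newEmptyStrictMVar = StrictMVar <$> newEmptyMVar

-- Evaluate the value (to weak head normal form) in the writing thread,
-- then store it, so readers never inherit the writer's unevaluated work.
putStrictMVar :: StrictMVar a -> a -> IO ()
putStrictMVar (StrictMVar mv) x = evaluate x >>= putMVar mv

takeStrictMVar :: StrictMVar a -> IO a
takeStrictMVar (StrictMVar mv) = takeMVar mv

main :: IO ()
main = do
  box <- newEmptyStrictMVar
  -- The sum is computed by the forked thread, not by the reader below.
  _ <- forkIO (putStrictMVar box (foldl' (+) 0 [1 .. 1000000 :: Int]))
  total <- takeStrictMVar box
  print total -- 500000500000
```

Weak head normal form is enough for an `Int` or a `Double`; for lists or records a deep force would be needed, for example `force` from `Control.DeepSeq` in the deepseq package.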
(In reply to Łukasz Dąbek, Mar 4, 2013, 7:25 PM.)

Thank you for your help! This solved my performance problem :)
Anyway, the second question remains: why is the performance of the
single-threaded calculation affected by the RTS -N parameter? Is GHC
doing some parallelization behind the scenes?
--
Łukasz Dąbek.
(In reply to Don Stewart, 2013/3/4.)

(In reply to Łukasz Dąbek, Mon, Mar 4, 2013, 11:39 AM.)
I believe it's because -N makes GHC use the threaded RTS, which differs from the non-threaded RTS and therefore has some overhead.
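Whether the threaded RTS is in use can be checked from inside the program. A minimal sketch using `rtsSupportsBoundThreads` (True only when linked with `-threaded`) and `getNumCapabilities` from base; the file name `Main.hs` in the comments is an assumption.

```haskell
-- Sketch: report which RTS the binary was linked with and how many
-- capabilities it started with. Example build and run (file name assumed):
--   ghc -O2 -threaded -rtsopts Main.hs
--   ./Main +RTS -N4
-- Without -threaded, the RTS rejects the -N flag.
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn ("capabilities: " ++ show caps)
```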

(In reply to Johan Tibell, 2013/3/4.)
That's interesting. Can you recommend some reading material about this? Besides the GHC source, of course ;) An explanation of why the decrease in performance is proportional to the number of cores would be great.
--
Łukasz Dąbek

If you just pass -N, GHC automatically sets the number of threads based on the number of cores on your machine. Do you mean -threaded?
(In reply to Łukasz Dąbek, Mon Mar 04 11:39:43 -0800 2013.)
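For comparison, the effect of a bare -N can be approximated by hand at program start. This is only a rough sketch, assuming a `-threaded` build; it uses `getNumProcessors` from `GHC.Conc` and `setNumCapabilities` from `Control.Concurrent`, and is not a description of exactly how the RTS picks the number.

```haskell
-- Sketch: roughly what a bare -N does, done by hand at startup.
-- It only has an effect in a -threaded build.
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads, setNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  procs <- getNumProcessors
  -- The non-threaded RTS only ever has one capability, so skip the call there.
  if rtsSupportsBoundThreads then setNumCapabilities procs else pure ()
  caps <- getNumCapabilities
  putStrLn ("processors: " ++ show procs ++ ", capabilities: " ++ show caps)
```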

(In reply to a message of 2013/3/4 asking: "Do you have a link to the new code?")
The diff is at the bottom of the original code: http://hpaste.org/83460.

If you just pass -N, GHC automatically sets the number of threads
based on the number of cores on your machine.

Yes, I know that. I am just wondering why a seemingly single-threaded computation (look at singleThreadIntegrate in the source code from the first post) runs slower as the number of cores available (set through the -N option) increases.
--
Łukasz Dąbek

Depends on the application, of course. The (on by default) parallel GC
tends to kill performance for me... you might try running both with "+RTS
-sstderr" to see if GC time is significantly higher, and try adding "+RTS
-qg1" if it is.
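The GC share of the run time can also be read from inside the program. A minimal sketch using `GHC.Stats` (available in GHC 8.2 and later, so newer than this thread); the `workload` function is a hypothetical stand-in, and statistics are only collected when the program is run with `+RTS -T`.

```haskell
-- Sketch: measure how much CPU time goes to GC. Hypothetical workload;
-- build and compare, e.g.:
--   ghc -O2 -threaded -rtsopts Main.hs
--   ./Main +RTS -N4 -T         (parallel GC, the default with -N4)
--   ./Main +RTS -N4 -T -qg1    (no parallel GC in the young generation)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical workload standing in for the integration.
workload :: Int -> Double
workload n = foldl' (+) 0 [sqrt (fromIntegral i) | i <- [1 .. n]]

-- Fraction of CPU time spent in the garbage collector.
gcFraction :: RTSStats -> Double
gcFraction s =
  fromIntegral (gc_cpu_ns s) / fromIntegral (gc_cpu_ns s + mutator_cpu_ns s)

main :: IO ()
main = do
  print (workload 5000000)
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      putStrLn ("GCs: " ++ show (gcs s) ++ ", GC share of CPU time: " ++ show (gcFraction s))
    else putStrLn "run with +RTS -T to collect GC statistics"
```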
(In reply to Łukasz Dąbek, Mon, Mar 4, 2013, 2:23 PM.)

(In reply to Nathan Howell, 2013/3/5.)
You are correct: the parallel GC is slowing the computation down. After some experiments I can produce two behaviors: use the single-threaded GC (the multithreaded version slows down by a factor of 5, but the single-threaded version is back to normal) or increase the heap size (the multithreaded version slows down by a factor of 2, and the single-threaded version runs normally). I guess I must live with this ;)
--
Łukasz Dąbek
participants (6)
- briand@aracnet.com
- Don Stewart
- Edward Z. Yang
- Johan Tibell
- Nathan Howell
- Łukasz Dąbek