parList implementation question

Hello,

In Control.Parallel.Strategies, parList is defined as

    parList strat []     = ()
    parList strat (x:xs) = strat x `par` parList strat xs

with

    parMap strat f xs = map f xs `using` parList strat

I recently found that if I define

    forceParMap strat f xs = map f xs `using` forceParList strat

where

    forceParList strat = foldl (\done -> (done >||) . strat) ()

then, to date, forceParList via forceParMap gives faster results than parList via parMap. For example, in one experiment, parMap with parList ran in 0.81 times the time of the serial solution, whereas forceParMap with forceParList ran in 0.58 times the time of the serial solution. That is to say, forceParList completed in 0.72 times the time of parList. So:

1. Why is forceParList faster than parList?
2. Is this generic to the GHC runtime model or a particularity of the GHC implementation?

Thanks in advance for the clarification,
- Marcus
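For readers who want to run the definitions above, here is a self-contained sketch of the old-style (pre-2.x) Strategies machinery, using `par` from GHC.Conc in place of the old (>||), which was simply `par`. Only the names quoted in the message are from the original; the scaffolding (the type aliases and rwhnf) is illustrative:

```haskell
import GHC.Conc (par)  -- `par` lives in base; the old (>||) was just `par`

type Done = ()
type Strategy a = a -> Done

-- Old-style `using`: apply the strategy to the value, then return the value.
using :: a -> Strategy a -> a
using x s = s x `seq` x

-- Reduce to weak head normal form (a minimal element strategy).
rwhnf :: Strategy a
rwhnf x = x `seq` ()

-- parList as quoted in the message: spark `strat x` for each element.
parList :: Strategy a -> Strategy [a]
parList _     []     = ()
parList strat (x:xs) = strat x `par` parList strat xs

-- forceParList as defined in the message, with (>||) replaced by `par`.
forceParList :: Strategy a -> Strategy [a]
forceParList strat = foldl (\done -> (done `par`) . strat) ()

parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

forceParMap :: Strategy b -> (a -> b) -> [a] -> [b]
forceParMap strat f xs = map f xs `using` forceParList strat
```

Both variants produce the same results; only the sparking pattern differs.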

On Fri, Dec 18, 2009 at 11:31, Marcus D. Gabriel wrote:
than parList via parMap. For example, in one experiment, parMap with parList ran in 0.81 times the time of the serial solution, whereas forceParMap with forceParList ran in 0.58 times the time of the serial solution. That is to say, forceParList completed in 0.72 times the time of parList. So,
I don't know your application, so it's hard to interpret your numbers. Are you sure those measurements are statistically significant? -- Denis

Denis Bueno wrote:
I don't know your application, so it's hard to interpret your numbers.
Are you sure those measurements are statistically significant?
The particular example that I gave was a straightforward searchDepthFirst(1) algorithm. To parallelise it, I chose the initial, simple approach: determine the first list of successor nodes from an initial node, use parMap to apply the serial algorithm searchDepthFirst to this list, and then reduce (concat) the resulting lists.

With some RTS tuning and using a dedicated dual-core system, I was able, for the problem above, to reproduce the numbers that I cited consistently, to within about plus or minus 0.01.

I have played with variations of this class of problem but have not generally compared parList with forceParList across a broader class of problems.

I like your question above. It helped me formulate this one: is what I observed significant or random? I hope it is not random. I guess the deeper question is how one determines with confidence that

    parallelised serial algorithm + strategy1 works for problem class1

and

    parallelised serial algorithm + strategy2 works for problem class2

where both are better than the serial case for their respective classes of problems, across a given range of operating environments, e.g., -N2 to -N8 for 2 to 8 cores.

- Marcus

(1) "Algorithms: A Functional Programming Approach" by Fethi Rabhi and Guy Lapalme. ISBN 0201-59604-0.
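The parallelisation just described (expand the root once, search each subtree with the serial algorithm in parallel, then concat) can be sketched as follows. The Tree type, the example shape, and the hand-rolled parMap' are illustrative assumptions, not the book's actual code:

```haskell
import GHC.Conc (par)

-- Illustrative state-space type (the book's searchDepthFirst is more general).
data Tree a = Node a [Tree a]

-- Serial depth-first search, collecting node labels in visit order.
searchDepthFirst :: Tree a -> [a]
searchDepthFirst (Node x ts) = x : concatMap searchDepthFirst ts

-- Minimal parMap: spark each result to WHNF while building the result list.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ []     = []
parMap' f (x:xs) = let y = f x in y `par` (y : parMap' f xs)

-- Parallel version: each child of the root is searched serially, in parallel.
parSearchDepthFirst :: Tree a -> [a]
parSearchDepthFirst (Node x ts) =
  x : concat (parMap' searchDepthFirst ts)
```

The result is deterministic: sparking only affects when each sublist is evaluated, not its value.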

Well, I finally put 6.12.1 in place and read the documentation for Control.Parallel.Strategies. All of my code for the application described in my earlier message uses Done, demanding, sparking, (>|), and (>||), all of which are now deprecated. Additionally, I need to understand Eval before I can change my code. No point in responding until I determine what this means.

Thanks nevertheless,
- Marcus

On 18/12/2009 18:31, Marcus D. Gabriel wrote:
Hello,

In Control.Parallel.Strategies, parList is defined as

    parList strat []     = ()
    parList strat (x:xs) = strat x `par` parList strat xs

with

    parMap strat f xs = map f xs `using` parList strat

I recently found that if I define

    forceParMap strat f xs = map f xs `using` forceParList strat

where

    forceParList strat = foldl (\done -> (done >||) . strat) ()

then, to date, forceParList via forceParMap gives faster results than parList via parMap. For example, in one experiment, parMap with parList ran in 0.81 times the time of the serial solution, whereas forceParMap with forceParList ran in 0.58 times the time of the serial solution. That is to say, forceParList completed in 0.72 times the time of parList. So:

1. Why is forceParList faster than parList?
2. Is this generic to the GHC runtime model or a particularity of the GHC implementation?
I'm not sure. Your forceParList looks equivalent to parList, unless I'm misreading it.

I recommend trying out the new parallel package, here:

    http://hackage.haskell.org/package/parallel

which has a new implementation of Strategies. The version of parList you quoted above has a space-leak problem which may be affecting your results.

Cheers,
Simon
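One way to probe the equivalence question is to unfold both combinators by hand on a three-element list (a sketch: `par` stands in for the deprecated (>||), and strat and the Int instantiation are assumptions). The results are identical, but the shapes differ: parList right-nests and sparks every element, while the foldl version left-nests and evaluates the final strat application directly in the demanding thread rather than sparking it:

```haskell
import GHC.Conc (par)

strat :: Int -> ()
strat x = x `seq` ()

-- parList strat [a,b,c] unfolds to right-nested sparks; every element is
-- sparked, and the demanding thread evaluates nothing itself:
parListUnfolded :: Int -> Int -> Int -> ()
parListUnfolded a b c =
  strat a `par` (strat b `par` (strat c `par` ()))

-- forceParList strat [a,b,c] (a foldl) unfolds to left-nested sparks; the
-- prefix is sparked, but `strat c` is evaluated directly by the caller:
forceParListUnfolded :: Int -> Int -> Int -> ()
forceParListUnfolded a b c =
  ((() `par` strat a) `par` strat b) `par` strat c
```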

Thank you Simon, I will obtain parallel 2.2.0.1 and work with it. Actually, the reason I asked my question was that I did not think forceParList should yield better performance than parList (unless it was because of the foldl?).

I read the release notes for 6.12.1 about the work done on the GHC parallel framework and did just a little bit more experimenting. You might find this interesting.

ghc 6.10.4 with parallel 1.1.0.1 (as reported before):

    Serial algorithm:                     1.00 unit of time
    Parallel algorithm with parList:      0.81 units of time
    Parallel algorithm with forceParList: 0.58 units of time

ghc 6.12.1 with parallel 1.1.0.1:

    Serial algorithm:                     1.00 unit of time
    Parallel algorithm with parList:      0.58 units of time*
    Parallel algorithm with forceParList: 0.58 units of time*

Interesting. Well, from my perspective, 6.12.1 is certainly an improvement here.

Cheers,
- Marcus

Thanks Simon. Parallel 2.2.0.1 was straightforward. I just replaced rnf with rdeepseq, and my original use of parMap worked like a charm, giving twice the performance on my dual-core system, as I originally expected and now find.

Thanks,
- Marcus
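The rnf-to-rdeepseq migration mentioned above can be sketched even without the parallel package installed. Everything here except rnf and NFData (which come from the deepseq boot library) is a hand-rolled illustration that mirrors parallel-2.x names; it is not the package's real implementation, which wraps strategies in an Eval monad:

```haskell
import Control.DeepSeq (NFData, rnf)
import GHC.Conc (par)

-- Simplified parallel-2.x-style Strategy: in the real package a Strategy
-- returns `Eval a`; a plain value-returning function suffices for a sketch.
type Strategy a = a -> a

-- rdeepseq: fully evaluate a value (the replacement for the old rnf strategy).
rdeepseq :: NFData a => Strategy a
rdeepseq x = rnf x `seq` x

-- parMap in the style of the new API: apply the strategy to each result
-- and spark it, so each element can be deep-evaluated on another core.
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f = go
  where
    go []     = []
    go (x:xs) = let y = strat (f x) in y `par` (y : go xs)
```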
participants (3)

- Denis Bueno
- Marcus D. Gabriel
- Simon Marlow