parallel package, rts ROOT vs WEAK policy and reachable sparks

Dear list This mail is long, but only the 3 last paragraphs are questions, the rest is context. Recently started reading the history of the parallel package. On the changelog of version 2.2 says (I quote) "This module has been rewritten in version 2. The main change is to the 'Strategy a' type synonym, which was previously a -> Done and is now a -> Eval a. This change helps to fix the space leak described in "Runtime Support for Multicore Haskell". The problem is that the runtime will currently retain the memory referenced by all sparks, until they are evaluated. Hence, we must arrange to evaluate all the sparks eventually, just in case they aren't evaluated in parallel, so that they don't cause a space leak. This is why we must return a "new" value after applying a Strategy, so that the application can evaluate each spark created by the Strategy." I know where are currently on version 3.3. Yet I decided to research a little more on the paper cited. On it is discussed the issue of policy for when to GC sparks. Around page 7 of the paper (I quote) ROOT: Treat (non-fizzled) sparks as roots for the garbage col- lector. That is, retain all such sparks and the graph they point to. WEAK: Only retain (non-fizzled) sparks that are reachable from the roots of the program. The problem is, neither of these policies is satisfactory. WEAK seems attractive, because it lets us discard sparks that are no longer required by the program. However, the WEAK policy is completely incompatible with strategies. Consider the parList strategy: parList :: Strategy a -> Strategy [a] parList strat [] = done parList strat (x:xs) = strat x ‘par‘ parList strat xs Each spark generated by parList is a thunk for the expression “ strat x ”; this thunk is not shared, since it is created uniquely for the purposes of creating the spark, and hence can never fizzle. Hence, the WEAK policy will discard all sparks created by parList , which is obviously undesirable(*). Question 1: Under WEAK policy, On (*) I understand the sparks cannot be fizzle'd, but how are the parList spark not reachable from the roots of the program? surely this function doesn't exist on its own. Somewhere in the program I must be a function that is executed that has an let val = ... `using` parList ... in ... val ... which would make the sparks reachable. Making it a *good* policy to have IMHO. This probably has to do with my understanding of reachable sparks. Question 2: The ROOT policy is discuted to be better, because it supports parList (and old parallel library versions). Yet, the old docs of parallel recomend to that "if you are gonna spark, make sure you use the result to avoid space leaks" ie filter you data before and only spark on that. Is this intuition correct? Question 3: What is the current situation of this? have weak pointers helped in this regard? -- Ruben

Hi Ruben, Question 1: In your example, the sparks produced by parList are in fact not reachable. Note that as long as you don't evaluate val, there are no sparks at all: the expression (myList `using` parList strat) remains a thunk. Once you evaluate it, (strat x) is sparked for each element x, but no other reference to them is kept, since a strategy returns a unit. Question 2: If you can "filter" the values that you will actually need without evaluating them, that would be the right idea. Question 3: Take a look at the other paper linked in the docs Seq no More: Better Strategies for Parallel Haskell http://community.haskell.org/~simonmar/papers/strategies.pdf It describes the approach taken since version 2 which solves this issue of garbage collecting sparks. It also goes in a bit more detail about the problems with the original approach at the start of section 3. Regards, Li-yao On 05/17/2017 01:40 AM, Ruben Astudillo wrote:
Dear list
This mail is long, but only the 3 last paragraphs are questions, the rest is context. Recently started reading the history of the parallel package. On the changelog of version 2.2 says (I quote)
"This module has been rewritten in version 2. The main change is to the 'Strategy a' type synonym, which was previously a -> Done and is now a -> Eval a. This change helps to fix the space leak described in "Runtime Support for Multicore Haskell". The problem is that the runtime will currently retain the memory referenced by all sparks, until they are evaluated. Hence, we must arrange to evaluate all the sparks eventually, just in case they aren't evaluated in parallel, so that they don't cause a space leak. This is why we must return a "new" value after applying a Strategy, so that the application can evaluate each spark created by the Strategy."
I know where are currently on version 3.3. Yet I decided to research a little more on the paper cited. On it is discussed the issue of policy for when to GC sparks. Around page 7 of the paper (I quote)
ROOT: Treat (non-fizzled) sparks as roots for the garbage col- lector. That is, retain all such sparks and the graph they point to.
WEAK: Only retain (non-fizzled) sparks that are reachable from the roots of the program.
The problem is, neither of these policies is satisfactory. WEAK seems attractive, because it lets us discard sparks that are no longer required by the program. However, the WEAK policy is completely incompatible with strategies. Consider the parList strategy:
parList :: Strategy a -> Strategy [a] parList strat [] = done parList strat (x:xs) = strat x ‘par‘ parList strat xs
Each spark generated by parList is a thunk for the expression “ strat x ”; this thunk is not shared, since it is created uniquely for the purposes of creating the spark, and hence can never fizzle. Hence, the WEAK policy will discard all sparks created by parList , which is obviously undesirable(*).
Question 1: Under WEAK policy, On (*) I understand the sparks cannot be fizzle'd, but how are the parList spark not reachable from the roots of the program? surely this function doesn't exist on its own. Somewhere in the program I must be a function that is executed that has an
let val = ... `using` parList ... in ... val ...
which would make the sparks reachable. Making it a *good* policy to have IMHO. This probably has to do with my understanding of reachable sparks.
Question 2: The ROOT policy is discuted to be better, because it supports parList (and old parallel library versions). Yet, the old docs of parallel recomend to that "if you are gonna spark, make sure you use the result to avoid space leaks" ie filter you data before and only spark on that. Is this intuition correct?
Question 3: What is the current situation of this? have weak pointers helped in this regard?
-- Ruben _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
participants (2)
-
Li-yao Xia
-
Ruben Astudillo