Parallelism and expensive calculations that may not be needed

Hello everybody! (Sorry if this is a newbie question, couldn't find the answer anywhere) Suppose I have an expensive function (such that even to be reduced to WHNF it takes a long processing time)
expensive :: Foo -> Maybe Bar
and I want to calculate it on multiple processors, like in
calculate = parMap rnf expensive
Well, fine. But suppose that then I give the result to a function like
final xs = reverse $ final' xs [] where final' [] r = r final' (x:xs) r = case x of Nothing -> [] Just x -> final' xs (x:r)
Given list :: [Foo], a sequential calculation as in
resultS = final $ map expensive list
would not call expensive after finding a Nothing. But
resultP = final $ calculate list
will keep calculating all the other expensive's because they were sparked? Also, suppose that in list there are thousands of elements, whereas the last one is the only x such as "expensive x == Nothing". More importantly, calculating "expensive x" is very fast for some reason. In both versions above, all the initial elements will have to be calculated in order to find that the last one (that probably was already computed in the parallel version) was Nothing and that all the work must have to be thrown away. Are the observations above valid? Is there a way to improve this style of computation? Any pointers are appreciated =). I think that to solve the second problem, I needed some sort of fold that is aware of parallel nature of the list and that folds on the order in that the thunks get evaluated, and not on order of the list. But I guess it's not possible to write something like this, right? Cheers, Felipe. -- Felipe.

Felipe Almeida Lessa wrote:
(Sorry if this is a newbie question, couldn't find the answer anywhere)
Suppose I have an expensive function (such that even to be reduced to WHNF it takes a long processing time)
expensive :: Foo -> Maybe Bar
and I want to calculate it on multiple processors, like in
calculate = parMap rnf expensive
Well, fine. But suppose that then I give the result to a function like
final xs = reverse $ final' xs [] where final' [] r = r final' (x:xs) r = case x of Nothing -> [] Just x -> final' xs (x:r)
Given list :: [Foo], a sequential calculation as in
resultS = final $ map expensive list
would not call expensive after finding a Nothing. But
resultP = final $ calculate list
will keep calculating all the other expensive's because they were sparked?
Right. This is an example of speculation: you don't know in advance how much work you really have to do, so any work you do in parallel may or may not be necessary. If we fire off the whole list in parallel, we might end up doing a lot of work on the elements at the end of the list, which are less likely to be required than the elements at the front of the list. That's a bad strategy: we need to prioritise the elements near the front. Fortunately GHC's built-in strategy for picking sparks to evaluate picks the oldest ones first, so this shouldn't be a problem. The problem occurs when you've found the Nothing, but the rest of the list has already been sparked. You really want to throw away all those sparks, but there's no way to do that currently. One way you could improve the situation though is to only spark the next N elements in the list, limiting the amount of speculation. Hope this helps... Cheers, Simon

On 6/8/07, Simon Marlow
The problem occurs when you've found the Nothing, but the rest of the list has already been sparked. You really want to throw away all those sparks, but there's no way to do that currently. One way you could improve the situation though is to only spark the next N elements in the list, limiting the amount of speculation.
I see. I was hoping that GHC would see that the sparked thunk could be garbage collected (as the rest of the list with sparks was thrown away) and would not force it. It is expensive to GC before starting every spark, of course, but maybe the GC could look for sparks that could be removed whenever it collects?
Hope this helps...
Thanks! It sure helps! Cheers, -- Felipe.

Felipe Almeida Lessa wrote:
On 6/8/07, Simon Marlow
wrote: The problem occurs when you've found the Nothing, but the rest of the list has already been sparked. You really want to throw away all those sparks, but there's no way to do that currently. One way you could improve the situation though is to only spark the next N elements in the list, limiting the amount of speculation.
I see. I was hoping that GHC would see that the sparked thunk could be garbage collected (as the rest of the list with sparks was thrown away) and would not force it. It is expensive to GC before starting every spark, of course, but maybe the GC could look for sparks that could be removed whenever it collects?
Actually at the moment the GC treats sparks as roots, but that's wrong. Thanks for reminding me, I must fix that sometime. Cheers, Simon
participants (2)
-
Felipe Almeida Lessa
-
Simon Marlow