
On Sat, 2009-12-12 at 13:46 +0000, Ben Millwood wrote:
On Sat, Dec 12, 2009 at 10:08 AM, Maciej Piechotka
wrote: If operation is associative it can be done using divide et impera spliting list in half and operating on it pararerlly then split in half etc.
Thank you very much for the replies. I've come to the conclusion that, yep, you can't (directly) parallelise of fold operation, as fold guarantees order of processing. With something like map the runtime is free to continue sparking function application to each element without waiting for the result. So we spark f x, force evaluation of the remainder of the xs and recurse. I'm *guessing* at a detailed level when we are creating the output list, haskell can concat each result element before f x returns due to laziness - that is, haskell doesn't need to wait for evaluation of f x, before continuing? With fold, and specifically with foldl (+), this isn't the case as (+) is strict on both arguments and thus it cannot continue until each sparked evaluation has completed and combined with the accumulator. If (+) was not strict on both arguments, I'm not sure if could solider on... assuming I've understood map correctly!? Writing it out long hand (sorry if this is tedious!), we have: using :: a -> Strategy a -> a using x s = s x `seq` x rwhnf :: Strategy a rwhnf x = x `seq` () parList :: Strategy a -> Strategy [a] parList strat [] = () parList strat (x:xs) = strat x `par` (parList strat xs) parMap :: Strategy b -> (a -> b) -> [a] -> [b] parMap strat f = (`using` parList strat) . map f 'using' applies a strategy to an item, and then returns the item. 'parList' is a (combinator) strategy which applies an atomic strategy to each element in the list *in parallel* (for example forcing each element to WHNF). So for parMap we have xs passed into 'map f' - the result is then passed to 'using' which will force application of 'f' on each element in parallel by way of 'parList'. No forced evaluation is dependant on a previous evaluation. Now for parFoldl - a crude and wrong representation for my purposes could be: parFoldl :: Num b => Strategy b -> (a -> b) -> [a] -> b parFoldl strat f = sum . (`using` parList strat) . map f This isn't really a fold of course, but it is doing roughly the same thing, it's summing the results of applying function 'f' to each element in a list. The problem here is that sum will only allow one spark at a time, because sum [] = 0 sum (x:xs) = x + sum xs So we get something like: 0 + (x4 + (x3 + (x2 + (x1)))) For example the result for (x4 + previous) can only be evaluated after x3, x2 and x1 have been evaluated. This means it won't spark evaluation on x4 until (x3 + ....) has been evaluated, thus only one core is ever used. I believe fold is just the general case of sum and the same logic applies. I suppose my questions are: Have I got this right, if not very succinct!? Is it purely the strictness of (+) that causes this situation? Ignoring DPH, is it possible to write a parallel fold avoiding something like the technique below? Anyhow, a workaround similar to those suggested I came up with is to divide the folds up across the cores and then sum the sub-folds - this produces approximately double the performance across two cores: import Control.Parallel.Strategies (parMap,rwhnf) import Data.List (foldl') import Data.List.Split (chunk) import GHC.Conc (numCapabilities) -- Prepare to share work to be -- done across available cores chunkOnCpu :: [a] -> [[a]] chunkOnCpu xs = chunk (length xs `div` numCapabilities) xs -- Spark a fold of each chunk and -- sum the results. Only works because -- for associative folds. foldChunks :: ([a] -> a) -> (a -> b -> a) -> a -> [[b]] -> a foldChunks combineFunc foldFunc acc = combineFunc . (parMap rwhnf $ foldl' foldFunc acc) -- Some pointless work to keep thread busy workFunc :: Int -> Int workFunc 1 = 1 workFunc x = workFunc $ x - 1 -- Do some work on element x and append foldFunc :: Int -> Int -> Int foldFunc acc x = acc + workFunc x testList = repeat 1000000000 answer = foldChunks sum foldFunc 0 $ chunkOnCpu (take 50 testList) main :: IO() main = print answer