
I have a reduction system in which a rule takes a term and returns a set of terms. The reduction system creates a tree that originates at a starting value called the root. For most problems, the reduction system terminates, but a step count limit protects from non-termination. Rule application is expensive, so it is essential that a rule is never applied to the same problem twice. This check makes my program sequential, in that parallel annotations don't improve performance on SMPs. There isn't even an obvious place to add them in my program, at least not to me. How do people write parallel reduction systems that avoid redundant rule application?

John
module Main (main) where
import System.Time (ClockTime(..), getClockTime)
data Eq a => Item a = Item { item :: a, parent :: Maybe (Item a) }
instance Eq a => Eq (Item a) where x == y = item x == item y
The reduction system takes a rule, a step count, and an initial value, and computes a tree of reductions. The order of the items in the returned list is irrelevant, because the tree is assembled as a post processing step.
reduce :: (Eq a, Monad m) => (a -> [a]) -> Int -> a -> m [Item a]
reduce rule limit root =
    step rule limit [top] [top]
    where
      top = Item { item = root, parent = Nothing }
The step function is called as: step rule limit seen todo, where seen is the items already seen, and todo is the items on the queue.
step :: (Eq a, Monad m) => (a -> [a]) -> Int -> [Item a] -> [Item a] -> m [Item a]
step _ limit _ _ | limit <= 0 = fail "Step limit exceeded"
step _ _ seen [] = return seen
step rule limit seen (next : rest) =
    loop seen rest children
    where
      children = map child (rule (item next))
      child i = Item { item = i, parent = Just next }
      loop seen rest [] = step rule (limit - 1) seen rest
      loop seen rest (kid : kids) =
          if elem kid seen
          then loop seen rest kids
          else loop (kid : seen) (rest ++ [kid]) kids
A silly rule
rule :: Int -> [Int]
rule n = filter (>= 0) [n - 1, n - 2, n - 3]
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) =
    fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)
main :: IO ()
main =
    do t0 <- getClockTime
       ns <- reduce rule 20000 5000
       t1 <- getClockTime
       putStrLn $ "length: " ++ show (length ns)
       putStrLn $ "time: " ++ show (secDiff t0 t1) ++ " seconds"
The makefile
------------

PROG     = reduce
GHCFLAGS = -Wall -fno-warn-name-shadowing -O

%: %.lhs
	ghc $(GHCFLAGS) -o $@ $<

all: $(PROG)

clean:
	-rm *.o *.hi $(PROG)

I spent four hours investigating this problem! Thank you very much for the excellent brainfood, and challenging Haskell's claim to be rawkin' at parallelism. I think, though it took much experimentation, that I have confirmed that it is :-)
On Sun, Feb 1, 2009 at 9:26 PM, John D. Ramsdell wrote:
I have a reduction system in which a rule takes a term and returns a set of terms. The reduction system creates a tree that originates at a starting value called the root. For most problems, the reduction system terminates, but a step count limit protects from non-termination.
That's typically a bad idea. Instead, use laziness to protect from nontermination. For example, in this case, we can output a collection of items lazily, and then take a finite amount of the output (or check whether the output is longer than some length), without having to evaluate all of it. Here's my writeup of my solution, in literate Haskell. It doesn't output the exact same structure as yours, but hopefully you can see how to tweak it to do so.
{-# LANGUAGE RankNTypes #-}
import qualified Data.MemoCombinators as Memo
import qualified Data.Set as Set
import Control.Parallel (par)
import qualified Control.Parallel.Strategies as Par
import Data.Monoid (Monoid(..))
import Control.Monad.State
import qualified Data.DList as DList
import Debug.Trace
import Control.Concurrent
First, I want to capture the idea of a generative set like you're doing. GenSet is like a set, with the constructor "genset x xs" which says "if x is in the set, then so are xs". I'll represent it as a stateful computation of the list of things we've seen so far, returning the list of things we've seen so far. It's redundant information, but sets can't be consumed lazily, thus the list (the set will follow along lazily :-). Remember that State s a is just the function (s -> (s,a)). So we're taking the set of things we've seen so far, and returning the new elements added and the set unioned with those elements.
newtype GenSet a = GenSet (State (Set.Set a) (DList.DList a))
genset :: (Ord a) => a -> GenSet a -> GenSet a
genset x (GenSet f) = GenSet $ do
    seen <- gets (x `Set.member`)
    if seen
       then return mempty
       else fmap (DList.cons x) $ modify (Set.insert x) >> f
toList :: GenSet a -> [a]
toList (GenSet f) = DList.toList $ evalState f Set.empty
GenSet is a monoid, where mappend is just union.
instance (Ord a) => Monoid (GenSet a) where
    mempty = GenSet (return mempty)
    mappend (GenSet a) (GenSet b) = GenSet (liftM2 mappend a b)
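(Editorial aside, not part of the original post: a tiny usage sketch of the pieces above. The name descend is made up for illustration.)

descend :: Int -> GenSet Int
descend n = genset n rest
    where
      -- at 0 we stop generating; otherwise the set also contains n - 1
      rest = if n <= 0 then mempty else descend (n - 1)

-- toList (descend 3)  ==>  [3,2,1,0]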
Okay, so that's how we avoid exponential behavior when traversing the tree. We can now just toss around GenSets like they're sets and everything will be peachy. Here's the heart of the algorithm: the reduce function. To avoid recomputation of rules, we could just memoize the rule function. But we'll do something a little more clever. The function we'll memoize ("parf") first sparks a thread computing its *last* child. Because the search is depth-first, it will typically be a while until we get to the last one, so we benefit from the spark (you don't want to spark a thread computing something you're about to compute anyway).
reduce :: (Ord a) => Memo.Memo a -> (a -> [a]) -> a -> [a]
reduce memo f x = toList (makeSet x)
    where
      makeSet x = genset x . mconcat . map makeSet . f' $ x
      f' = memo parf
      parf a = let ch = f a
               in  ch `seq` (f' (last ch) `par` ch)
The ch `seq` is there so that the evaluation of ch and last ch aren't competing with each other. Your example had a few problems. You said the rule was supposed to be expensive, but yours was cheap. Also, [x-1,x-2,x-3] are all very near each other, so it's hard to go do unrelated stuff. I made a fake expensive function before computing the neighbors, and tossed around some prime numbers to scatter the space more.
rule :: Int -> [Int]
rule n = expensive `seq` [next 311 4, next 109 577, next 919 353]
    where
      next x y = (x * n + y) `mod` 5000
      expensive = sum [1..50*n]
main :: IO ()
main = do
    let r = reduce Memo.integral rule 1
    print (length r)
The results are quite promising:

% ghc --make -O2 rules2 -threaded
% time ./rules2
5000
./rules2  13.25s user 0.08s system 99% cpu 13.396 total
% time ./rules2 +RTS -N2
5000
./rules2 +RTS -N2  12.52s user 0.30s system 159% cpu 8.015 total

That's a 40% decrease in running time! Woot! I'd love to see what it does on a machine with more than 2 cores.

Enjoy!
Luke

On Mon, Feb 2, 2009 at 2:15 AM, Luke Palmer wrote:
I spent four hours investigating this problem! Thank you very much for the excellent brainfood, and challenging Haskell's claim to be rawkin' at parallelism. I think, though it took much experimentation, that I have confirmed that it is :-)
For those of you who don't like reading ugly gmail-mangled html, I posted my solution here: http://lukepalmer.wordpress.com/2009/02/02/parallel-rewrite-system/

On Sun, Feb 1, 2009 at 9:26 PM, John D. Ramsdell wrote:

I have a reduction system in which a rule takes a term and returns a set of terms. The reduction system creates a tree that originates at a starting value called the root. For most problems, the reduction system terminates, but a step count limit protects from non-termination.
That's typically a bad idea. Instead, use laziness to protect from nontermination. For example, in this case, we can output a collection of items lazily, and then take a finite amount of the output (or check whether the output is longer than some length), without having to evaluate all of it.
Very good suggestion. In my code, I should "take limit" on the generated list, and fail if the length of that list reaches the limit. Sounds easy. I'll study your parallel solution tonight after work. Thank you.

Here is an interesting fact about my term reduction system. The binary relation that defines reduction is slightly different from the usual: it's a relation between terms and sets of terms. Furthermore, some normal forms can be identified as answers, and some normal forms are dead ends. When applying a rule, it doesn't matter which set in the relation is used as the result; the answer normal forms will all be the same. If the rule produces the empty set for some choice, all other choices will lead only to normal forms that are dead ends.

John
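(Editorial aside, not part of the original message: a minimal sketch of the bound John describes above. The helper name checkLimit is made up; it forces at most limit cells of a lazily produced list, so the producer is never evaluated past the bound.)

checkLimit :: Monad m => Int -> [a] -> m [a]
checkLimit limit xs
    | length (take limit xs) < limit = return xs              -- fewer than limit items were generated
    | otherwise                      = fail "Step limit exceeded"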

Luke, I read your solution but didn't understand how it applies to my problem. I must not have explained the problem well enough. Let me try again.

I have a reduction system in which a rule takes a term and returns a set of terms. The terms can be compared for equality, but they are not ordered. The reduction system creates a tree that originates at a starting value called the root, using breadth-first search. For most problems, the reduction system terminates, but a step count limit protects from non-termination. Rule application is expensive, so it is essential that a rule is never applied to the same problem twice. This check makes my program sequential, and I could find no place to add parallel annotations that might help.

The key function in the enclosed code is the step function. It's where all the time is spent. To allow effective use of more than one core, doesn't one have to figure out how to allow parts of the seen list to be searched simultaneously? Isn't that search the performance bottleneck?

John
module Main (main) where
import System.Time (ClockTime(..), getClockTime)
import Data.Tree (Tree(..), flatten)
import Data.Maybe (isNothing)
The reduction system takes a rule, a step count, and an initial value, and computes a tree of reductions.
reduce :: (Eq a, Monad m) => (a -> [a]) -> Int -> a -> m (Tree a)
reduce rule limit root =
    step rule limit [top] [top]
    where
      top = Item { item = root, parent = Nothing }
The Item data structure stores the information about a reduction step in a form that can be used to construct the final answer as a tree.
data Eq a => Item a = Item { item :: a, parent :: Maybe (Item a) }
instance Eq a => Eq (Item a) where x == y = item x == item y
The step function is where nearly all of the time is used. It is called as: step rule limit seen todo where seen is the items already seen, and todo is the items on the queue. The order of the items in the seen list is irrelevant, because the tree is assembled as a post processing step.
step :: (Eq a, Monad m) => (a -> [a]) -> Int -> [Item a] -> [Item a] -> m (Tree a)
step _ limit _ _ | limit <= 0 = fail "Step limit exceeded"
step _ _ seen [] = tree seen
step rule limit seen (next : rest) =
    loop seen rest children
    where
      children = map child (rule (item next))
      child i = Item { item = i, parent = Just next }
      loop seen rest [] = step rule (limit - 1) seen rest
      loop seen rest (kid : kids) =
          if elem kid seen
          then loop seen rest kids
          else loop (kid : seen) (rest ++ [kid]) kids
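(Editorial aside, not from the thread: one way to let parts of the seen list be searched simultaneously, as John asks above, is to split the list into chunks and test each chunk for membership in its own spark. A rough sketch, assuming the parallel package's Control.Parallel.Strategies; the helper names seenElem and chunks are made up. Whether this actually wins depends on how costly the equality test is relative to the spark overhead; for cheap comparisons, parallelizing the rule applications themselves, as Luke's reduce does earlier in the thread, is likely the better bet.)

import Control.Parallel.Strategies (parMap, rseq)  -- rseq is named rwhnf in older versions of the library

-- Hypothetical illustration: membership test over chunks of the seen
-- list, so the chunks can be scanned in parallel.
seenElem :: Eq a => Int -> a -> [a] -> Bool
seenElem chunkSize x xs = or (parMap rseq (x `elem`) (chunks xs))
    where
      chunks [] = []
      chunks ys = let (front, rest) = splitAt chunkSize ys
                  in front : chunks rest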
The next two functions assemble the answer into a tree. Sequential search is just fine for tree building.
tree :: (Eq a, Monad m) => [Item a] -> m (Tree a)
tree items =
    case filter (isNothing . parent) items of
      [root] -> return (build items (item root))
      _ -> fail "bad tree"
build :: Eq a => [Item a] -> a -> Tree a
build items root =
    Node { rootLabel = root,
           subForest = map (build items . item) children }
    where
      children = filter child items
      child i = maybe False ((== root) . item) (parent i)
A silly rule
rule :: Int -> [Int]
rule n = filter (>= 0) [n - 1, n - 2, n - 3]
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) =
    fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)
main :: IO ()
main =
    do t0 <- getClockTime
       t <- reduce rule 20000 5000
       let ns = length (flatten t)
       t1 <- getClockTime
       putStrLn $ "length: " ++ show ns
       putStrLn $ "time: " ++ show (secDiff t0 t1) ++ " seconds"
The makefile
------------

PROG     = reduce
GHCFLAGS = -Wall -package containers -fno-warn-name-shadowing -O

%: %.lhs
	ghc $(GHCFLAGS) -o $@ $<

all: $(PROG)

clean:
	-rm *.o *.hi $(PROG)
participants (2):
- John D. Ramsdell
- Luke Palmer