
Hi George, George Giorgidze wrote:
I would like to announce the first release of the set-monad library.
On Hackage: http://hackage.haskell.org/package/set-monad
Very cool. Seems to work fine. But I am wondering about the impact of using your package on asymptotic complexity (and thereby, on performance). From your implementation:
data Set a where Prim :: (Ord a) => S.Set a -> Set a [...] Zero :: Set a Plus :: Set a -> Set a -> Set a
I notice that this part of your datatype looks like a binary tree of sets. Lets see how your run function treats it:
run :: (Ord a) => Set a -> S.Set a run (Prim s) = s [...] run (Zero) = S.empty run (Plus ma mb) = S.union (run ma) (run mb)
As expected, the Prim-Zero-Plus structure is treated as a binary tree of sets, and run collects all the sets together. That is probably fine, because it should have the same complexity as combining these sets in the first place.
run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s) [...] run (Bind Zero _) = S.empty run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f)) [...]
But when I use the binary tree of sets on the left-hand side of a bind, your run function has to naively traverse the whole tree. So if the same elements are included in many sets in the tree of sets, these elements will be processed more than once, so the overall complexity is worse than necessary. Here's a ghci session that seems to confirm my suspicion. I first define a function using set-monad's convenient monad instance for sets:
$ :m +Control.Monad Data.Set.Monad $ let s1 `times` s2 = do {e1 <- s1; e2 <- s2; return (e1, e2)}
Now I produce some test data:
$ let numbers = fromList [1 .. 1000] $ let unioned = numbers `union` numbers $ let mplused = numbers `mplus` numbers
Note that these three sets are all equivalent.
$ (numbers == unioned, numbers == mplused, unioned == mplused) (True, True, True)
However, they behave differently when passed to the times function above. numbers and unioned are similarly "fast":
$ :set +s $ size $ numbers `times` numbers 1000000 (2.56 secs, 1315452984 bytes)
$ size $ unioned `times` unioned (2.39 secs, 1314950600 bytes) 1000000
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?) But mplused is much slower:
$ size $ mplused `times` mplused 1000000 (10.83 secs, 5324731444 bytes)
I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad? Tillmann