
I would like to announce the first release of the set-monad library. On Hackage: http://hackage.haskell.org/package/set-monad The set-monad library exports the Set abstract data type and set-manipulating functions. These functions behave exactly as their namesakes from the Data.Set module of the containers library. In addition, the set-monad library extends Data.Set by providing Functor, Applicative, Alternative, Monad, and MonadPlus instances for sets. In other words, you can use the set-monad library as a drop-in replacement for the Data.Set module of the containers library and, in addition, you will also get the aforementioned instances which are not available in the containers package. It is not possible to directly implement instances for the aforementioned standard Haskell type classes for the Set data type from the containers library. This is because the key operations map and union, are constrained with Ord as follows. map :: (Ord a, Ord b) => (a -> b) -> Set a -> Set b union :: (Ord a) => Set a -> Set a -> Set a The set-monad library provides the type class instances by wrapping the constrained Set type into a data type that has unconstrained constructors corresponding to monadic combinators. The data type constructors that represent monadic combinators are evaluated with a constrained run function. This elevates the need to use the constraints in the instance definitions (this is what prevents a direct definition). The wrapping and unwrapping happens internally in the library and does not affect its interface. For details, see the rather compact definitions of the run function and type class instances. The left identity and associativity monad laws play a crucial role in the definition of the run function. The rest of the code should be self explanatory. The technique is not new. This library was inspired by [1]. To my knowledge, the original, systematic presentation of the idea to represent monadic combinators as data is given in [2]. There is also a Haskell library that provides a generic infrastructure for the aforementioned wrapping and unwrapping [3]. The set-monad library is particularly useful for writing set-oriented code using the do and/or monad comprehension notations. For example, the following definitions now type check. s1 :: Set (Int,Int) s1 = do a <- fromList [1 .. 4] b <- fromList [1 .. 4] return (a,b) -- with -XMonadComprehensions s2 :: Set (Int,Int) s2 = [ (a,b) | (a,b) <- s1, even a, even b ] s3 :: Set Int s3 = fmap (+1) (fromList [1 .. 4]) As noted in [1], the implementation technique can be used for monadic libraries and EDSLs with restricted types (compiled EDSLs often restrict the types that they can handle). Haskell's standard monad type class can be used for restricted monad instances. There is no need to resort to GHC extensions that rebind the standard monadic combinators with the library or EDSL specific ones. [1] CSDL Blog: The home of applied functional programming at KU. Monad Reification in Haskell and the Sunroof Javascript compiler. http://www.ittc.ku.edu/csdlblog/?p=88 [2] Chuan-kai Lin. 2006. Programming monads operationally with Unimo. In Proceedings of the eleventh ACM SIGPLAN International Conference on Functional Programming (ICFP '06). ACM. [3] Heinrich Apfelmus. The operational package. http://hackage.haskell.org/package/operational

Hi George, George Giorgidze wrote:
I would like to announce the first release of the set-monad library.
On Hackage: http://hackage.haskell.org/package/set-monad
Very cool. Seems to work fine. But I am wondering about the impact of using your package on asymptotic complexity (and thereby, on performance). From your implementation:
data Set a where Prim :: (Ord a) => S.Set a -> Set a [...] Zero :: Set a Plus :: Set a -> Set a -> Set a
I notice that this part of your datatype looks like a binary tree of sets. Lets see how your run function treats it:
run :: (Ord a) => Set a -> S.Set a run (Prim s) = s [...] run (Zero) = S.empty run (Plus ma mb) = S.union (run ma) (run mb)
As expected, the Prim-Zero-Plus structure is treated as a binary tree of sets, and run collects all the sets together. That is probably fine, because it should have the same complexity as combining these sets in the first place.
run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s) [...] run (Bind Zero _) = S.empty run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f)) [...]
But when I use the binary tree of sets on the left-hand side of a bind, your run function has to naively traverse the whole tree. So if the same elements are included in many sets in the tree of sets, these elements will be processed more than once, so the overall complexity is worse than necessary. Here's a ghci session that seems to confirm my suspicion. I first define a function using set-monad's convenient monad instance for sets:
$ :m +Control.Monad Data.Set.Monad $ let s1 `times` s2 = do {e1 <- s1; e2 <- s2; return (e1, e2)}
Now I produce some test data:
$ let numbers = fromList [1 .. 1000] $ let unioned = numbers `union` numbers $ let mplused = numbers `mplus` numbers
Note that these three sets are all equivalent.
$ (numbers == unioned, numbers == mplused, unioned == mplused) (True, True, True)
However, they behave differently when passed to the times function above. numbers and unioned are similarly "fast":
$ :set +s $ size $ numbers `times` numbers 1000000 (2.56 secs, 1315452984 bytes)
$ size $ unioned `times` unioned (2.39 secs, 1314950600 bytes) 1000000
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?) But mplused is much slower:
$ size $ mplused `times` mplused 1000000 (10.83 secs, 5324731444 bytes)
I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad? Tillmann

On Sat, Jun 16, 2012 at 3:57 AM, Tillmann Rendel
George Giorgidze wrote:
I would like to announce the first release of the set-monad library.
On Hackage: http://hackage.haskell.org/package/set-monad
Very cool. Seems to work fine. But I am wondering about the impact of using your package on asymptotic complexity (and thereby, on performance).
For programs using only the Monad/MonadPlus interface, I would expect it to have the same asymptotic complexity as [] or Cont (S.Set a). As you noticed, you can get somewhat better performance by using the combinators that convert to S.Set internally, because they eliminate redundant computations later on.
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?)
It's because mplus a b >>= f turns into mplus (a >>= f) (b >>= f),
whereas unioned takes the union before calling f.
You can force this by defining:
simplify :: (Ord a) => Set a -> Set a
simplify = Prim . run
Unfortunately, there doesn't seem to be any equivalent of Prim in the
exported interface. I guess doing simplify = union empty would work.
--
Dave Menendez

Hi, David Menendez wrote:
As you noticed, you can get somewhat better performance by using the combinators that convert to S.Set internally, because they eliminate redundant computations later on.
"Somewhat better"? My example was three times faster, and I guess that the fast variant is O(n) and the slow variant is O(n²). So I expect that for some applications, the Set interface is more than fast enough and the MonadPlus-interface is much too slow.
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?)
It's because mplus a b>>= f turns into mplus (a>>= f) (b>>= f), whereas unioned takes the union before calling f.
Here, you compare mplused to unioned, but in the parentheses, I asked about numbers and unioned (from my last email). Tillmann

On Sun, Jun 17, 2012 at 2:26 AM, Tillmann Rendel
Hi,
David Menendez wrote:
As you noticed, you can get somewhat better performance by using the combinators that convert to S.Set internally, because they eliminate redundant computations later on.
"Somewhat better"? My example was three times faster, and I guess that the fast variant is O(n) and the slow variant is O(n²). So I expect that for some applications, the Set interface is more than fast enough and the MonadPlus-interface is much too slow.
Yes, that should have been "significantly better".
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?)
It's because mplus a b>>= f turns into mplus (a>>= f) (b>>= f), whereas unioned takes the union before calling f.
Here, you compare mplused to unioned, but in the parentheses, I asked about numbers and unioned (from my last email).
You're right. That may have been caused by the time to compute numbers
itself; I saw that numbers `times` numbers was faster than unioned
`times` unioned the second time I ran it.
Additionally, I haven't done any serious performance testing, but
there also seems to be a speedup when the following lines are added to
run:
run (Bind (Plus (Prim sa) mb) f) = run (Bind (S.union sa (run mb)) f)
run (Bind (Plus ma (Prim sb)) f) = run (Bind (S.union (run ma) sb) f)
--
Dave Menendez

Hi Tillmann, Thanks for your interesting question regarding the performance overheads of the Data.Set.Monad wrapper compared to the original Data.Set library. If you use set-specific functions there will not be any difference in asymptotic complexity between Data.Set.Monad and Data.Set. In terms of raw performance, there will be a tiny overhead when unpacking sets from the Prim constructor. But this should be negligible, as set-specific operations performed using Data.Set under the hood will likely dominate the runtime. Now, how does the runtime complexity of overloaded functions from the Functor, Monad and other supported type classes compared to the set-specific ones? You are right that there is a problem when mplus is composed with >>=. I have not done a proper investigation and benchmarking, but I think (and hope) this should be the only problematic case when it comes to runtime complexity. Luckily, there is an easy workaround (see below). Let us take a detailed look at this case: run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f)) Should ma mb be equal, this will run the function f twice (unnecessarily). So does this mean that there is only the factor of 2 slowdown compared to the case where the sets are combined before the bind. Unfortunately, the answer is no, ma and mb could be constructed with Plus as well. In the worst case the slowdown is k, where k is the number equal sets in the tree of Pluses (in the worst case all sets in the tree are equal) in the left hand side of the bind. The key to the approach used in set-monad is to make progress with the evaluation of the unconstrained constructors (i.e., Return, Bind, Zero and Plus) without using constrained set-specific operations. It turns out that for several cases one can progress with the evaluation without duplicating f (evaluation relies on monoid laws, Plus is associative and Zero is left and right identity of Plus). I have pushed those optimisations to the repo. These optimisations also cover your example. But there is one case for which I have not found a way to progress with the evaluation without duplicating f: run (Bind (Plus (Bind _ _) (Bind _ _)) f) For this case, I am still duplicating f. I will have a look whether it is possible to avoid duplication. Luckily, the aforementioned limitation can be avoided by using the mappend function from the Monoid type class instead of mplus from the MonadPlus type class. Also the desugaring of the do and monad comprehension notations do not produce calls to mplus. In your email, you have also asked:
I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad?
Sometimes set is a more appropriate data structure than list. That is
why the modules like Data.Set and Data.HashSet exist in the first
place. Different operations on sets and lists may have different
runtime complexities. For example, if your problem requires frequent
checking of whether an element exists in a collection then set may be
more appropriate than list. This is just one example.
More generally, if the programmer does not care about order or does
not want duplicates in a collection, again set seems to be more
appropriate than list. If the programmer uses lists in this scenario,
she needs to explicitly eliminate duplicates. In addition to being
inconvenient (if the programmer has to do it frequently), duplicate
elimination for lists is an expensive operation.
The set-monad library itself (by exporting the Monad instance for
sets) could also have an educational value, by allowing the
comprehension notation to be used for sets (and not just for lists).
This recreates the original set comprehension notation from
mathematics in Haskell. Math definitions that use comprehensions can
be directly transcribed to Haskell. It could also be useful to
highlight conceptual differences between lists and sets in Haskell
without different notations getting in the way.
Cheers, George
On 16 June 2012 09:57, Tillmann Rendel
Hi George,
George Giorgidze wrote:
I would like to announce the first release of the set-monad library.
On Hackage: http://hackage.haskell.org/package/set-monad
Very cool. Seems to work fine. But I am wondering about the impact of using your package on asymptotic complexity (and thereby, on performance).
From your implementation:
data Set a where Prim :: (Ord a) => S.Set a -> Set a [...] Zero :: Set a Plus :: Set a -> Set a -> Set a
I notice that this part of your datatype looks like a binary tree of sets. Lets see how your run function treats it:
run :: (Ord a) => Set a -> S.Set a run (Prim s) = s [...] run (Zero) = S.empty run (Plus ma mb) = S.union (run ma) (run mb)
As expected, the Prim-Zero-Plus structure is treated as a binary tree of sets, and run collects all the sets together. That is probably fine, because it should have the same complexity as combining these sets in the first place.
run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s) [...] run (Bind Zero _) = S.empty run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f)) [...]
But when I use the binary tree of sets on the left-hand side of a bind, your run function has to naively traverse the whole tree. So if the same elements are included in many sets in the tree of sets, these elements will be processed more than once, so the overall complexity is worse than necessary.
Here's a ghci session that seems to confirm my suspicion. I first define a function using set-monad's convenient monad instance for sets:
$ :m +Control.Monad Data.Set.Monad $ let s1 `times` s2 = do {e1 <- s1; e2 <- s2; return (e1, e2)}
Now I produce some test data:
$ let numbers = fromList [1 .. 1000] $ let unioned = numbers `union` numbers $ let mplused = numbers `mplus` numbers
Note that these three sets are all equivalent.
$ (numbers == unioned, numbers == mplused, unioned == mplused) (True, True, True)
However, they behave differently when passed to the times function above. numbers and unioned are similarly "fast":
$ :set +s $ size $ numbers `times` numbers 1000000 (2.56 secs, 1315452984 bytes)
$ size $ unioned `times` unioned (2.39 secs, 1314950600 bytes) 1000000
(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?)
But mplused is much slower:
$ size $ mplused `times` mplused 1000000 (10.83 secs, 5324731444 bytes)
I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad?
Tillmann

Hi George, thanks for your detailed reply. George Giorgidze wrote:
The key to the approach used in set-monad is to make progress with the evaluation of the unconstrained constructors (i.e., Return, Bind, Zero and Plus) without using constrained set-specific operations. It turns out that for several cases one can progress with the evaluation without duplicating f (evaluation relies on monoid laws, Plus is associative and Zero is left and right identity of Plus). I have pushed those optimisations to the repo. These optimisations also cover your example.
Cool.
In your email, you have also asked:
I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad?
Sometimes set is a more appropriate data structure than list. [...]
Of course. But I was wondering whether something like set-monad could be implemented with newtype Set a = Set [a] instead of data Set a = Prim ... | Return ... | Bind ... | ... Both approaches can offer the same interface, but now I think I see two reasons why the latter is more efficient than the former: (1) It allows to use set-operations when an Ord is known, for example, when the user calls union, and (2) It allows optimizations like you describe above. Tillmann
participants (3)
-
David Menendez
-
George Giorgidze
-
Tillmann Rendel