Monad instance for Data.Set

The following code solves exactly the problem of implementing (restricted) MonadPlus in terms of Data.Set: http://okmij.org/ftp/Haskell/DoRestrictedM.hs The code is written to demonstrate the do-notation. We write the monadic code as usual:
test1s_do () = do x <- return "a" return $ "b" ++ x
and then instantiate it for Maybe String:
test1sr_do :: Maybe String test1sr_do = unRM $ test1s_do () -- Just "ba"
or for Data.Set:
test2sr_do :: Set.Set String test2sr_do = unSM $ test1s_do () -- fromList ["ba"]
It seems GHC 6.10 will support the do-notation even for the generalized monads.

I was experimenting with Prompt today and found that you can get a "restricted monad" style of behavior out of a regular monad using Prompt:
{-# LANGUAGE GADTs #-} module SetTest where import qualified Data.Set as S
Prompt is available from http://hackage.haskell.org/cgi-bin/hackage-scripts/package/MonadPrompt-1.0.0...
import Control.Monad.Prompt
"OrdP" is a prompt that implements MonadPlus for orderable types:
data OrdP m a where PZero :: OrdP m a PRestrict :: Ord a => m a -> OrdP m a PPlus :: Ord a => m a -> m a -> OrdP m a
type SetM = RecPrompt OrdP
We can't make this an instance of MonadPlus; mplus would need an Ord constraint. But as long as we don't import it, we can overload the name.
mzero :: SetM a mzero = prompt PZero mplus :: Ord a => SetM a -> SetM a -> SetM a mplus x y = prompt (PPlus x y)
"mrestrict" can be inserted at various points in a computation to optimize it; it forces the passed in computation to complete and uses a Set to eliminate duplicate outputs. We could also implement mrestrict without an additional element in our prompt datatype, at the cost of some performance: mrestrict m = mplus mzero m
mrestrict :: Ord a => SetM a -> SetM a mrestrict x = prompt (PRestrict x)
Finally we need an interpretation function to run the monad and extract a set from it:
runSetM :: Ord r => SetM r -> S.Set r runSetM = runPromptC ret prm . unRecPrompt where -- ret :: r -> S.Set r ret = S.singleton -- prm :: forall a. OrdP SetM a -> (a -> S.Set r) -> S.Set r prm PZero _ = S.empty prm (PRestrict m) k = unionMap k (runSetM m) prm (PPlus m1 m2) k = unionMap k (runSetM m1 `S.union` runSetM m2)
unionMap is the equivalent of concatMap for lists.
unionMap :: Ord b => (a -> S.Set b) -> S.Set a -> S.Set b unionMap f = S.fold (\a r -> f a `S.union` r) S.empty
Oleg's test now works without modification:
test1s_do () = do x <- return "a" return $ "b" ++ x
olegtest :: S.Set String olegtest = runSetM $ test1s_do () -- fromList ["ba"]
settest :: S.Set Int settest = runSetM $ do x <- mplus (mplus mzero (return 2)) (mplus (return 2) (return 3)) return (x+3) -- fromList [5,6]
What this does under the hood is treat the computation on each element of the set separately, except at programmer-specified synchronization points where the computation result is required to be a member of the Ord typeclass. Synchronization points happen at every "mplus" & "mrestrict"; these correspond to a gathering of the computation results up to that point into a Set and then dispatching the remainder of the computation from that Set. -- ryan

On Tue, 25 Mar 2008, Ryan Ingram wrote:
I was experimenting with Prompt today and found that you can get a "restricted monad" style of behavior out of a regular monad using Prompt:
I recently developed a similar trick: http://hsenag.livejournal.com/11803.html It uses the regular MonadPlus rather than a custom mplus/mzero, and should work for any restricted monad. Your "mrestrict" is "Embed . unEmbed" in my code (and should be given a shorter name, like "reEmbed"). Of course, mplus and mzero can't optimise, since they don't have an Ord constraint. Cheers, Ganesh

On Tue, 25 Mar 2008, Ryan Ingram wrote:
settest :: S.Set Int settest = runSetM $ do x <- mplus (mplus mzero (return 2)) (mplus (return 2) (return 3)) return (x+3) -- fromList [5,6]
What this does under the hood is treat the computation on each element of the set separately, except at programmer-specified synchronization points where the computation result is required to be a member of the Ord typeclass.
It's like working in the List monad mainly, collapsing duplicates from time to time, right?

On Sun, Mar 30, 2008 at 1:09 PM, Henning Thielemann
It's like working in the List monad mainly, collapsing duplicates from time to time, right?
Sort of. You can look at it that way and get a basic understanding of what's going on. A slightly more accurate analysis of what is going on is that it is working in ContT Set for a variation of ContT that doesn't require the underlying object to be a full monad, but only a restricted one. In such a monad you could define
mplus :: ContT Set a -> ContT Set a -> ContT Set a mplus x y = lift $ union (runContT x id) (runContT y id) (not valid haskell code)
However, what is actually happening is that we are defining a set of "side-effectful" computations using Prompt, and then observing those computations in a "Set" environment. With this definition you can actually implement the interface for any monad you want; just define the operations in your data type. In this case:
data OrdP m a where PZero :: OrdP m a PRestrict :: Ord a => m a -> OrdP m a PPlus :: Ord a => m a -> m a -> OrdP m a
type SetM = RecPrompt OrdP
Every monad provides at least the same operations as the Identity monad; this definition says that SetM is a monad that provides those operations, plus three additional operations: "prompt PZero", "prompt $ PRestrict x", and "prompt $ PPlus x y" of the types shown in the definition of "OrdP". You can then interpret those operations however you want; runSetM defines an observation function that runs the computation and returns its results in a Set, given the restriction that the computation itself returns an Ord type. In order to really understand this, you need to understand the type of "runPromptC": runPromptC :: (r -> ans) -- "pure" result handler -> (forall a. p a -> (a -> ans) -> ans) -- "side effect" handler that gets a continuation -> Prompt p r -- computation to run -> ans "runPromptC" is (almost) just the case operation for a structure of this type: data Prompt p r = Return r | forall a. BindEffect (p a) (a -> Prompt p r) except with the recursive call to runPromptC inlined within BindEffect; given this data type you can define runPromptC easily: runPromptC ret _ (Return r) = ret r runPromptC _ prm (BindEffect p k) = prm p (\a -> runPromptC ret prm (k a)) This definition makes it obvious that the "pure" continuation "ret" is called at the end of the computation, and the "effectful" continuation prm is called to handle any side effects. Exercise 1: Define the function "prompt :: p a -> Prompt p a" on this datatype. Exercise 2: Define an instance of Monad for this datatype. Now you should be able to understand the observation function "runSetM":
runSetM :: Ord r => SetM r -> S.Set r runSetM = runPromptC ret prm . unRecPrompt where -- ret :: r -> S.Set r ret = S.singleton -- prm :: forall a. OrdP SetM a -> (a -> S.Set r) -> S.Set r prm PZero _ = S.empty prm (PRestrict m) k = unionMap k (runSetM m) prm (PPlus m1 m2) k = unionMap k (runSetM m1 `S.union` runSetM m2)
"ret" handles the result of pure computations; that is, those that could have just as easily run in the Identity monad. "prm" handles any effects; in this case the three effects "PZero", "PRestrict" and "PPlus". You could write a different observation/interpretation function that treated the elements as a List, or Maybe, or whatever. Let me know if this makes sense, or if you have any other questions. -- ryan
participants (4)
-
Ganesh Sittampalam
-
Henning Thielemann
-
oleg@okmij.org
-
Ryan Ingram