
To improve my understanding of GADT, I tried to define a Set datatype, with the usual operations, so that it can be made a member of the standard Monad class. Here I report on my experiments. First, I recap the problem. Data.Set.Set can not be made a Monad because of the Ord constraint on its parameter, while e.g. (return :: a -> m a) allows any type inside the monad m. This problem can be solved using an alternative monad class (restricted monad) so that the Ord context is actually provided for the monad operations. Rather, I aimed for the standard Monad typeclass. To avoid reinventing Data.Set.Set, my datatype is based on that. Basically, when we know of an Ord context, we use a Set. Otherwise, we use a simple list representation. Using a list is just enough to allow the implementation of return (i.e. (:[])) and (>>=) (i.e. map and (++)), so while not being very efficient, it is simple. Other non-monadic operators require an Ord context, so that we can turn lists into Set. My first shot at this was: data SetM a where L :: [a] -> SetM a SM :: Ord a => Set.Set a -> SetM a However, this was not enough to convince the type checker to use the Ord context stored in SM. Specifically, performing union with union (SM m1) (SM m2) = SM (m1 `Set.union` m2) causes GHC to report "No instance for (Ord a)". After some experiments, I found the following, using a type equality witness: data Teq a b where Teq :: Teq a a data SetM a where L :: [a] -> SetM a SM :: Ord w => Teq a w -> Set.Set w -> SetM a Now if I use union (SM p1 m1) (SM p2 m2) = case (p1,p2) of (Teq,Teq) -> SM Teq (m1 `Set.union` m2) it typechecks! This rises some questions I cannot answer: 1) Why the first version did not typececk? 2) Why the second one does? 3) If I replace (Teq a w) with (Teq w a), as in SM :: Ord w => Teq w a -> Set.Set w -> SetM a then union above does not typecheck! Why? I guess the type variable unification deriving from matching Teq is not symmetric as I expect it to be... Below, I attach the working version. Monad and MonadPlus instances are provided for SetM. Conversions from/to Set are also provided, requiring an Ord context. "Efficient" return and mzero are included, forcing the Set representation to be used, and requiring Ord (these could also be derived from fromSet/toSet, however). Comments are very welcome, of course, as well as non-GADT related alternative approaches. Regards, Roberto Zunino. ============================================================ \begin{code} {-# OPTIONS_GHC -Wall -fglasgow-exts #-} module SetMonad ( SetM() , toSet, fromSet , union, unions , return', mzero' ) where import qualified Data.Set as S import Data.List hiding (union) import Control.Monad -- Type equality witness data Teq a b where Teq :: Teq a a -- Either a list or a real Set data SetM a where L :: [a] -> SetM a SM :: Ord w => Teq a w -> S.Set w -> SetM a instance Monad SetM where return = L . (:[]) m >>= f = case m of L l -> unions (map f l) SM Teq s -> unions (map f (S.toList s)) instance MonadPlus SetM where mzero = L [] mplus = union -- Efficient variants for Ord types return' :: Ord a => a -> SetM a return' = SM Teq . S.singleton mzero' :: Ord a => SetM a mzero' = SM Teq S.empty -- Set union: use the best representation union :: SetM a -> SetM a -> SetM a union (L l1) (L l2) = L (l1 ++ l2) union (SM p1 m1) (SM p2 m2) = case (p1,p2) of (Teq,Teq) -> SM Teq (m1 `S.union` m2) union (L l1) (SM p m2) = case p of Teq -> SM Teq (m2 `S.union` S.fromList l1) union s1 s2 = union s2 s1 -- Try to put a SM first before folding, to improve performance unions :: [SetM a] -> SetM a unions = let isSM (SM _ _) = True isSM _ = False in foldl' union (L []) . uncurry (++) . break isSM -- Conversion from/to Set requires Ord toSet :: Ord a => SetM a -> S.Set a toSet (L l) = S.fromList l toSet (SM p m) = case p of Teq -> m fromSet :: Ord a => S.Set a -> SetM a fromSet = SM Teq -- Tests test :: IO () test = do let l = [1..3] :: [Int] s = fromSet (S.fromList l) g x = return' x `mplus` return' (x+100) print $ S.toList $ toSet $ do x <- s y <- s return' (x+y) -- [2,3,4,5,6] print $ S.toList $ toSet $ do x <- s g x -- [1,2,3,101,102,103] print $ S.toList $ toSet $ do x <- s y <- g x return' y -- [1,2,3,101,102,103] print $ S.toList $ toSet $ do x <- s y <- g x g y -- [1,2,3,101,102,103,201,202,203] print $ S.toList $ toSet $ do x <- s y <- return (const x) -- no Ord! return' (y ()) -- [1,2,3] \end{code}

On 1/3/07, Roberto Zunino
1) Why the first version did not typececk? 2) Why the second one does? 3) If I replace (Teq a w) with (Teq w a), as in SM :: Ord w => Teq w a -> Set.Set w -> SetM a then union above does not typecheck! Why? I guess the type variable unification deriving from matching Teq is not symmetric as I expect it to be...
These are very interesting questions that I forgot about until reminded by Haskell Weekly News. Thanks, HWN! 1) Class constraints can't be used on pattern matching. They ARE restrictive on construction, however. This is arguably bug in the Haskell standard. It is fixed in GHC HEAD for datatypes declared in the GADT way, so as not to break H98 code: http://article.gmane.org/gmane.comp.lang.haskell.cvs.all/29458/match=gadt+cl... 2) The second one works because Class constraints can be used when pattern matching existentials. 3) I imagine this might have something to do with the coercions that System FC uses. With one ordering, a coercion might occur that in another one is unnecessary. This coercion might allow the use of Ord w by using it before the coercion from S.Set a to S.Set w. #3 is just a guess. Jim

Concerning (1) yes, this is a recent and substantial improvement in GHC (HEAD only).
Concerning (3), this is a plain bug. I've just fixed it in the HEAD. I very much doubt I'll fix it in 6.6, because type inference in the HEAD now works rather differently.
Thanks for the example; I've added it to the test suite!
Simon
| -----Original Message-----
| From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Jim
| Apple
| Sent: 09 January 2007 01:52
| To: haskell-cafe@haskell.org
| Subject: Re: [Haskell-cafe] Monad Set via GADT
|
| On 1/3/07, Roberto Zunino

Jim Apple wrote: > On 1/3/07, Roberto Zuninowrote: >> 1) Why the first version did not typececk? > 1) Class constraints can't be used on pattern matching. They ARE > restrictive on construction, however. This is arguably bug in the > Haskell standard. It is fixed in GHC HEAD for datatypes declared in > the GADT way, so as not to break H98 code: > http://article.gmane.org/gmane.comp.lang.haskell.cvs.all/29458/match=gadt+class+context To quote from there: "I think this is stupid, but it's what H98 says." Maybe it is time to consider it deprecated to follow the Haskell 98 standard /to the letter/. The above is an example where the default (without flags) should (arguably) be the 'fixed' standard. We would need an equivalent of gcc's -pedantic flag, meaning "Follow the Haskell 98 standard to the letter, even on issues where the standard is generally considered bad". I hope this will be handled in a better way with Haskell'. It should be possible to revise the standard (every few years or so, /very/ conservatively i.e. no extensions, etc) so that we can eliminate 'bugs' from the language spec. Cheers Ben

| > On 1/3/07, Roberto Zunino

On Friday 12 January 2007 09:04, Simon Peyton-Jones wrote:
| > On 1/3/07, Roberto Zunino
wrote: | >> 1) Why the first version did not typececk? | > | > 1) Class constraints can't be used on pattern matching. They ARE | > restrictive on construction, however. This is arguably bug in the | > Haskell standard. It is fixed in GHC HEAD for datatypes declared | > in the GADT way, so as not to break H98 code: | | http://article.gmane.org/gmane.comp.lang.haskell.cvs.all/29458/matc |h=gadt+class+context | | To quote from there: "I think this is stupid, but it's what H98 | says." | | Maybe it is time to consider it deprecated to follow the Haskell 98 | standard /to the letter/. GHC follows this strange standard when you write data type decls in H98 form
data Eq a => T a = C a Int | D
Here, pattern-matching on either C or D will cause an (Eq a) constraint to be required.
However, GHC does *not* follow this convention when you write the data type decl in GADT style syntax:
data T a where C :: Eq a => a -> Int -> T a D :: T a
Here, (a) you can be selective; in this case, C has the context but D does not. And (b) GHC treats the constraints sensibly: - (Eq a) is *required* when C is used to construct a value - (Eq a) is *provided* when C is used in pattern matching
In short, in GADT-style syntax, GHC is free to do as it pleases, so it does the "right" thing. In this case, then, you can avoid the H98 "bug" by using GADT-style syntax.
All of this is documented in the user manual. If it's not clear, please help me improve it.
Crystal clear. My remark was meant merely as a general observation. Cheers Ben

On 1/3/07, Roberto Zunino
I tried to define a Set datatype, with the usual operations, so that it can be made a member of the standard Monad class.
Also, we can do this with oleg's technique of "Restricted Data Types Now": http://article.gmane.org/gmane.comp.lang.haskell.prime/483 Jim
participants (4)
-
Benjamin Franksen
-
Jim Apple
-
Roberto Zunino
-
Simon Peyton-Jones