Issues(Bugs?) with GHC Type Families

Hi all, I have recently tried to replicate some examples from in the articles about type families but found some possible bugs. In [2], the example class C a where type S a (k :: * -> *) :: * instance C [a] where type S [a] k = (a,k a) does not compile under the claim that the type variable k is not in scope. However, if we extract the type family from the class type family S a (k :: * -> *) :: * type instance S [a] k = (a, k a) class C a it compiles correctly. According to [3], the difference is purely syntactic sugar, does that mean that both examples should compile and behave the same or is there some subtlety that justifies the class example not to compile? Another issue is that data kinds (used in both [2] and [3]) do not seem to be supported at all by the compiler, are they already implemented in GHC? Simple examples such as datakind Nat = Zero or datakind Nat = Zero | Succ Nat fail to compile. Perhaps some of these should be submitted to the GHC Bug Tracker. I have tested both GHC 6.8.2 and 6.9.20080218. References: 1. Associated Types with Class.http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In *Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)*, pages 1-13, ACM Press, 2005. 2. Associated Type Synonyms.http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In *Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming *, ACM Press, pages 241-253, 2005. 3. Towards Open Type Functions for Haskell.http://www.cse.unsw.edu.au/~chak/papers/SSPC07.html Tom Schrijvers, Martin Sulzmann, Simon Peyton-Jones, and Manuel M. T. Chakravarty. Unpublished manuscript, 2007.

2008/3/3 Hugo Pacheco
class C a where type S a (k :: * -> *) :: * instance C [a] where type S [a] k = (a,k a)
does not compile under the claim that the type variable k is not in scope.
It's not entirely syntactical sugar; I believe that when a type family is a member of a type-class it makes the assertion that only the class variables are part of the "function". In the class declaration:
class C a where type S a (k :: * -> *) :: *
there are two arguments to the type function S, but the class C only provides one. Contrast with the following
type Pair a k = (a, k a) class C a where type S a :: (* -> *) -> * instance C [a] where type S [a] = Pair a
datakind Nat = Zero or datakind Nat = Zero | Succ Nat
datakinds are listed under "future work" in the papers I have read; they aren't implemented yet. -- ryan

Hugo Pacheco:
I have recently tried to replicate some examples from in the articles about type families but found some possible bugs.
In [2], the example
class C a where type S a (k :: * -> *) :: * instance C [a] where type S [a] k = (a,k a)
does not compile under the claim that the type variable k is not in scope.
However, if we extract the type family from the class
type family S a (k :: * -> *) :: * type instance S [a] k = (a, k a) class C a
it compiles correctly. According to [3], the difference is purely syntactic sugar, does that mean that both examples should compile and behave the same or is there some subtlety that justifies the class example not to compile?
I am sorry for the confusion this causes, but we wrote the paper before the implementation in GHC was finalised. Hence, the implementation deviates from the paper in some aspects. Generally, please use http://haskell.org/haskellwiki/GHC/Type_families (which will be integrated into GHC's Users Manual for 6.10) as the definite guide to the implementation. Here a brief rational for this change in design. We originally intended to distinguish between type parameters that do use type indexing (the a in (S a k)) and those that do not (the k in (S a k)). However, we found later that this leads to implementation problems. The new rule is that any explicitly named parmeters, eg, s and k in type family S a (k :: * -> *) :: * are indexed. If you don't want to use the second parameter as an index, write type S a :: (* -> *) -> * This is also simpler to explain, as simply *all* explicitly named parameters in a type family declaration are indicies.
Another issue is that data kinds (used in both [2] and [3]) do not seem to be supported at all by the compiler, are they already implemented in GHC?
Simple examples such as
datakind Nat = Zero or datakind Nat = Zero | Succ Nat
fail to compile.
No, datakinds aren't supported yet. We still intend to add them, but our first priority is to thoroughly debug the existing functionality and especially to make sure the interaction with GADTs works well. Thanks for the feedback. Manuel

Just something I have been wondering. I would like to implement somehting like: type family F a :: * -> * ... class C a b where ... instance (F a ~ F b) => C a b where ... But apparently type equality coercions can not be used as a single context. If I enable -fallow-undecidable-instances, whenever the equality does not hold, the instance returns a compile error, what does make sense. Is there any way I could overcome this? Thanks, hugo

Hugo Pacheco:
Just something I have been wondering.
I would like to implement somehting like:
type family F a :: * -> * ... class C a b where ... instance (F a ~ F b) => C a b where ...
But apparently type equality coercions can not be used as a single context. If I enable -fallow-undecidable-instances, whenever the equality does not hold, the instance returns a compile error, what does make sense.
Is there any way I could overcome this?
I think I don't understand your question fully. This class instance requires both -XFlexibleInstances and -fallow-undecidable-instances to compile. If the equality does not hold, you should get a type error because your program is not type correct. So, what is it that you would like different? Manuel

If the equality does not hold, you should get a type error because your program is not type correct. So, what is it that you would like different?
I would simply like the compiler not to use that instance if the equality constraint does not hold, like some another instance dependency constraint, but I assume that is not possible.
hugo

What I said is not true since overlapping instances are not that much decidable. Btw, in previous versions of GHC this worked well, but now I suppose order does not suffices to define instances overlapping How could I compile such an example, assuming that I want to use the instance C String for Strings only and the more general instance for the rest? class C a where c :: a instance C Char where c = 'a' instance C a => C [a] where c = [c :: a,c :: a] instance C String where c = "a" cc = c :: String Sorry for the newbie question. hugo

2008/3/6 Hugo Pacheco
How could I compile such an example, assuming that I want to use the instance C String for Strings only and the more general instance for the rest?
class C a where c :: a instance C Char where c = 'a' instance C a => C [a] where c = [c :: a,c :: a] instance C String where c = "a" cc = c :: String
This is actually a general issue with the way typeclasses are defined in Haskell; the accepted solution is what the "Show" typeclass does:
class C a where c :: a cList :: [a] cList = [c,c]
instance C Char where c = 'a' cList = "a" -- replaces instance for String above instance C a => C [a] where c = cList cc = c :: String
I don't really like this solution; it feels like a hack and it relies on knowing when you define the typeclass what sort of overlap you expect. I wish there was some form of instance declaration that let you do case analysis; something similar to the following: instances C [a] where instance C String where c = "a" instance C a => C [a] where c = [c,c] instance Num a => C [a] where c = [0] When trying to find a constraint for C [a] for some type a, the instances would be checked in order: 1) If a was Char, the first instance would be selected. 2) Otherwise, if there was an instance for C a, the second instance would be selected. 3) Otherwise, if there was an instance for Num a, the third instance would be selected. 4) Otherwise a type error would occur. Then overlapping instances wouldn't be required nearly as often; the relevant instances would all have to be defined in one place, but that's often the case anyways. In exchange for giving up some of the openness of typeclasses, you would get the benefit of being able to have better control over instance definitions. You could also use this to create "closed" typeclasses using "instances C a where ..."; the compiler would then know that all possible instances were defined at that location; any other definition would clearly overlap and so the compiler could rule out their existence immediately. There might be some benefit in this case in terms of removing ambiguities that currently arise due to open typeclasses. -- ryan

On Thu, Mar 6, 2008 at 7:52 PM, Ryan Ingram
I wish there was some form of instance declaration that let you do case analysis; something similar to the following:
instances C [a] where
instance C String where c = "a"
instance C a => C [a] where c = [c,c] instance Num a => C [a] where c = [0]
When trying to find a constraint for C [a] for some type a, the instances would be checked in order: 1) If a was Char, the first instance would be selected. 2) Otherwise, if there was an instance for C a, the second instance would be selected. 3) Otherwise, if there was an instance for Num a, the third instance would be selected. 4) Otherwise a type error would occur.
I agree that this would be nice. But I think that the situation is stickier than it looks on the surface. Consider this: instances GShow [a] where instance [Char] where gShow = id instance [a] where gShow x = "list(" ++ length x ++ ")" gShow' :: [a] -> String gShow' = gShow For any implementation which keeps implementations polymorphic (i.e. no required inlining like C++), despite what the program *says*, gShow' would not be equal to gShow. When gShow' calls gShow, it needs a GShow dictionary; so it does constraint solving and finds a dictionary for [a]. a is not Char, so it uses the [a] dictionary, not the [Char] dictionary. Therefore: gShow "hello" = "hello" gShow' "hello" = "list(5)" That's just the first issue, and that alone is enough to make me not want this. When we get to more complex examples like the one you gave above involving constraints, in order to solve for the constraints needed for an instance of C [a] you need a *disjunction* constraint, which vastly increases the likelihood of an undecidable solution. Luke

On Thu, Mar 6, 2008 at 7:16 PM, Luke Palmer
I agree that this would be nice. But I think that the situation is stickier than it looks on the surface. Consider this:
instances GShow [a] where instance [Char] where gShow = id instance [a] where gShow x = "list(" ++ length x ++ ")"
gShow' :: [a] -> String gShow' = gShow
I'm not so sure. It's not that hard to imagine a compiler where the core-language expansion of this code looks like this: data GShowDict a = GShowDict { gShow :: a -> String } id :: (a :: *) -> a -> a id A a = a length :: (a :: *) -> [a] -> Int -- definition elided gShow' :: (a :: *) -> [a] -> String gShow' A = gShow (mkDict_GShow_List A) mkDict_GShow_List :: (a :: *) -> GShowDict [a] mkDict_GShow_List A = typecase A of Char -> GShowDict (id [A]) _ -> GShowDict (\xs -> length A xs) Now, it's true that this means that you can no longer do full type erasure, but I'm not convinced of the importance of that anyways; if you look at mainstream languages you generally only get type erasure for a restricted subset of the types and that's good enough for performance: 1) In Java, you only get type erasure for primitive types; everything else needs its type tag so it can be safely downcasted from Object. 2) In C++ only primitive types and structs/classes with no virtual functions get type erasure; if a class has virtual functions its virtual function table is implicitly a "type tag". I don't think the cost is that great; the compiler can easily flag polymorphic functions that require type information for some or all arguments and in many cases the type evidence can be passed "just-in-time" when calling from a non-polymorphic function into a polymorphic function.
When we get to more complex examples like the one you gave above involving constraints, in order to solve for the constraints needed for an instance of C [a] you need a *disjunction* constraint, which vastly increases the likelihood of an undecidable solution.
This is definitely an issue. I don't have a full understanding of the methods used for typechecking/inference of typeclasses; I'm just a programmer who wants MORE POWER :) -- ryan

On Fri, Mar 7, 2008 at 2:10 AM, Ryan Ingram
I don't think the cost is that great; the compiler can easily flag polymorphic functions that require type information for some or all arguments and in many cases the type evidence can be passed "just-in-time" when calling from a non-polymorphic function into a polymorphic function.
As I understand it, the cost in performance isn't too bad. For
example, JHC uses type-passing to implement type classes. One problem
is that a language with pervasive typecase has weaker guarantees. For
example, in present-day Haskell we can tell that length :: [a] -> Int
works the same for every type 'a' just by looking at the type
signature. That's not the case if we can call typecase in arbitrary
functions.
We can avoid that problem by requiring a type representation
parameter, or otherwise indicating the use of typecase in the
signature. Modern Haskell provides this with the Typeable class.
(Which, in JHC, is essentially empty.)
Thus, another solution for writing the show instance for lists is:
class (Typeable a) => Show [a] where
show s =
case cast s of
Just str -> "\"" ++ str ++"\""
Nothing -> showList s
where showList ...
To summarize: we can make more claims about a function with type
"forall a. F a -> G a" than one with type "forall a. (Typeable a) => F
a -> G a", even though every type can (as far as I know) be made an
instance of Typeable.
--
Dave Menendez

On Thu, Mar 6, 2008 at 9:52 PM, Ryan Ingram
This is actually a general issue with the way typeclasses are defined in Haskell; the accepted solution is what the "Show" typeclass does:
class C a where c :: a cList :: [a] cList = [c,c]
instance C Char where c = 'a' cList = "a" -- replaces instance for String above
instance C a => C [a] where c = cList cc = c :: String
I don't really like this solution; it feels like a hack and it relies on knowing when you define the typeclass what sort of overlap you expect.
The pattern above can be thought of as a specialized version of this:
class C a where
c :: a
class ListOfC a where
listOfC :: [a]
instance (ListOfC a) => C [a] where
c = listOfC
This works for an arbitrary number of type constructors (good!), but
requires a ListOfC instance for everything you might want to put into
a list (cumbersome!).
You can make things slightly less verbose by putting a sensible
default in ListOfC,
class (C a) => ListOfC a where
listOfC :: [a]
listOfC = default_c
--
Dave Menendez

Hugo Pacheco:
If the equality does not hold, you should get a type error because your program is not type correct. So, what is it that you would like different?
I would simply like the compiler not to use that instance if the equality constraint does not hold, like some another instance dependency constraint, but I assume that is not possible.
This is independent of type families. The selection of type class instances does *never* dependent on the instance context, only on the instance head. I know that this is sometimes annoying, but type checking would be a a lot more complicated/expensive without that rule. Manuel
participants (5)
-
David Menendez
-
Hugo Pacheco
-
Luke Palmer
-
Manuel M T Chakravarty
-
Ryan Ingram