Proposal to solve Haskell's MPTC dilemma

This message presents, informally, a proposal to solve Haskell's MPTC (multi-parameter type class) dilemma. If this informal proposal turns out to be acceptable, we (I am a volunteer) can proceed and make a concrete proposal. The proposal has been published in the SBLP'2009 proceedings and is available at www.dcc.ufmg.br/~camarao/CT/solution-to-MPTC-dilemma.pdf The well-known dilemma (hackage.haskell.org/trac/haskell-prime/wiki/MultiParamTypeClassesDilemma) is that it is generally accepted that MPTCs are very useful, but their introduction is thought to require the introduction also of FDs (Functional Dependencies) or another mechanism like ATs (Associated Types) and FDs are tricky and ATs, somewhat in a similar situation, have been defined more recently and there is less experience with its use. In www.haskell.org/ghc/dist/current/docs/html/users_guide/type-class-extensions.html there exists a solution to the termination problem related to the introduction of MPTCs in Haskell. In our proposal, neither FDs nor any other mechanism like ATs are needed in order to introduce MPTCs in Haskell; the only change we have to make is in the ambiguity rule. This is explained below. The termination problem is essentially ortogonal and can be dealt with, with minor changes, as described in the solution presented in the above mentioned (type-class-extensions) web page. Let us review the ambiguity rule used in Haskell-98 and after that the ambiguity rule used in GHC. Haskell-98 ambiguity rule (which is adequate for Haskell-98's single parameter type classes) is: a type C => T is ambiguous iff there is a type variable v that occurs in the "context" (constraint set) C but not in the simple (unconstrained) type T. For example: forall a.(Show a, Read a)=>String is ambiguous, because "a" occurs in the constraints (Show a,Read a) but not in the simple type (String). In the context of MPTCs, this rule alone is not enough. Consider, for example (Example 1): class F a b where f:: a->b class O a where o:: a and k = f o:: (C a b,O a) => b Type forall a b. (C a b,O a) => b can be considered to be not ambiguos, since overloading resolution can be defined so that instantiation of "b" can "determine" that "a" should also be instantiated (as FD b|->a does), thus resolving the overloading. GHC, since at least version 6.8, makes significant progress towards a definition of ambiguity in the context of MPTCs; GHC 6.10.3 Users Guide says (section 7.8.1.1): "GHC imposes the following restrictions on the constraints in a type signature. Consider the type: forall tv1..tvn (c1, ...,cn) => type. ... Each universally quantified type variable tvi must be reachable from type. A type variable a is reachable if it appears in the same constraint as either a type variable free in the type, or another reachable type variable. For example, type variable "a" in constraint (O a) in the example above is reachable, because it appears in (C a b) (the same constraint as type variable "b", which occurs in the simple type). Our proposal is: consider unreachability not as indication of ambiguity but as a condition to trigger overloading resolution (in a similar way that FDs trigger overloading resolution): when there is at least one unreachable variable and overloading is found not to be resolved, then we have ambiguity. Overloading is resolved iff there is a unique substitution that can be used to specialize the constraint set to one, available in the current context, such that the specialized constraint does not contain unreachable type variables. (A formal definition, with full details, is in the cited SBLP'09 paper.) Consider, in Example 1, that we have a single instance of F and O, say: instance F Bool Bool where f = not instance O Bool where o = True and consider also that "k" is used as in e.g.: kb = not k According to our proposal, kb is well-typed. Its type is Bool. This occurs because (F a b, O a)=>Bool can be simplified to Bool, since there exists a single substitution that unifies the constraints with instances (F Bool Bool) and (O Bool) available in the current context, namely S = (a|->Bool,b|->Bool). As another example (Example 2), consider: {-# NoMonomorphismRestriction, MultiParameterTypeClasses #-} data Matrix = ... data Vector = ... class Mult a b c where (*):: a ! b ! c instance Mult Matrix Vector Matrix where (*) = ... instance Mult Matrix Vector Vector where (*) = ... m1:: Matrix = ... m2:: Vector = ... m3:: Vector = ... m = (m1 * m2) * m3 GHC gives the following type to m: m :: forall a, b. (Mult Matrix Vector a, Mult a Vector b) => b However, "m" cannot be used effectively in a program compiled with GHC: if we annotate m::Matrix, a type error is reported. This occurs because the type inferred for "m" includes constraints (Mult Matrix Vector a, Mult a Vector Matrix) and, with b|->Matrix, type variable "a" appears only in the constraint set, and this type is considered ambiguous. Similarly, if "m" is annotated with type Vector, the type is also considered ambiguous. There is no means of specializing type variable "a" occurring in the type of "m" to Matrix. And no FD may be used in order to achieve such specialization, because we do not have here a FD: "a" and "b" do not determine "c" in class Mult: for (a,b = Matrix,Vector), we can have "c" equal to either Matrix or Vector. According to our proposal, type m:: forall a. (Mult Matrix Vector a, Mult a Vector Matrix)) => Matrix is not ambiguous, in the context of the class and instance declarations of Example 2 (since "a" can be instantiated to Matrix), whereas type m:: forall a.(Mult Matrix Vector a, Mult a Vector Matrix)) => Vector is ambiguous in this context. We would like to encourage compiler front-end developers to change Haskell's ambiguity rule, and test with as many examples as desired, including recurring examples that appear in Haskell mailing lists involving MPTCs, FDs etc. We would like to hear and discuss your doubts, opinions, comments, criticisms etc. Cheers, Carlos

On Thursday 20 May 2010 16:34:17, Carlos Camarao wrote:
In the context of MPTCs, this rule alone is not enough. Consider, for example (Example 1):
class F a b where f:: a->b class O a where o:: a and k = f o:: (C a b,O a) => b
Type forall a b. (C a b,O a) => b can be considered to be not ambiguos, since overloading resolution can be defined so that instantiation of "b" can "determine" that "a" should also be instantiated (as FD b|->a does), thus resolving the overloading.
<snip>
Our proposal is: consider unreachability not as indication of ambiguity but as a condition to trigger overloading resolution (in a similar way that FDs trigger overloading resolution): when there is at least one unreachable variable and overloading is found not to be resolved, then we have ambiguity. Overloading is resolved iff there is a unique substitution that can be used to specialize the constraint set to one, available in the current context, such that the specialized constraint does not contain unreachable type variables. (A formal definition, with full details, is in the cited SBLP'09 paper.)
Consider, in Example 1, that we have a single instance of F and O, say:
instance F Bool Bool where f = not instance O Bool where o = True
and consider also that "k" is used as in e.g.:
kb = not k
According to our proposal, kb is well-typed. Its type is Bool. This occurs because (F a b, O a)=>Bool can be simplified to Bool, since there exists a single substitution that unifies the constraints with instances (F Bool Bool) and (O Bool) available in the current context, namely S = (a|->Bool,b|->Bool).
But then somebody defines instance F Int Bool where f = even instance O Int where o = 0 What then? Using the available instances to resolve overloading is a tricky thing, it's very easy to make things break that way.

On Thu, May 20, 2010 at 11:54 AM, Daniel Fischer
On Thursday 20 May 2010 16:34:17, Carlos Camarao wrote:
In the context of MPTCs, this rule alone is not enough. Consider, for example (Example 1):
class F a b where f:: a->b class O a where o:: a and k = f o:: (C a b,O a) => b ... Our proposal is: consider unreachability not as indication of ambiguity but as a condition to trigger overloading resolution (in a similar way that FDs trigger overloading resolution): when there is at least one unreachable variable and overloading is found not to be resolved, then we have ambiguity. Overloading is resolved iff there is a unique substitution that can be used to specialize the constraint set to one, available in the current context, such that the specialized constraint does not contain unreachable type variables. ... Consider, in Example 1, that we have a single instance of F and O, say:
instance F Bool Bool where f = not instance O Bool where o = True
and consider also that "k" is used as in e.g.:
kb = not k
According to our proposal, kb is well-typed. Its type is Bool. This occurs because (F a b, O a)=>Bool can be simplified to Bool, since there exists a single substitution that unifies the constraints with instances (F Bool Bool) and (O Bool) available in the current context, namely S = (a|->Bool,b|->Bool).
But then somebody defines
instance F Int Bool where f = even instance O Int where o = 0
What then?
Then (F a b, O a)=>Bool is ambiguous. There are two substitutions that unify (F a b, O a) with instances in the current context.
Using the available instances to resolve overloading is a tricky thing, it's very easy to make things break that way.
Using the available instances is the natural, in fact the only way, to resolve overloading. Our proposal cannot make any well-typed program break, any program whatsoever. What it can do is to make programs that *were not well-typed* --- because there existed an ambiguity error --- to be either: i) well-typed (because overloading is resolved), or ii) not well-typed (because overloading cannot in fact be resolved), and in this case the ambiguity error may be deferred, to the point where the unreachability occurs, if there was a FD annotated. Cheers, Carlos

On 20 May 2010 16:50, Carlos Camarao
Using the available instances to resolve overloading is a tricky thing, it's very easy to make things break that way.
Using the available instances is the natural, in fact the only way, to resolve overloading.
AFAIK no other Haskell feature is defined in terms of "available instance" information. Overloaded functions are resolved by at least these mechanisms: * Defaulting * Information from unification (including from user-defined type signatures) * Functional dependencies propagating information "Available instances" are not a natural addition to this list. In particular, using that information can cause programs to become untypeable when the module or *any module it imports transitively* defines a new instance. This leads to programs that are extremely fragile in the face of changes in the libraries! (Admittedly you can get the same issue with GHC Haskell as it is right now if you define an orphan instance in your module) The situation is even worse if you consider "available" instances to also include orphans defined in non-imported modules (as a whole-program compiler way very well do), because then you don't even need to have transitively imported the module which has added an instance for your program to stop type-checking. Furthermore, if you intend to use an "overloaded" function at *one particular instance*, you could just have written the monomorphic type to begin with and not even bothered with overload resolution.
Our proposal cannot make any well-typed program break, any program whatsoever.
That is true, but it makes extra things type check in a really fragile way. I'm not keen. Cheers, Max

On Thu, May 20, 2010 at 12:25 PM, Max Bolingbroke
"Available instances" are not a natural addition to this list. In particular, using that information can cause programs to become untypeable when the module or *any module it imports transitively* defines a new instance. This leads to programs that are extremely fragile in the face of changes in the libraries!
This is an unavoidable consequence of MPTCs being open, is it not? If data types or function declarations permitted the post facto addition of new constructors or pattern matches, similar headaches would ensue due to non-locality of transitive propagation. Clearly open type classes are useful; open data types and functions would be useful as well, actually, but it would be madness to permit *only* open declarations. Yet, that is the situation with type classes. I wonder: Of cases where overload resolution via available instances would be reasonable, how many would also make sense as a closed type class? By comparison, it seems that many uses of OverlappingInstances are really just trying to express a closed type class with one or more default instances, akin to functions with _ patterns. I think, though I'm not certain, that both would be straightforward and non-fragile for a closed type class. - C.

I wonder: Of cases where overload resolution via available instances would be reasonable, how many would also make sense as a closed type class? By comparison, it seems that many uses of OverlappingInstances are really just trying to express a closed type class with one or more default instances, akin to functions with _ patterns. I think, though I'm not certain, that both would be straightforward and non-fragile for a closed type class.
Someone recently described the HASP project, at http://hasp.cs.pdx.edu/. It describes "habit", a haskell like language with some additions and subtractions. There are a couple interesting extensions to 'instance' declarations: -- explicitly declare that there is no instance, halting the compiler's search instance xyz fails -- declares instances along with search order instance abc ... else def The result is that if you put 'fails' at the end, you can make a closed typeclass. Presumably you could also make typeclasses open but only in a restricted way. Also presumably the compiler would then be able to make better decisions about overlapping instances and you could avoid a lot of overlapping problems. Of course, it's just a paper with no compiler, so it's all "presumably" for the moment...

On 20 May 2010 21:16, Evan Laforge
Someone recently described the HASP project, at http://hasp.cs.pdx.edu/. It describes "habit", a haskell like language with some additions and subtractions. There are a couple interesting extensions to 'instance' declarations:
-- explicitly declare that there is no instance, halting the compiler's search instance xyz fails
-- declares instances along with search order instance abc ... else def
The result is that if you put 'fails' at the end, you can make a closed typeclass. Presumably you could also make typeclasses open but only in a restricted way. Also presumably the compiler would then be able to make better decisions about overlapping instances and you could avoid a lot of overlapping problems.
Of course, it's just a paper with no compiler, so it's all "presumably" for the moment...
Hi Evan, hasn't EHC had something like this for a while with 'type class directives'?

Hi Evan, hasn't EHC had something like this for a while with 'type class directives'?
I dunno, I don't even know what ehc is. Is it this? http://www.cs.uu.nl/wiki/Ehc/WebHome I turned up a paper that mentioned type class directives, but I haven't read it yet. In any case, the EHC page says "nearing first release!" so "had something like this for a while" depends on your perspective of "has" :) In any case, I thought the habit paper was fun and had a number of things it would be nice to have in haskell (even simple sugar like case<- and if<-).

Hi Evan EHC - Essential Haskell Compiler - is the 'family of compilers' that UHC - Utrecht Haskell Compiler - is instance one of. The EHC family starts with a simple Haskell subset and adds features building up to (almost) Haskell98 for UHC and extended Haskell for some of the EHC variations. This style of software development is sometime called 'feature oriented development' or 'software product lines', the ML compiler MLPolyR is another compiler built as a family of variants. UHC had a version 1.0 release last year. From the documentation it doesn't look like UHC supports type class directives: http://www.cs.uu.nl/wiki/bin/view/Ehc/UhcUserDocumentation
From a bit of nosing around, I'm not sure that any of the EHC variants support type class directives either. It does look like Helium (Utrecht's simplified Haskell variant for teaching) supports them, and Helium has certainly been released. That said, I've no association with the Utrecht developers, so I'm not really qualified to say for EHC, though I have studied its source a bit and grepping hasn't turned up an answer. Incidentally, studying the source of EHC is probably the best way to learn idioms and techniques for UUAG.
From the 'Type Class Directives' paper here are some example directives:
never Eq (a -> b): functions cannot be tested for equality never Num Bool: arithmetic on booleans is forbidden disjoint Integral Fractional: something which is fractional can never be integral close Similar: the instances of Similar are @insts@. (Similar being an Eq like class that is available only for integers) Best wishes Stephen

On Thu, May 20, 2010 at 1:25 PM, Max Bolingbroke wrote: On 20 May 2010 16:50, Carlos Camarao Using the available instances to resolve overloading is a tricky thing,
it's very easy to make things break that way. Using the available instances is the natural, in fact the only way, to
resolve overloading. AFAIK no other Haskell feature is defined in terms of "available
instance" information. Overloaded functions are resolved by at least
these mechanisms:
* Defaulting
* Information from unification (including from user-defined type
signatures)
* Functional dependencies propagating information "Available instances" are not a natural addition to this list. In
particular, using that information can cause programs to become
untypeable when the module or *any module it imports transitively*
defines a new instance. This leads to programs that are extremely
fragile in the face of changes in the libraries! Replace
"Using the available instances is the natural, in fact the only way, to
resolve overloading."
by
"All overloading resolution has to consider the available instances,
after unification, user-defined type signatures, defaulting and FDs have
been
considered."
Extremely fragile is debatable; the important fact, highlighted, is that if
overloading resolution depends on the existence of a unique substitution,
then
the program may become untypeable when other instances are defined in the
visible typing context.
(Admittedly you can get the same issue with GHC Haskell as it is right now if you define an orphan instance in your module) Yes. Just to emphasize, with our proposal the issue (of transforming a
well-typed program into a program which is not well-typed) is relevant only
when unreachable variables exist (for existing programs this never occurs,
since they are unambiguous according to the existing ambiguity rule).
The situation is even worse if you consider "available" instances to also include orphans defined in non-imported modules (as a
whole-program compiler way very well do), because then you don't even
need to have transitively imported the module which has added an
instance for your program to stop type-checking. Consider instances defined in non-imported modules to be visible in the
current
context is not correct, I think... Furthermore, if you intend to use an "overloaded" function at *one
particular instance*, you could just have written the monomorphic type
to begin with and not even bothered with overload resolution. Sorry, I do not follow you here (why *at one particular instance*?).
A polymorphic (overloaded or not) function is defined and then used at
specific cases,
with different (instance) types. Our proposal cannot make any well-typed program break, any program
whatsoever. That is true, but it makes extra things type check in a really fragile
way. I'm not keen. Really fragile meaning:
"Overloading resolution depends on the set of available instances in the
following way:
if a constraint on the type of an expression *contains unreachable
variables* and these
type variables can be instantiated, by a single substitution(*), to
instances in the current
context, then the constraint/overloading is resolved, and the type is not
ambiguous)."
then yes, ok.
(*) Single restricted to the domain of unreachable type variables...
Also, the same fragilty occurs if FDs are used. Cheers,
Max Cheers,
Carlos

On 20 May 2010 20:30, Carlos Camarao
Consider instances defined in non-imported modules to be visible in the current context is not correct, I think...
I was under the impression that this was not specified, because orphans are a bit of an oddity. But naturally the Haskell spec says what to do (Sec 5.4, http://www.haskell.org/onlinereport/modules.html): "Thus, an instance declaration is in scope if and only if a chain of import declarations leads to the module containing the instance declaration." So you are correct.
Sorry, I do not follow you here (why *at one particular instance*?). A polymorphic (overloaded or not) function is defined and then used at specific cases, with different (instance) types.
I got carried away here and didn't think that comment through, sorry for the error. You propose delaying the (possible) lack of a unique instantiation of the type variables to the overloaded definitions use site, but that *does not mean* that there is only one possible instantiation at the *original definition site* because some of the type variables in the possibly-ambiguous context are still free in the type and hence subject to further unification.
Also, the same fragilty occurs if FDs are used.
This remark is surprising to me. I thought the point of the FDs being declared on the original class (and the subsequent coverage condition check on instances) was to ensure that this fragility couldn't happen. Can you show an example (without using orphan instances) so I can get the idea? Thanks, Max

On Thu, May 20, 2010 at 7:54 PM, Max Bolingbroke < batterseapower@hotmail.com> wrote:
On 20 May 2010 20:30, Carlos Camarao
wrote:
... Also, the same fragilty occurs if FDs are used.
This remark is surprising to me. I thought the point of the FDs being declared on the original class (and the subsequent coverage condition check on instances) was to ensure that this fragility couldn't happen. Can you show an example (without using orphan instances) so I can get the idea?
Thanks, Max
Well, what I meant is just that the same would happen if we had a FD a->b in Example 1... (maybe I am not following you). That is: the same would happen if Example 1 was written with a FD as follows: class F a b | a -> b where f:: a -> b instance O a where o:: a And we had the same context: instance F Bool Bool where f = not instance O Bool where o = True k = f o kb = not k Then: kb is well-typed, because FD a |->b "closes the world", causing type (F a b, O a)=>Bool to be simplified ("improved") to Bool. But this type-correct program would become not typeable if instances such as the ones referred to before (by Daniel Fischer) instance F Int Bool where f = even instance O Int where o = 0 were later introduced, or imported... Cheers, Carlos

On 21 May 2010 01:58, Carlos Camarao
But this type-correct program would become not typeable if instances such as the ones referred to before (by Daniel Fischer)
I was thinking this through, and the situation is more complex than I had thought. It seems that single param type classes enjoy a nice property: * Adding an instance to the module defining the class cannot conflict with any non-orphan instance defined elsewhere * Adding an instance for a type for a class *to the module defining that type* cannot conflict with any non-orphan instance defined elsewhere As long as you complicate the definition of "orphan", this seems to still hold for multi-param ones. The change you need is that an instance defined in a module other than that defining the class must be non-orphan *in each individual type variable*. So if a module defined the D and E data types you could make a C D E instance, but a C D Bool or C Bool E one is considered "orphan" by this definition. Functional dependencies refine this slightly: if a variable ("a") functionally determines another ("b"), an instance can be declared non-orphan as long as the "a" variable is being instantiated with a data type which is defined in the same module. So if "C a b" has FD "a |-> b" you can declare an instance for "C D Bool" but not "C Bool D". With this definition of "orphan" I don't think it is possible to get the library fragility issue as long as you stick to "non-orphan" instances by that definition. I haven't tried to prove this, though. Where this gets more interesting is that GHC's -fwarn-orphans check does *not* flag a "C D Bool" instance in a module defining D but not C as an orphan, whether C has a functional dependency or not. It will only flag an instance as orphan if *all* of the class type variables are being instantiated to a datatype defined in another module. This seems like a bug? So in summary I think I agree with you that your proposed mechanism does have fragility characteristics similar to FDs as they stand. One benefit (that I can see) to using explicitly declared FDs is that the compiler could potentially use those FDs to implement a correct orphan instance check, such that code that passed the check was guaranteed not to cause the library fragility issue in those modules that import it. However, it appears that GHC doesn't currently do this, which is upsetting. (Incidentally, the link to your paper is broken, so I haven't actually been able to read it, sorry!) Cheers, Max

Hi Max. Excerpts from Max Bolingbroke's message of Sex Mai 21 04:56:51 -0300 2010: (...)
(Incidentally, the link to your paper is broken, so I haven't actually been able to read it, sorry!)
It was easy to find it on google. http://www.dcc.ufmg.br/~camarao/CT/solution-to-mptc-dilemma.pdf Greetings. (...) -- marcot http://wiki.debian.org/MarcoSilva

On Fri, May 21, 2010 at 3:56 AM, Max Bolingbroke
On 21 May 2010 01:58, Carlos Camarao
wrote: But this type-correct program would become not typeable if instances such as the ones referred to before (by Daniel Fischer)
I was thinking this through, and the situation is more complex than I had thought.
It seems that single param type classes enjoy a nice property: * Adding an instance to the module defining the class cannot conflict with any non-orphan instance defined elsewhere * Adding an instance for a type for a class *to the module defining that type* cannot conflict with any non-orphan instance defined elsewhere
This is only true in the absence of recursive imports. Otherwise,
those points imply that I can put one instance in the module defining
the type and another in the module defining the class without
conflict.
--
Dave Menendez

On 21 May 2010 01:58, Carlos Camarao
But this type-correct program would become not typeable if instances such as the ones referred to before (by Daniel Fischer) ...
It seems that single param type classes enjoy a nice property: ... * Adding an instance to the module defining the class cannot conflict with any non-orphan instance defined elsewhere ...
Max, thanks very much for your message. I will defer comments about definitions of orphan instances to a postscript. First I'd like to say the following. I think that a notion of orphan instances based on whether an instance is defined or not in the module where the class of the instance is defined is not very nice, because classes have global scope and then it should not matter in which module a class is defined, as long as it is defined somewhere in the current or some imported module. Let us thus not try to extend a definition based on this to MPTCs. However, the module fragility issue you raised is quite relevant. Let us call a type-correct module M fragile if definitions inserted in modules imported by M cause M to become type-incorrect. This issue is more relevant in Haskell than in other languages because instance definitions are automatically imported when a module is imported, and importation of an instance cannot be forbidden. Our "unreacheable_var-implies-overloading_resolution_test" proposal can be viewed as a proposal to "test-visible-instances-before-considering-types-as-ambiguous". Ambiguity means the existence of two or more conflicting definitions that could be used in an expression (and the inexistence of a reasonable criteria for selecting between conflicting definitions). In Haskell, nowadays, ambiguity is reported without looking to see if there really are conflicting definitions. What FDs do is to require programmers to specify dependencies between class variables which make the compiler look to see if there exist definitions that can satisfy such dependencies, before reaching any conclusion that an expression is ambiguous. What we propose (relieving the burden on programmers) is to make the compiler look itself to see if there exist two or more (well, or none) definitions and report error only if this is the case (again, before reaching any conclusion that an expression is ambiguous). If there is exactly one instance definition, there is no ambiguity (there is no conflict); in such case, then: use that single definition that the programmer defined. A benefit of adopting our approach would be that defaulting would become unnecessary (defaulting always occurring in favor of visible definitions). Module robustness can be achieved --- even maintaining automatic importation of instance definitions --- by considering, for each instance definition, say I, defined by instance C bla1 I bla2 where ... that an automatic defaulting rule that names C (see hackage.haskell.org/trac/haskell-prime/wiki/Defaulting) is virtually and automatically inserted, namely: default C bla1 I bla2 It should perhaps be noted though that one could have more than one such automatic default rule for the same class. Cheers, Carlos ===================================================================== PS: I think that a definition of orphan/non-orphan instance definition for MPTCs should be different. Letting: defM(C) = module where C is defined local-datatype(T,M,C) = T is defined in M or T is defined in defM(C) import-list(M,C) = {M} U { import-list(M') | M imports M'} instance-def(I,C,M,tv) = I is an instance definition of class C, occurring in module M, that instantiates class variable tv then you consider (if I have correctly understood and expressed what you wrote): Non-orphan = [for all I,C,M,tv: instance-def(I,C,M,tv) => local-datatype(T,M,C)] Orphan = [there exists I,C,M,tv: instance-def(I,C,M,tv) ^ not local-datatype(T,M,C)] And I think a correct definition of orphan/non-orphan for MPTCs should be along the line: Non-orphan = [there exists I,C,M,tv: instance-def(I,C,M,tv) ^ local-datatype(T,M,C) Orphan = [forall I,C,M,tv: instance-def(I,C,M,tv) ^ not local-datatype(T,M,C)] That is, an instance should be considered non-orphan if there exists at least one datatype to which a class type variable in such instance is instantiated, because other instances which instantiated such class type variable to such datatype would be non-orphan. GHC is thus correct, I think.

Sorry to correct myself:
On Sat, May 22, 2010 at 10:24 PM, Carlos Camarao
... ===================================================================== PS: I think that a definition of orphan/non-orphan instance definition for MPTCs should be different. Letting: ...
instance-def(I,C,M,tv) = I is an instance definition of class C,
occurring in module M, that instantiates class variable tv
Should be: instance-def(I,C,M,tv,T) = I is an instance def of class C, occurring in module M, that instanticates class variable tv to datatype T and replace occurrences of instance-def(I,C,M,tv) to instance-def(I,C,M,tv,T). And I think a correct definition of orphan/non-orphan for MPTCs should be along the line:
... That is, an instance should be considered non-orphan if there exists at least one datatype to which a class type variable in such instance is instantiated, because other instances which instantiate such class type variable to such datatype would be non-orphan.
^^^^ (delete "non-".) ... type variable to such datatype would be orphan. Cheers, Carlos

Hi Carlos,
Apologies for the lateness of my reply.
On 23 May 2010 02:24, Carlos Camarao
I think that a notion of orphan instances based on whether an instance is defined or not in the module where the class of the instance is defined is not very nice
I broadly agree, but pragmatically the notion of orphans is useful for designing robust libraries, even if the notion is a bit horrible.
A benefit of adopting our approach would be that defaulting would become unnecessary (defaulting always occurring in favor of visible definitions).
This is something I don't understand (and is not elaborated in your paper that I can see). Defaulting seems like an orthogonal mechanism. It turns a constraint that really does have multiple solutions (e.g. (Num a) => ...) into one where a particular preferred choice is taken (e.g. Num Int), in situations where abstracting over the choice is disallowed. However, you mechanism only turns constraints into instances when there is no ambiguity. Can you perhaps explain what you mean a bit further? I looked at your definition for orphan-hood, which I think might be OK if you don't have FlexibleInstances. However, if you do then consider this series of modules: """ {-# LANGUAGE MultiParamTypeClasses #-} module Common where class C a b where foo :: a -> b -> String """ """ {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-} {-# OPTIONS_GHC -fwarn-orphans #-} module Mod2 where import Common data E = E instance C a E where foo _ _ = "Mod2" """ """ {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-} {-# OPTIONS_GHC -fwarn-orphans #-} module Mod1 where import Common data D = D instance C D b where foo _ _ = "Mod1" """ """ {-# OPTIONS_GHC -fwarn-orphans #-} import Common import Mod1 import Mod2 main = putStrLn (foo D E) """ None of the instances are reported as orphans but IMHO they should be, because we get a conflict in the Main module. I guess that a MPTC instance (C t1 .. tn) for class C in module M1 is NOT an orphan if: 1) C is defined in the same module as the instance 2) OR ALL the t1..tn are instantiated to some concrete type (i.e. not a type variable) defined in the same module as the instance Imagine that we had an instance defined in a different module than the class and violating 2). Then: \exists i. t_i is a type variable or a datatype defined in another module Case 1: If t_i is a type variable, we can have a parallel module M2: """ data F = F instance C a_1 ... a_{i-1} F a_{i+1} ... a_n where """ Adding the instance to M2 may break client code because it is potentially ambiguous with the one from M1. Furthermore, the instance is considered non-orphan by GHC because it has at least one type which is defined in the same module. However, at least one of this instance and the one in M1 should have been flagged as orphans :( Case 2: if t_i is a datatype G defined in another module, we can similarly consider adding a new instance to that module: """ instance C a_1 ... a_{i-1} G a_{i+1} ... a_n where """ Same argument as for case 1. Does this seem right? == Basically, you want an orphanhood criteria P you can test locally on a per-module basis such that: * For any composition of modules where P holds on every module individually... * Changing any module by *adding* instances such that P still holds.. * Is guaranteed not to break any other module due to ambiguity It is not clear to me exactly what this should look like, especially in the presence of more complicated instance definitions (like the "instance C [Bool]" style of thing allowed by FlexibleInstances. It would probably be interesting to find out though. Cheers, Max

On Wed, May 26, 2010 at 7:12 PM, Max Bolingbroke wrote: I broadly agree, but pragmatically the notion of orphans is useful for
designing robust libraries, even if the notion is a bit horrible. ... I guess that a MPTC instance (C t1 .. tn) for class C in module M1 is NOT an orphan if: 1) C is defined in the same module as the instance
I don't understand the motivation behind 1) (and why you want 1) to be a
part of orphan-hood). I think horrible should mean senseless,useless: if the
declaration of a class C is in a module M1, the fact that a definition of an
instance of C is in M1 or in another module M2 that imports M1 should not
make any difference for a third module M3 as long as the modules involved
are imported by M3.
In "None of the instances are reported as orphans but IMHO they should be,
because we get a conflict in the Main module.", the motivation (antecedent)
seems not enough for the conclusion, since a similar conflict happens e.g.
when you are importing any two overlapping instances, that instantiate class
variable(s) to the same type(s) and are defined in different modules. A benefit of adopting our approach would be that defaulting would
become unnecessary (defaulting always occurring in favor of visible
definitions). This is something I don't understand ...
Can you perhaps explain what you mean a bit further? Sorry, I should have said that defaulting is not necessary to force
instantiation (of unreachable variables) when there are no conflicting
definitions in the current context. Defaulting should be used only to remove
ambiguity, i.e. when there exist conflicting definitions and when
unreachable type variables appear that can be instantiated to such
conflicting definitions.
Cheers,
Carlos

Sorry for reopening an old thread, but I thought I'd counter some of the
negative feedback :)
I think this proposal is a great idea!
It seems like this would make working with MPTCs much easier.
When programming, I generally want to only specify the minimum amount of
information to make my code logically unambiguous.
If the code contains enough information to infer the proper instantiation
without the use of an FD, then I shouldn't need to add a FD.
It seems like this would have much more of a "it just works" feel than the
currently alternatives.
Also the MPTC + FDs type errors are a pain. I'm not sure if the type errors
for your proposal would be better, but it would be hard to make them worse.
I do worry about imported instances, (over which we currently have little
control) messing up our code. But this would probably be pretty unusual and
I feel that this is more of a problem with how instances are imported than
with this proposal itself.
Anyway, just my two cents,
- Job
On Thu, May 20, 2010 at 10:34 AM, Carlos Camarao
This message presents, informally, a proposal to solve Haskell's MPTC (multi-parameter type class) dilemma. If this informal proposal turns out to be acceptable, we (I am a volunteer) can proceed and make a concrete proposal.
The proposal has been published in the SBLP'2009 proceedings and is available at www.dcc.ufmg.br/~camarao/CT/solution-to-MPTC-dilemma.pdfhttp://www.dcc.ufmg.br/%7Ecamarao/CT/solution-to-MPTC-dilemma.pdf
The well-known dilemma (hackage.haskell.org/trac/haskell-prime/wiki/MultiParamTypeClassesDilemma) is that it is generally accepted that MPTCs are very useful, but their introduction is thought to require the introduction also of FDs (Functional Dependencies) or another mechanism like ATs (Associated Types) and FDs are tricky and ATs, somewhat in a similar situation, have been defined more recently and there is less experience with its use.
In
www.haskell.org/ghc/dist/current/docs/html/users_guide/type-class-extensions.html there exists a solution to the termination problem related to the introduction of MPTCs in Haskell. In our proposal, neither FDs nor any other mechanism like ATs are needed in order to introduce MPTCs in Haskell; the only change we have to make is in the ambiguity rule. This is explained below. The termination problem is essentially ortogonal and can be dealt with, with minor changes, as described in the solution presented in the above mentioned (type-class-extensions) web page.
Let us review the ambiguity rule used in Haskell-98 and after that the ambiguity rule used in GHC. Haskell-98 ambiguity rule (which is adequate for Haskell-98's single parameter type classes) is: a type C => T is ambiguous iff there is a type variable v that occurs in the "context" (constraint set) C but not in the simple (unconstrained) type T.
For example: forall a.(Show a, Read a)=>String is ambiguous, because "a" occurs in the constraints (Show a,Read a) but not in the simple type (String).
In the context of MPTCs, this rule alone is not enough. Consider, for example (Example 1):
class F a b where f:: a->b class O a where o:: a and k = f o:: (C a b,O a) => b
Type forall a b. (C a b,O a) => b can be considered to be not ambiguos, since overloading resolution can be defined so that instantiation of "b" can "determine" that "a" should also be instantiated (as FD b|->a does), thus resolving the overloading.
GHC, since at least version 6.8, makes significant progress towards a definition of ambiguity in the context of MPTCs; GHC 6.10.3 User’s Guide says (section 7.8.1.1):
"GHC imposes the following restrictions on the constraints in a type signature. Consider the type: forall tv1..tvn (c1, ...,cn) => type. ... Each universally quantified type variable tvi must be reachable from type. A type variable a is reachable if it appears in the same constraint as either a type variable free in the type, or another reachable type variable.”
For example, type variable "a" in constraint (O a) in the example above is reachable, because it appears in (C a b) (the same constraint as type variable "b", which occurs in the simple type).
Our proposal is: consider unreachability not as indication of ambiguity but as a condition to trigger overloading resolution (in a similar way that FDs trigger overloading resolution): when there is at least one unreachable variable and overloading is found not to be resolved, then we have ambiguity. Overloading is resolved iff there is a unique substitution that can be used to specialize the constraint set to one, available in the current context, such that the specialized constraint does not contain unreachable type variables. (A formal definition, with full details, is in the cited SBLP'09 paper.)
Consider, in Example 1, that we have a single instance of F and O, say:
instance F Bool Bool where f = not instance O Bool where o = True
and consider also that "k" is used as in e.g.:
kb = not k
According to our proposal, kb is well-typed. Its type is Bool. This occurs because (F a b, O a)=>Bool can be simplified to Bool, since there exists a single substitution that unifies the constraints with instances (F Bool Bool) and (O Bool) available in the current context, namely S = (a|->Bool,b|->Bool).
As another example (Example 2), consider:
{-# NoMonomorphismRestriction, MultiParameterTypeClasses #-} data Matrix = ... data Vector = ... class Mult a b c where (*):: a ! b ! c instance Mult Matrix Vector Matrix where (*) = ... instance Mult Matrix Vector Vector where (*) = ... m1:: Matrix = ... m2:: Vector = ... m3:: Vector = ... m = (m1 * m2) * m3
GHC gives the following type to m: m :: forall a, b. (Mult Matrix Vector a, Mult a Vector b) => b
However, "m" cannot be used effectively in a program compiled with GHC: if we annotate m::Matrix, a type error is reported. This occurs because the type inferred for "m" includes constraints (Mult Matrix Vector a, Mult a Vector Matrix) and, with b|->Matrix, type variable "a" appears only in the constraint set, and this type is considered ambiguous. Similarly, if "m" is annotated with type Vector, the type is also considered ambiguous. There is no means of specializing type variable "a" occurring in the type of "m" to Matrix. And no FD may be used in order to achieve such specialization, because we do not have here a FD: "a" and "b" do not determine "c" in class Mult: for (a,b = Matrix,Vector), we can have "c" equal to either Matrix or Vector.
According to our proposal, type m:: forall a. (Mult Matrix Vector a, Mult a Vector Matrix)) => Matrix is not ambiguous, in the context of the class and instance declarations of Example 2 (since "a" can be instantiated to Matrix), whereas type m:: forall a.(Mult Matrix Vector a, Mult a Vector Matrix)) => Vector is ambiguous in this context.
We would like to encourage compiler front-end developers to change Haskell's ambiguity rule, and test with as many examples as desired, including recurring examples that appear in Haskell mailing lists involving MPTCs, FDs etc. We would like to hear and discuss your doubts, opinions, comments, criticisms etc.
Cheers,
Carlos
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Jun 8, 2010, at 15:32 , Job Vranish wrote:
It seems like this would make working with MPTCs much easier. When programming, I generally want to only specify the minimum amount of information to make my code logically unambiguous. If the code contains enough information to infer the proper instantiation without the use of an FD, then I shouldn't need to add a FD. It seems like this would have much more of a "it just works" feel than the currently alternatives.
I can't help but think that the "it just works" mentality leads to duck typing. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
participants (10)
-
Brandon S. Allbery KF8NH
-
C. McCann
-
Carlos Camarao
-
Daniel Fischer
-
David Menendez
-
Evan Laforge
-
Job Vranish
-
Marco Túlio Gontijo e Silva
-
Max Bolingbroke
-
Stephen Tetley