Re: [Haskell] [Fwd: undecidable & overlapping instances: a bug?]

[Sorry, I guess this should have been in the cafe ...]
Simon Peyton-Jones wrote:
The trouble is that
a) the coverage condition ensures that everything is well behaved
b) but it's too restrictive for some uses of FDs, notably the MTL library
c) there are many possibilities for more generous conditions, but the useful ones all seem complicated
Concerning the last point I've dumped the current brand leader for (c) into http://hackage.haskell.org/trac/ghc/ticket/1241#comment:15.
Better ideas for (c) would be welcome.
Let's take the declaration:

    instance P => C t where ...

The version of the "coverage condition" in my paper [1] requires that TV(t_Y) \subseteq TV(t_X), for each dependency (X->Y) \in F_C. (I'm using the notation from the paper; let me know if you need more help to parse it.)

This formulation is simple and sound, but it doesn't use any dependency information that could be extracted from P. To remedy this, calculate L = F_P, the set of functional dependencies induced by P, and then expand the right-hand side of the set inequality above by taking the closure of TV(t_X) with respect to L. In symbols, we have to check that:

    TV(t_Y) \subseteq TV(t_X)^+_L, for each (X->Y) \in F_C.

I believe (but haven't formally proved) that this is sound; I don't know how to make a more general "coverage condition" without losing that. I don't know if it's sufficient for examples like MTL (because I'm not sure where to look for details of what that requires), but if it isn't then I'd be very suspicious ...

All the best, Mark

[1] http://www.cs.pdx.edu/~mpj/pubs/fundeps-esop2000.pdf
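To make the refined condition concrete, here is a small made-up example (classes C and D below are hypothetical, not taken from the thread). The instance fails the simple coverage condition but satisfies the refined one, because the dependency induced by the context covers b:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, UndecidableInstances #-}

    class D a b | a -> b
    class C a b | a -> b

    -- Simple coverage condition: TV(b) = {b} is not a subset of TV([a]) = {a},
    -- so the instance would be rejected.
    -- Refined condition: the context D a b induces the dependency a -> b,
    -- so the closure of {a} under F_P contains b, and the instance is
    -- accepted (GHC accepts it with the extensions above).
    instance D a b => C [a] b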

Mark P Jones writes:
[Sorry, I guess this should have been in the cafe ...]
Simon Peyton-Jones wrote:
The trouble is that
a) the coverage condition ensures that everything is well behaved
b) but it's too restrictive for some uses of FDs, notably the MTL library
c) there are many possibilities for more generous conditions, but the useful ones all seem complicated
Concerning the last point I've dumped the current brand leader for (c) into http://hackage.haskell.org/trac/ghc/ticket/1241#comment:15.
Better ideas for (c) would be welcome.
Let's take the declaration:

    instance P => C t where ...

The version of the "coverage condition" in my paper [1] requires that TV(t_Y) \subseteq TV(t_X), for each dependency (X->Y) \in F_C. (I'm using the notation from the paper; let me know if you need more help to parse it.)

This formulation is simple and sound, but it doesn't use any dependency information that could be extracted from P. To remedy this, calculate L = F_P, the set of functional dependencies induced by P, and then expand the right-hand side of the set inequality above by taking the closure of TV(t_X) with respect to L. In symbols, we have to check that:
TV(t_Y) \subseteq TV(t_X)^+_L, for each (X->Y) \in F_C.
I believe (but haven't formally proved) that this is sound; I don't know how to make a more general "coverage condition" without losing that. I don't know if it's sufficient for examples like MTL (because I'm not sure where to look for details of what that requires), but if it isn't then I'd be very suspicious ...
All the best, Mark
I think the above is equivalent to the (refined) weak coverage condition in [2] (see Section 6, p. 26). The weak coverage condition gives us soundness. More precisely, we obtain confluence, from which we can derive consistency (which in turn guarantees that the type class program is sound). *BUT* this only works if we have termination, which is often very tricky to establish. For the example,
    class Concrete a b | a -> b where
      bar :: a -> String
    instance (Show a) => Concrete a b
termination holds, but the weak coverage condition does *not* hold. Indeed, this program should therefore be rejected.

Martin
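To spell out why (a sketch of my own, not from Martin's message; the method body and pragmas are illustrative):

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, UndecidableInstances #-}

    class Concrete a b | a -> b where
      bar :: a -> String

    -- Nothing in the instance head or context constrains b, so the same a
    -- could be paired with arbitrary b's: both Concrete Int Bool and
    -- Concrete Int Char would be derivable, contradicting a -> b. GHC
    -- rejects this instance (the coverage condition fails, even in its
    -- liberal form), which matches Martin's conclusion above.
    instance Show a => Concrete a b where
      bar = show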

Sorry, forgot to add
[2] http://www.comp.nus.edu.sg/~sulzmann/publications/jfp-fds-revised.pdf
Martin

Hello,
I believe that this "weak coverage condition" (which is also called
"the dependency condition" somewhere on the wiki) is exactly what GHC
6.4 used to implement but then in 6.6 this changed. According to
Simon's comments on the trac ticket, this rule requires FDs to be
"full" to preserve the confluence of the system that is described in
the "Understanding Fds via CHRs" paper. I have looked at the paper
many times, and I am very unclear on this point; any enlightenment
would be most appreciated (by the way, there was a post about that on
the haskell-prime list a couple of days ago, which contains a concrete
example as well).
To answer Mark's question, this rule provides enough power for mtl and
monadLib. They use classes like "class StateM m s | m -> s" and
instances like "instance StateM m s => StateM (ExceptionT m) s". (The
full source code for monadLib is at
http://www.galois.com/~diatchki/monadLib/monadLib-3.3.0/src/MonadLib.hs)
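For concreteness, here is a minimal sketch of that pattern (the ExceptionT type and method bodies below are simplified stand-ins, not the actual monadLib code):

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, UndecidableInstances #-}

    -- Simplified stand-in for monadLib's exception transformer.
    newtype ExceptionT m a = ExceptionT (m (Either String a))

    class StateM m s | m -> s where
      get :: m s
      put :: s -> m ()

    -- This lifting instance fails the plain coverage condition (s is not
    -- covered by the instance head), but is fine under the weak/refined
    -- condition: the context StateM m s supplies the dependency m -> s.
    instance StateM m s => StateM (ExceptionT m) s where
      get   = ExceptionT undefined  -- bodies elided; only the instance head
      put _ = ExceptionT undefined  -- and context matter for this discussion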
Martin, could you elaborate on the problem with non-termination? I
have seen examples where the type-checker could loop while trying to
prove things, but I was not aware that there were implications related
to soundness as well.
-Iavor

| I believe that this "weak coverage condition" (which is also called
| "the dependency condition" somewhere on the wiki) is exactly what GHC
| 6.4 used to implement but then in 6.6 this changed. According to
| Simon's comments on the trac ticket, this rule requires FDs to be
| "full" to preserve the confluence of the system that is described in
| the "Understanding Fds via CHRs" paper. I have looked at the paper
| many times, and I am very unclear on this point; any enlightenment
| would be most appreciated.

Right. In the language of http://hackage.haskell.org/trac/ghc/ticket/1241, last comment (date 10/17/07), it would be *great* to replace the Coverage Condition (CC) with the Weak Coverage Condition (WCC), as Mark suggests. BUT to retain soundness we can only replace CC with

    WCC + B

WCC alone, without (B), threatens soundness. To retain termination as well (surely desirable) we must have

    WCC + B + C

(You'll have to look at the ticket to see what B, C are.)

And since A+B+C are somewhat onerous, we probably want to have

    CC or (WCC + B + C)

At least we know of nothing else weaker that will do (apart from CC, of course).

Like you, Iavor, I find it very hard to internalise just why (B) and (C) are important. But I believe the paper gives examples of why they are, and Martin is getting good at explaining it. Martin: can you give an example, once more, of the importance of (B) (=fullness)?

Anyway, that's why I have been moving slowly in "fixing" GHC: the rule CC or (WCC + B + C) just seems too baroque.

Simon

Simon Peyton-Jones writes:
| I believe that this "weak coverage condition" (which is also called
| "the dependency condition" somewhere on the wiki) is exactly what GHC
| 6.4 used to implement but then in 6.6 this changed. According to
| Simon's comments on the trac ticket, this rule requires FDs to be
| "full" to preserve the confluence of the system that is described in
| the "Understanding Fds via CHRs" paper. I have looked at the paper
| many times, and I am very unclear on this point; any enlightenment
| would be most appreciated.
Right. In the language of http://hackage.haskell.org/trac/ghc/ticket/1241, last comment (date 10/17/07), it would be *great* to
replace the Coverage Condition (CC) with the Weak Coverage Condition (WCC)
as Mark suggests. BUT to retain soundness we can only replace CC with

    WCC + B

WCC alone, without (B), threatens soundness. To retain termination as well (surely desirable) we must have

    WCC + B + C
(You'll have to look at the ticket to see what B,C are.)
And since A+B+C are somewhat onerous, we probably want to have

    CC or (WCC + B + C)
At least we know of nothing else weaker that will do (apart from CC of course).
Like you, Iavor, I find it very hard to internalise just why (B) and (C) are important. But I believe the paper gives examples of why they are, and Martin is getting good at explaining it. Martin: can you give an example, once more, of the importance of (B) (=fullness)?
Fullness (B) is a necessary condition to guarantee that the constraint solver (aka CHR solver) derived from the type class program is confluent.

Here's an example (taken from the paper).

    class F a b c | a->b
    instance F Int Bool Char
    instance F a b Bool => F [a] [b] Bool

The FD is not full because the class parameter c is not involved in the FD. We will show now that the CHR solver is not confluent.

Here is the translation to CHRs (see the paper for details):

    rule F a b1 c, F a b2 d  ==> b1=b2       -- (FD)
    rule F Int Bool Char    <==> True        -- (Inst1)
    rule F Int a b           ==> a=Bool      -- (Imp1)
    rule F [a] [b] Bool     <==> F a b Bool  -- (Inst2)
    rule F [a] c d           ==> c=[b]       -- (Imp2)

The above CHRs are not confluent. For example, there are two possible CHR derivations for the initial constraint store

    F [a] [b] Bool, F [a] b2 d

Here is the first:

    F [a] [b] Bool, F [a] b2 d
    -->_FD (means apply the FD rule)
    F [a] [b] Bool, F [a] [b] d, b2=[b]
    -->_Inst2
    F a b Bool, F [a] [b] d, b2=[b]      (*)

Here's the second CHR derivation:

    F [a] [b] Bool, F [a] b2 d
    -->_Inst2
    F a b Bool, F [a] b2 d
    -->_Imp2
    F a b Bool, F [a] [c] d, b2=[c]      (**)

(*) and (**) are final stores (ie no further CHRs are applicable). Unfortunately, they are not logically equivalent (one store says b2=[b] whereas the other store says b2=[c]).

To conclude, fullness is a necessary condition to establish confluence of the CHR solver. Confluence is vital to guarantee completeness of type inference.

I don't think that fullness is an onerous condition.

Martin
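For reference, here is the example rendered as Haskell source; the pragmas are my guess at what a compiler needs, and GHC accepts the program under its liberal coverage rule, which is exactly the setting in which the confluence question arises:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, FlexibleContexts, UndecidableInstances #-}

    class F a b c | a -> b

    instance F Int Bool Char
    instance F a b Bool => F [a] [b] Bool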

Hello,
On 10/19/07, Martin Sulzmann
Simon Peyton-Jones writes:
... Like you, Iavor, I find it very hard to internalise just why (B) and (C) are important. But I believe the paper gives examples of why they are, and Martin is getting good at explaining it. Martin: can you give an example, once more, of the importance of (B) (=fullness)?
Fullness (B) is a necessary condition to guarantee that the constraint solver (aka CHR solver) derived from the type class program is confluent.
Here's an example (taken from the paper).
    class F a b c | a->b
    instance F Int Bool Char
    instance F a b Bool => F [a] [b] Bool
The FD is not full because the class parameter c is not involved in the FD. We will show now that the CHR solver is not confluent.
Here is the translation to CHRs (see the paper for details)
    rule F a b1 c, F a b2 d  ==> b1=b2       -- (FD)
    rule F Int Bool Char    <==> True        -- (Inst1)
    rule F Int a b           ==> a=Bool      -- (Imp1)
    rule F [a] [b] Bool     <==> F a b Bool  -- (Inst2)
    rule F [a] c d           ==> c=[b]       -- (Imp2)
The above CHRs are not confluent. For example, there are two possible CHR derivations for the initial constraint store F [a] [b] Bool, F [a] b2 d
    F [a] [b] Bool, F [a] b2 d
    -->_FD (means apply the FD rule)
    F [a] [b] Bool, F [a] [b] d, b2=[b]
    -->_Inst2
    F a b Bool, F [a] [b] d, b2=[b]      (*)
Here's the second CHR derivation
    F [a] [b] Bool, F [a] b2 d
    -->_Inst2
    F a b Bool, F [a] b2 d
    -->_Imp2
    F a b Bool, F [a] [c] d, b2=[c]      (**)
(*) and (**) are final stores (ie no further CHR are applicable). Unfortunately, they are not logically equivalent (one store says b2=[b] whereas the other store says b2=[c]).
But what is wrong with applying the following logical reasoning:

Starting with (**):

    F a b Bool, F [a] [c] d, b2=[c]
    (by Inst2)
    F a b Bool, F [a] [c] d, b2=[c], F [a] [b] Bool
    (by FD)
    F a b Bool, F [a] [c] d, b2=[c], F [a] [b] Bool, [c]=[b]
    (applying equalities and omitting unnecessary predicates)
    F [a] [b] Bool, F [a] b2 d

(... and then follow your argument to reach (*) ...)

Alternatively, if we start at (*):

    F a b Bool, F [a] [b] d, b2=[b]
    (by Inst2)
    F a b Bool, F [a] [b] d, b2=[b], F [a] [b] Bool
    (applying equalities, rearranging, and omitting unnecessary predicates)
    F [a] [b] Bool, F [a] b2 d

(... and then follow your reasoning to reach (**) ...)

So it would appear that the two sets of predicates are logically equivalent.
To conclude, fullness is a necessary condition to establish confluence of the CHR solver. Confluence is vital to guarantee completeness of type inference.
I don't think that fullness is an onerous condition.
I agree with you that, in practice, many classes probably use "full" FDs. However, these extra conditions make the system more complicated, and we should make clear what exactly we are getting in return for the added complexity.

-Iavor

Iavor Diatchki writes:
Hello,
On 10/19/07, Martin Sulzmann
wrote: Simon Peyton-Jones writes:
... Like you, Iavor, I find it very hard to internalise just why (B) and (C) are important. But I believe the paper gives examples of why they are, and Martin is getting good at explaining it. Martin: can you give an example, once more, of the importance of (B) (=fullness)?
Fullness (B) is a necessary condition to guarantee that the constraint solver (aka CHR solver) derived from the type class program is confluent.
Here's an example (taken from the paper).
    class F a b c | a->b
    instance F Int Bool Char
    instance F a b Bool => F [a] [b] Bool
The FD is not full because the class parameter c is not involved in the FD. We will show now that the CHR solver is not confluent.
Here is the translation to CHRs (see the paper for details)
    rule F a b1 c, F a b2 d  ==> b1=b2       -- (FD)
    rule F Int Bool Char    <==> True        -- (Inst1)
    rule F Int a b           ==> a=Bool      -- (Imp1)
    rule F [a] [b] Bool     <==> F a b Bool  -- (Inst2)
    rule F [a] c d           ==> c=[b]       -- (Imp2)
The above CHRs are not confluent. For example, there are two possible CHR derivations for the initial constraint store F [a] [b] Bool, F [a] b2 d
    F [a] [b] Bool, F [a] b2 d
    -->_FD (means apply the FD rule)
    F [a] [b] Bool, F [a] [b] d, b2=[b]
    -->_Inst2
    F a b Bool, F [a] [b] d, b2=[b]      (*)
Here's the second CHR derivation
    F [a] [b] Bool, F [a] b2 d
    -->_Inst2
    F a b Bool, F [a] b2 d
    -->_Imp2
    F a b Bool, F [a] [c] d, b2=[c]      (**)
(*) and (**) are final stores (ie no further CHR are applicable). Unfortunately, they are not logically equivalent (one store says b2=[b] whereas the other store says b2=[c]).
But what is wrong with applying the following logical reasoning:
Well, you apply the above CHRs from left to right *and* right to left. In contrast, I apply the CHRs only from left to right.
Starting with (**):

    F a b Bool, F [a] [c] d, b2=[c]
    (by Inst2)
    F a b Bool, F [a] [c] d, b2=[c], F [a] [b] Bool
    (by FD)
    F a b Bool, F [a] [c] d, b2=[c], F [a] [b] Bool, [c]=[b]
    (applying equalities and omitting unnecessary predicates)
    F [a] [b] Bool, F [a] b2 d

(... and then follow your argument to reach (*) ...)
Alternatively, if we start at (*):

    F a b Bool, F [a] [b] d, b2=[b]
    (by Inst2)
    F a b Bool, F [a] [b] d, b2=[b], F [a] [b] Bool
    (applying equalities, rearranging, and omitting unnecessary predicates)
    F [a] [b] Bool, F [a] b2 d

(... and then follow your reasoning to reach (**) ...)
So it would appear that the two sets of predicates are logically equivalent.
I agree that the two sets

    F a b Bool, F [a] [b] d, b2=[b]    (*)

and

    F a b Bool, F [a] [c] d, b2=[c]    (**)

are logically equivalent wrt the above CHRs (see your proof).

The problem/challenge we are facing is to come up with a confluent and terminating CHR solver. Why is this useful?

(1) We get a deterministic solver.
(2) We can decide whether two constraints C1 and C2 are equal by solving (a) C1 -->* D1 and (b) C2 -->* D2 and then checking that D1 and D2 are equivalent. The equivalence is a simple syntactic check here.

The CHR solving strategy applies rules in a fixed order (from left to right). My example shows that under such a strategy the CHR solver becomes non-confluent for type class programs with non-full FDs. We can show non-confluence by having two derivations starting with the same initial store and leading to different final stores. Recall:

    F [a] [b] Bool, F [a] b2 d -->* F a b Bool, F [a] [b] d, b2=[b]    (*)

    F [a] [b] Bool, F [a] b2 d -->* F a b Bool, F [a] [c] d, b2=[c]    (**)

I said
(*) and (**) are final stores (ie no further CHR are applicable). Unfortunately, they are not logically equivalent (one store says b2=[b] whereas the other store says b2=[c]).
More precisely, I meant here that (*) and (**) are not logically equivalent when we do *not* take the above CHRs into account. This means that (*) and (**) are (syntactically) different.
To conclude, fullness is a necessary condition to establish confluence of the CHR solver. Confluence is vital to guarantee completeness of type inference.
I don't think that fullness is an onerous condition.
I agree with you that, in practice, many classes probably use "full" FDs. However, these extra conditions make the system more complicated, and we should make clear what exactly we are getting in return for the added complexity.
You can get rid of non-full FDs (see the paper). BTW, type functions are full by construction.

Martin
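To illustrate the last remark, here is one way the a -> b component of F could be written as a type function instead (a sketch of my own, not from the thread):

    {-# LANGUAGE TypeFamilies #-}

    -- The a -> b part of F becomes a type function B. The family mentions
    -- only the argument it depends on; the third class parameter simply
    -- never appears, so there is no analogue of a non-full dependency.
    type family B a
    type instance B Int = Bool
    type instance B [a] = [B a]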

Hi All,

Here are my responses to the recent messages, starting with some summary comments:

- I agree with Martin that the condition I posted a few days ago is equivalent to the "*refined* weak coverage condition" in your paper. The "refined" tag here is important---I missed it the first time and thought he was referring to the WCC at the start of Section 6.1, and I would strongly recommend against using that (unrefined) version. I confess that I haven't read the paper carefully enough to understand why you included that first WCC at all; I don't see any reason why you would want to use that particular condition in a practical implementation, and I don't see any advantages to it from a theoretical perspective either. (Maybe you just used it as a stepping stone to help motivate the refined WCC?)

- I believe that the refined WCC (or the CC, or even the original WCC if you really want) are sufficient to ensure soundness. I don't agree that the (B) restriction to "full" FDs is necessary.

- I think it is dangerous to tie up soundness with questions about termination. The former is a semantic property while the latter has to do with computability. Or, from a different perspective, soundness is essential, while termination/decidability is "only" desirable. I know that others have thought more carefully than I have about conditions on the form of class and instance declarations that will guarantee decidability, and I don't have much to say on that topic in what follows. However, I will suggest that one should start by ensuring soundness and then, *separately*, consider termination. (Of course, it's perfectly ok---just not essential---if conditions used to ensure soundness can also be used to guarantee termination.)

Stop reading now unless you're really interested in the details! I'm going to start by explaining some key (but hopefully unsurprising) ideas before I respond in detail to the example that Martin posted.

Interpreting Class, Instance, and Dependency Decls:
---------------------------------------------------
I'll start by observing that Haskell's class, instance, and functional dependency declarations all map directly to formulas in standard predicate logic. The correspondence is straightforward, so I'll just illustrate it with some examples:

    Declaration                  Formula
    -----------                  -------
    class Eq a => Ord a          forall a. Ord a => Eq a
    instance Eq a => Eq [a]      forall a. Eq a => Eq [a]
    class C a b | a -> b         forall a, b, c. C a b /\ C a c => b = c

This correspondence between declarations and formulas seems to be very similar to what you did in the paper using CHRs. I haven't read the paper carefully enough to know what extra mileage and/or problems you get by using CHRs. To me, predicate logic seems more natural, but I don't think that matters here---I just wanted to acknowledge that essentially the same idea is in your paper.

In the following, assuming some given program P, I'll write I(P) for the conjunction of all the instance declaration formulas in P; C(P) for the conjunction of all the class declaration formulas; and F(P) for the conjunction of all the dependency declaration formulas.

Semantics and Soundness:
------------------------
We'll adopt a simple and standard semantics in which each n-parameter type class is described by an n-place relation, that is, a set of n-tuples of types. Of course, in the most common, one-parameter case, this is equivalent to viewing the class as a set of types. Each tuple in this relation is referred to as an "instance" of the class, and the predicate syntax C t1 ... tn is just a notation for asserting that the tuple (t1, ..., tn) is an instance of C.

We can assign a meaning to each of the classes in a given program P by using the smallest model of the instance declaration formulas, I(P). (Existence of such a model is guaranteed; it is the union of an increasing chain of approximations.)

What about the superclass properties in C(P)? We expect to use information from C(P) to simplify contexts during type inference. For example, if we're using the standard Haskell prelude, then we know that we can simplify (Eq a, Ord a) to (Ord a). But how do we know this is sound? Haskell answers this by imposing restrictions (third bullet in Section 4.3.2 of the report) on the use of instance declarations to ensure that our (least) model of I(P) is also a model of C(P). This guarantees, for example, that every type in the semantics of "Ord" will, as required, also be included in the semantics of "Eq".

That should all seem pretty standard, but I've included it here to make the point that we need to do something very similar when we introduce functional dependencies. Specifically, we want to be able to use properties from F(P) to simplify/improve contexts during type inference, so we have to ensure that this is sound. If we follow the strategy that was used to ensure soundness of C(P) in Haskell, then we have to come up with a restriction on instance declarations to ensure that our model of I(P) is also a model of F(P). I claim that any of the three conditions we've talked about previously---using your terminology, the CC, WCC or refined WCC---are sufficient to ensure this soundness property. I haven't completed a formal proof, but I think it should be a pretty straightforward exercise in predicate logic.

If a program fails to meet the criteria that ensure soundness of a superclass declaration, the Haskell compiler should reject that code. Exactly the same thing should happen if the input program doesn't satisfy the functional dependencies. Further restrictions are needed to ensure that there is a well-defined semantics (i.e., dictionary/evidence) for each type class instance; this is where overlapping instances complicate the picture, for example. However, for what we're doing here, it's sufficient to focus on the semantics of classes (i.e., on which types are instances of which classes) and to defer the semantics of overloaded operators to a separate discussion.

Responding to Martin's Example:
-------------------------------
Here's an example (taken from the paper).
    class F a b c | a->b
    instance F Int Bool Char
    instance F a b Bool => F [a] [b] Bool
This is an unfortunate example because, in fact, the second instance declaration doesn't contribute anything to the class F! Using the semantics I outlined above, the class F is just {(Int, Bool, Char)}, with only the one instance that comes from the first instance declaration. This is unfortunate because, in the presence of functional dependencies---or, in general, any form of "improvement"---you have to take account of satisfiability. In the example here, any predicate of the form F a b Bool is unsatisfiable, and so any time we attempt to transform an inferred context containing a predicate of this form into some "simplified" version, then there is a sense in which all we are doing is proving that False is equivalent to False (or that anything follows from a contradiction).

In what follows, let's refer to the program above as Q1. However, to explore a case where the predicates of interest are satisfiable, let's also consider an extended version, Q2, that adds one extra instance declaration:

    instance F Char Int Bool

Now the semantics of Q2 looks something like:

    { (Int, Bool, Char), (Char, Int, Bool),
      ([Char], [Int], Bool),
      ([[Char]], [[Int]], Bool),
      ([[[Char]]], [[[Int]]], Bool),
      ([[[[Char]]]], [[[[Int]]]], Bool),
      ... }
The FD is not full because the class parameter c is not involved in the FD. We will show now that the CHR solver is not confluent.
Although it's just one example, it is worth pointing out that both Q1 and Q2 lead to a semantics for F that is consistent/sound with respect to the declared functional dependency. So far as our semantics is concerned, the fact that the dependency is not full is not significant.

I don't dispute your claim that fullness is necessary to establish confluence with CHRs. But that is a consequence of the formal proof/rewrite system that you're using, and not of the underlying semantics.
Here is the translation to CHRs (see the paper for details)
    rule F a b1 c, F a b2 d  ==> b1=b2       -- (FD)
    rule F Int Bool Char    <==> True        -- (Inst1)
    rule F Int a b           ==> a=Bool      -- (Imp1)
    rule F [a] [b] Bool     <==> F a b Bool  -- (Inst2)
    rule F [a] c d           ==> c=[b]       -- (Imp2)
I'm a little surprised that you need Imp1 and Imp2 here. I didn't include analogs of these when I gave a predicate logic rendering for dependency declarations above because they can be obtained as derived properties. Imp1, for example:

Suppose F Int a b. From Inst1, we know that F Int Bool Char. Hence by FD, we know that a = Bool.

My best guess is that you included the Imp1 and Imp2 rules because, although the "<==>" symbol suggests the possibility of bidirectional rewriting, you actually want to view the above as oriented rewrite rules? From my perspective, however, the FD rule is the definition of the functional dependency on F, and everything that you want to know about this dependency can be derived from it ...
The above CHRs are not confluent. For example, there are two possible CHR derivations for the initial constraint store F [a] [b] Bool, F [a] b2 d
    F [a] [b] Bool, F [a] b2 d
    -->_FD (means apply the FD rule)
    F [a] [b] Bool, F [a] [b] d, b2=[b]
    -->_Inst2
    F a b Bool, F [a] [b] d, b2=[b]      (*)
Here's the second CHR derivation
    F [a] [b] Bool, F [a] b2 d
    -->_Inst2
    F a b Bool, F [a] b2 d
    -->_Imp2
    F a b Bool, F [a] [c] d, b2=[c]      (**)
(*) and (**) are final stores (ie no further CHR are applicable). Unfortunately, they are not logically equivalent (one store says b2=[b] whereas the other store says b2=[c]).
I think it's a mistake to expect these formulas to be logically equivalent, at least in the strong syntactic sense that you seem to be assuming here. As I've mentioned previously, if we're going to use improvement, then we have to take account of satisfiability.

In the context of Q1, this is an example in which each of the two derivations above is essentially equivalent to False --> False. But, in the general case, if we compare (*) with (**), then the main difference seems to be that the latter involves the use of a "new" variable, c, in place of the variable b that appears in the former. However, (*) includes F [a] [b] d and (**) includes F [a] [c] d, and so we can deduce from the functional dependency (i.e., your FD rule) that [b]=[c], and hence b=c. If we're allowed to inline that equality (i.e., use it to replace all occurrences of c with b), then (**) becomes identical to (*).
To conclude, fullness is a necessary condition to establish confluence of the CHR solver.
I think that the standard notion of confluence is the wrong property to be looking at here. Or, to put it another way, I don't think it is reasonable to expect confluence in the presence of improvement. That said, it might still be appropriate and useful to formulate a notion of "confluence modulo satisfiability" in which the equalities involved in the definition of confluence are weakened to "equalities modulo satisfiability" ...
Confluence is vital to guarantee completeness of type inference.
I disagree. I established the completeness of type inference in the presence of improvement (without requiring any kind of confluence-like property) as part of my work in 1994 on "Simplifying and Improving Qualified Types." (See Section 4 of the technical report version, for example, which includes details---like the proofs---that I had to omit from the later FPCA '95 version for lack of space.) Functional dependencies are a special case of improvement, so I'd dispute the claim in your paper that you are the first to formalize and prove soundness, completeness and decidability. (However, I'm sympathetic to the fact that you missed this, because my results on improvement were published six years before my formulation of functional dependencies; if you were only looking at later publications, you wouldn't have found anything.)

It's worth pointing out that, to achieve my completeness result, I had to do exactly what I am suggesting that you should do now: modify your formalism to account for satisfiability. In my work, for example, that meant introducing a "satisfiability ordering" on type schemes and replacing the conventional notion of principal types with a concept of "principal satisfiable types".

Let me take this opportunity to reflect on my understanding of your main contributions and the relationship to my work. In my general, formal system, simplification and improvement rules can be used at any point during type inference, but there are no language features---such as explicit type signatures or instance declarations---that *require* the use of such rules. As a result, decidability, in the general case, is easy to establish, but it is arguably not very useful.

In my paper, I suggested that, in practice, for any specific application of improvement, a language designer would likely want to adopt a more algorithmic approach that invokes suitably defined "improvement" and "simplification" functions at specific points during type inference (coupled, for example, with generalization). With this approach, soundness, completeness, and decidability of type inference become dependent on the soundness and termination properties of the improvement and simplification functions. This, I think, is the most significant technical issue that you are addressing in your paper. In concrete terms, I think this comes down to a question of whether the entailment relation C1 ||- C2 between contexts C1 and C2 is decidable. I hadn't heard of the "Paterson Conditions" before I saw the reference on the Trac page (sorry Ross!), but it sounds like they are sufficient to guarantee good theoretical properties but also clear and flexible enough to be useful in practice.
I don't think that fullness is an onerous condition.
Well that's probably true, but requiring fullness means that there's one extra concept that programmers have to grapple with, and one extra hurdle that they have to overcome in those situations where a non-full FD seems to provide a more natural solution.

There are at least a couple of ways to avoid the non-full FD in your example program. For example, we could replace every use of (F a b c) with a pair of constraints (F1 a b, F2 a c) where the classes F1 and F2 are as follows:

    class F1 a b | a -> b
    class F2 a c

This is an example of what's called normalization in relational databases. Because the type class relations that we use in practice are fairly simple (just a few columns) and because normalized relations have some structural benefits, I'd suspect that the use of normalized type class relations, like the F1+F2 pair above, is usually a hallmark of better design than the non-normalized version (like the original version with just F). Nevertheless, unless there are strong technical reasons to require one version over the other, I'd prefer to leave choices like this open so that individual programmers can make the most appropriate call for a given application.

....

I'm sorry I haven't given you this kind of feedback previously. Simon had invited me to take a look at the paper on at least one previous occasion, but it had not found its way to the front of my reading queue. Even now, I've only skimmed the paper (e.g., I started with the Trac web page, figured out that I needed to know what a "full" FD might be, and then used a text search of the pdf to locate the definition.) But I can see that there is a lot more in the paper beyond the aspects we're discussing here, particularly with respect to criteria for establishing termination, so I apologize again if I've overlooked other relevant sections.

I hope my comments are still useful!

All the best, Mark
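A source-level sketch of that normalization, with one possible translation of the two original instances (the instance translation is my reading of the suggestion, not something from Mark's message):

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, FlexibleContexts, UndecidableInstances #-}

    class F1 a b | a -> b   -- the a -> b part, now a full dependency
    class F2 a c            -- the dependency-free part

    -- instance F Int Bool Char becomes:
    instance F1 Int Bool
    instance F2 Int Char

    -- instance F a b Bool => F [a] [b] Bool becomes (splitting the context
    -- between the two halves):
    instance F1 a b    => F1 [a] [b]
    instance F2 a Bool => F2 [a] Bool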

Mark P Jones writes:
Hi All,
Here are my responses to the recent messages, starting with some summary comments:
- I agree with Martin that the condition I posted a few days ago is equivalent to the "*refined* weak coverage condition" in your paper. The "refined" tag here is important---I missed it the first time and thought he was referring to the WCC at the start of Section 6.1, and I would strongly recommend against using that (unrefined) version. I confess that I haven't read the paper carefully enough to understand why you included that first WCC at all; I don't see any reason why you would want to use that particular condition in a practical implementation, and I don't see any advantages to it from a theoretical perspective either. (Maybe you just used it as a stepping stone to help motivate the refined WCC?)
It's a stepping stone, but WCC on its own is useful to recover termination (see the "zip" example in the paper).
- I believe that the refined WCC (or the CC or even the original WCC if you really want) are sufficient to ensure soundness. I don't agree that the (B) restriction to "full" FDs is necessary.
- I think it is dangerous to tie up soundness with questions about termination. The former is a semantic property while the latter has to do with computability. Or, from a different perspective, soundness is essential, while termination/decidability is "only" desirable. I know that others have thought more carefully than I have about conditions on the form of class and instance declarations that will guarantee decidability and I don't have much to say on that topic in what follows. However, I will suggest that one should start by ensuring soundness and then, *separately*, consider termination. (Of course, it's perfectly ok---just not essential---if conditions used to ensure soundness can also be used to guarantee termination.)
Well, decidability and completeness matter if we care about automatic type inference. Given some input program, we'd like to have a *decidable* algorithm which computes a type for every well-typed program (*soundness*) and yields failure if the program is ill-typed (*completeness*).

In "Simplifying and Improving Qualified Types" you indeed give a sound, decidable and complete type inference algorithm *if* the proof theory among constraints in some theory of qualified types is sound, decidable and complete.

My main concern is to establish sufficient conditions such that the proof theory is sound, decidable and complete. That is, there's a decidable algorithm which says yes if QT |= C1 -> C2 and otherwise no, where QT |= C1 -> C2 means that in the qualified theory QT the constraint C1 entails the constraint C2.

I think that our works are fairly complementary. Or are you saying you've already given such an algorithm (which I indeed must have missed)?

I believe that you are offended by my statement that
Confluence is vital to guarantee completeness of type inference.
Let me clarify. Confluence is one of the sufficient conditions to guarantee completeness of CHR-based type inference. Apologies for any confusion caused by my previous statement. There are clearly other approaches possible to obtain a sound, decidable and complete proof theory (and therefore to obtain sound, decidable and complete type inference). But I have yet to see concrete results which match the CHR-based approach/results described in:

    Understanding Functional Dependencies via Constraint Handling Rules
    Martin Sulzmann, Gregory J. Duck, Simon Peyton-Jones and Peter J. Stuckey
    In Journal of Functional Programming, 2007

Martin

Hi Mark,
    Declaration                  Formula
    -----------                  -------
    class Eq a => Ord a          forall a. Ord a => Eq a
    instance Eq a => Eq [a]      forall a. Eq a => Eq [a]
    class C a b | a -> b         forall a, b, c. C a b /\ C a c => b = c
This correspondence between declarations and formulas seems to be very similar to what you did in the paper using CHRs. I haven't read the paper carefully enough to know what extra mileage and/or problems you get by using CHRs. To me, predicate logic seems more natural, but I don't think that matters here---I just wanted to acknowledge that essentially the same idea is in your paper.
i wondered about that "extra mileage" question when i first encountered chrs and the fd-to-chr translations - here's my take on it:

while chr started out with a predicate logic semantics, it helps to think of them as representing a subset of linear logic instead (which is where recent chr semantics are headed). other relevant aspects are: committed choice language (no reliance on backtracking), no implicit unification (left-hand sides of rules are matched, unification has to be called explicitly in right-hand sides, *after* committing to a rule).

what is helpful about using chr, then, is that they are closer to implementations of functional dependencies, making it easier to expose/discuss some general issues relevant to implementations but not to abstract semantics (such as incomplete and changing knowledge due to separate compilation, or some implementations risking to lose information), without having to wade through all other implementation-level details.

in other words, chr expose a few more details, so more things can go wrong; which is useful, because one can discuss how to avoid these wrongs, providing more concrete guidelines for implementations. they also avoid exposing details not (currently) relevant to the application, as a mapping to prolog would.
I don't dispute your claim that fullness is necessary to establish confluence with CHRs. But that is a consequence of the formal proof/rewrite system that you're using, and not of the underlying semantics.
i tried to argue that case (the confluence issues are due to the translation, not inherent in functional dependencies) last year, even suggested an alternate translation of fd to chr. something like this (the 'TC a b' should be 'TC a1..an', i think, and the notation is meant to correspond to figure 2 in the jfp paper):

http://www.haskell.org//pipermail/haskell-prime/2006-March/000893.html

simplifying slightly, the alternate translation uses an embedding of classical into linear logic, so knowledge about fds never decreases as long as it remains consistent, whereas the jfp paper translation works directly in a linear logic (<==> is like linear implication -o, consuming its pre-conditions), so applying inference steps in the "wrong" order can lose knowledge about fds, leading to loss of confluence (unless additional restrictions are introduced).

if i recall correctly, the paper also relied on confluence as one possible means to ensure termination, meaning that results about termination or results relying on termination were to some extent tied up with the extra conditions used to ensure confluence. i believe that fixing the translation to chr would get rid of the confluence issues discussed in the paper altogether, so it could have focussed on the remaining issues, like termination, without distraction. frustratingly, while the paper was still a draft at the time, the discussion came too late to have an impact on the version published in jfp.

hope this helps,
claus
participants (5):
- Claus Reinke
- Iavor Diatchki
- Mark P Jones
- Martin Sulzmann
- Simon Peyton-Jones