
hi, On Fri, 26 Mar 2004, Robert Will wrote:
Honest reply: I don't fully understand FunDeps and I don't want to learn
Well, in fact I couldn't resist the temptation to look it at, only to find that they are really simpler than my earlier information suggested. I also made two small observations (a) that non-circular FunDeps with a single variable on the left side, can be one-to-one emulated via Constructor classes, and (b) parametric type classes correspond exactly to FunDeps with only one constraint. (Both of the special cases occur rather often.) Anyway it consolidates my opinion regarding the Abstract Collections and FunDeps: no need to make a redesign. (Full text at the bottom.) The only thing that bothers me at the moment, is that I couldn't find the "many other applications of FunDeps" that are mentioned but not given in Jones' paper. Perhaps one should make a list. On Fri, 26 Mar 2004, Dylan Thurston wrote:
Also, they're vitally necessary for certain algorithms and for speed (since not all ways of making containers can contain all items).
If you mean restricting some Containers to certain element types (Patricia trees and so on), this is perfectly possible without FunDeps. If you don't mean that, I don't understand it (even with my new knowledge of FunDeps ;-). Robert -------------------- 250 lines In section 0, I explain FunDeps for people like me. Section 1 shows that a certain kind of FunDeps can always be expressed with Constructor classes. Section 2 compares those approaches and Section 3 makes the relationship between FundDeps and Parametric Type Classes more precise. (Although I don't consider this relevant in practice.) Finally I draw some conclusions for the Abstract Collections and further research. I'm referring to FunDeps as described in Mark P. Jones: "Type Classes with Functional Dependencies" Proc. ESOP 2000, Berlin, Springer-Verlag LNCS 1782. People wondering about why I need to formulate things different to consider them correct, may read about mental universes in Stefan Kahrs, 1999: A formalist's perspective on mathematics http://www.cs.ukc.ac.uk/people/staff/smk/manifesto.ps 0. FunDeps are restrictions on the set of possible instance declarations. We consider Jones' way to define a class for collection types: class Collects e ce where -- empty :: ce -- ambiguous single :: e -> ce -- (<+>) :: ce -> ce -> ce -- ambiguous add :: ce -> e -> ce first :: ce -> e And two examples: a) Intuitively the following binding should be ill-typed, but instead it gets a type:
g :: (Collects String ce, Collects Int ce) => ce g = (single "Jones") `add` 1
b) And the following should have type "Int" instead:
f :: (Collects Int ce, Collects e ce) => e f = first (single 1)
Indeed, if we had some multi-collection type "MultiColl" with
instance Collects String MultiColl where ... instance Collects Int MultiColl where ...
Then the above types could be specialised to g :: MultiColl f :: String Given that MultiColl is easily implementable in Haskell (as a tuple of collections of String, Int, and possibly others; or as a Collections of a sum type...), we must admit that the above polymorphic types are indeed correct. But what has made us assume the contrary? Well, obviously we thought that there would never be any type that could have both instances of Collects. More generally, we would assume that any type 't' will only have one single instance of Collects: one single element type, for each collection type. If the compiler would know that, too, he could simplify the above type expressions to yield an error for 'g' (since no 'ce' can comoply to both constraints) and "Int" for 'f' (since 'b' and 'Int' as element types of the same collection type must be the same). The relation to relational data bases: A data base query like "give me the name and address of the patient with social security number (SSN) such_and_such," is ambigous in the general case: there might be more than one patient with SSN such_and_such. The solution is to declare a functional dependency: "SSN -> (name, address)" which is just a declaration that for each SSN there is at most one 'name' and 'address' in all entries where that SSN appears. (There might be different such entries, e.g. if the SSN is connected with multiple insurance cases.) In our problem, we can do just the same: In a class declaration "class C t" (where 't' are the type variables of the class) a Functional Dependency is written "a -> b" (where 'a' and 'b' are subsets of 't'). It's semantics (modulo polymorphic instances) is: A FunDep "a -> b" restricts the set of legal instance declarations such that for every pair of instances where the variables 'a' are replaced by the same types, 'b' must also be the same types. As a consequence, if 'a' and 'b' are the only type variables of a class, this means that there can be at most one instance declaration for any substitution for 'a'. Since instances can also contain type variables (they can be polymorphic), the characterisation "same substitution of types" is not general enough. But we can "lift" the above definition if we consider any polymorphic instance declaration to represent the (possibly infinite) set of all its non-polymorphic substitution instances. Then, for example the declaration
instance Collects x [y]
Stands in place for "instance Collects Int [Int]", "instance Collects Int [Bool]" and so on. This clearly violates the restriction. In section 6.1. of his paper Jones formulates the rectriction using unification, so that polymorphic instances are covered, and in section 6.2. he derives simplification rules for the constraints, which makes disappear ambiguous variables (because they are unified with others). 1. Left-Single FunDeps are Syntactic Sugar for Constructor Classes Haskell distinguishes between types (like "Int", "a -> b") and type constructors (like [], (->)). Unlike in the value-world, where functions and data constructors represent "normal" values, type constructors are not normal types: they can be used as parameters to other type constructors, but in a legal type, all constructors must be completely applied. AFAICS, in some earlier version of Haskell the parameter(s) of a type class could only represent a type, no partially- or unapplied type constructors. Type classes that work with the latter later came as an extension and are called "Constructor Classes". For simple FunDeps of the form "a -> b" where 'a' and 'b' are single variables, not set of variables, we can do a simple transformation: in the declaration "class C a b ... where" we replace 'a' with 't' (the variable stands for an unary type constructor) and in the class context (the part before the "=>") and the body (the declaration of the member functions) we replace each 'a' with 't b'. Then we can leave out the FunDep and still have exactly the same semantics. (Note that having non-variable constructors in the class context is a language extension, even a dangerous one, but this is not required if 'a' doesn't appear in the context, or if we can change the superclass in the same way.) This transformation can be done to eliminate all FunDeps that (1) have only one variable on the left hand side (but possibly more on the right hand side), and that (2) are not cyclic, i.e., there's no dependency chain like a -> b, b -> c, c -> .. a. If we think of constructor classes as the "more basic" construct (which is reasonable, since their semantic rules are not much different from ordinary type classes), we can thus consider left-single non-cyclic FunDeps as syntactic sugar for the use of constructor classes. 2. Comparison Here is a small example that compares the FunDep and the Constructor Class version of a type:
From the Abstract Collections: (read (<:) as 'add_first', formerly known as 'cons') (1<:) :: ( Sequence seq Int ) => seq Int -> seq Int
using FunDeps this would be:
(1<:) :: ( Sequence seq Int ) => seq -> seq
As you can see the first version is more informative, saying explicitly "seq Int". Of course, in the FunDep version we could rename 'seq' to 'seq_Int' to make it more readable. But still the constructor version provides us automatically with that structuring: every collection type consists of a collection constructor and an element type, just like if we would use concrete types: "seq Int" vs "Seq Int" (or "(WeightBalanced LeafTree) Int", which would be "WeightBalanced (LeafTree Int)", with nullary collection types, by the way -- AFAICS without having tried, Dessy's building set approach would also work with nullary Collection classes with just this change in parentheses). The following example reveals a key point ('apply' being the function formerly known as 'map'):
apply :: ( Collection coll a, Collection coll b ) => (a -> b) -> coll a -> coll b
with FunDeps:
apply :: ( Collects a ca, Collection b cb ) => (a -> b) -> ca -> cb
The first type expresses that the type constructor of 'apply''s argument and result must be the same, while the nullary version can't express that. (This is not tooooo important in the abstract collections, perhaps even a bit questionable, but (a) it helps us to avoid intermediate ambigous types, (even with FundDeps 'cb' wouldn't be unified with anything more concrete) and (b) it also helps make simpler implementations, since 'apply' can be lifted to use the representation of higher-order type constructors, such as 'WeightBalanced'.) Short summary: Constructor classes give slightly better documentation and are slightly more expressive. 3. Parametric Type Classes are Normalised FunDeps Here is another small observation: in relational data bases we want to transform our tables in such a way, that are as little as possible dependencies (since they incur redundancy). Thus we bring them into a normal form, with only one functional constraint of the form "a -> b" (here 'a' and 'b' denote sets (of columns) again). The 'a' part is called the "key" of an entry in the data base. Classes with only one FunDep are necessarily in normal form. I don't want to consider transforming class declarations to normal form (no idea what that would mean...), just this: if we use a parametric type class of the form "class a \elem C b where" ('a' and 'b' sets of variables) then this corresponds just to a FunDep in normal form. Non-normal-form FunDeps can't be expressed with Parametric Type Classes. 4. Conclusions Two conclusions for further research: 1. Everytime one advocates FunDeps, one should mention whether the application needs left-multiple FunDeps, or whether left-single FunDeps suffice. 2. Everytime one advocates left-single FunDeps, one should compare the solution with Constructor Classes and one should consider all the different possibilities to encode the FunDeps with constructor classes. Which one is the most simple, most intuitive? Which one allows simpler formal reasoning, simpler implementation? Three conclusions for the Abstract Collections: 1. Since they are currently build on Constructor Classes, their current design can be expressed with left-single FunDeps only, I don't see any application for left-multiple FunDeps. (ByMaps have to be examined yet!) 2. Structures that only work on a certain element-type (e.g. Patricia- Trees) must nevertheless have a type- parameter. This may be counterintuitive and someone here claimed that it leads to that late detection of type errors, but this approach does at least appear very simple in my intuition: every Collection type is made up of an unary type constructor and an element type. The fact, that certain concrete data structures only work on certain element types can conveniently be expressed in the instance declaration:
instace Collection Patricia Int where ...
Furthermore, we use newtypes anyway to protect the invariants of our structure implementations, so this is not a problem, either. 3. If we should nevertheless decide some day, that FunDeps are The Better Thing, we know that the current design of the Collections has a straight-forward translation to FunDeps. If a transition has to be made some day, this can happen semi-automatically without the need to rethink or redesign things. Neither for implementors of data structures, nor for their users.