Proposal for a Standard of Abstract Collections (with Reference Implementation)

hi all,

Curiously, I have finished the promised draft standard for the Abstract Collections just at a time when many are busily working on DData. It's here: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/

I'm making this discussion hot now, since my Dessy library implements the proposed collection standard and features all of the concrete structures that are also in DData, and some more, especially Deques and Democratic Sequences (explanation: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/ ).

But I also want to have your attention for another aspect of the problem: the introduction of abstract data structures makes the programming language much more useful and suggests some generalisations of the manifest syntax. Thus my proposal contains not only
- a formal specification for a collection hierarchy (based on Abstract Algebraic Data Structures (TM)), but also
- generalisations of all the standard functions from Prelude and Data.List, and
- a formal specification of Haskell extensions that make abstract structures as easy to use as lists -- but with much more flexibility.

Incidentally, the current discussion around DData already shows signs of what I predict in the standard's rationale: when concrete data structures are extended to become more general, the number of functions increases so fast that a general standard is no longer more complicated, but actually simpler. For example, you discussed whether Sets and Maps should be converted from and to Sequences or the built-in "lists". Well, the best answer is to convert from and to a type class of which "lists" (if one uses them at all) are an instance. Exactly this is done by the function "convert" proposed in the standard:
    convert :: (Collection coll a, Collection coll' a) => coll a -> coll' a
    convert = fold empty single (<+>)
(The function can be specialised in the same way as 'fromIntegral' on the numeric types, so we don't lose efficiency.)

Abstraction is more: the proposed standard helps to use a lot of different data structure implementations together, using one interface that has been designed with the user first in mind, not the implementation. But this abstraction also allows an entirely new programming style, which is especially useful when TEACHING programming. Data structures are so deeply intertwined with programming as such that the proposed standard gives Haskell something that NO other language has today. (Perhaps Eiffel comes closest, but EiffelBase is not standardised.) (Remind me to write "A Gentle Introduction to Haskell with Abstract Collections".)

It has been mentioned that an advantage of functional programming is the ability to easily _compose_ programs. But for this composition data structures are necessary; without a standard for data structures, functional languages lose one of their biggest advantages. In specifications Sets are the most important data structure, but in Haskell today the so-called "lists" are the lingua franca, and that reminds me too much of the stacks in Forth (remember, "lists" are actually functional stacks) or dynamic arrays in imperative languages: universal, but inflexible. (I'll have to write a paper on "Liberating programming from the McCarthy style".)

Since "a standard of Abstract Collections" has already been mentioned by some, I expect that there are high expectations regarding my proposal, and perhaps its many disadvantages (all the big ones are mentioned in the documentation) will discourage the reader. On the other hand, there are so many advantages that one can hardly overlook that something big is going on. So the least thing that someone interested in the future of (functional) programming should do is to download and try out the Collections, or to read a little bit in the design documents.

Things to do:
- Build examples for the use of the new Collections. (From now on anyone can port his code to get rid of "lists".)
- Develop the Haskell Documentation Standard and apply it to Collections. (Specifications are exported and used for tests.)
- Using those specifications, make a complete test framework. (Complete meaning every function is tested against the complete specification, so no bugs can persist.)

Given all that, it is trivial to add more implementations of the Abstract Collections. (I just added two concrete structures yesterday; Dessy already has much more functionality than DData, but only half its code size -- or one third if you don't count the type classes and specifications. I also think that Dessy has more readable code, but I don't know how important that is for anyone.)

Have a good time,
Bob
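For readers who want to try this immediately, here is a minimal, self-contained sketch of the idea. The class below is cut down to just the members that 'convert' needs (and omits the proposal's Eq superclasses), with the built-in lists as the single example instance:

    {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
    module ConvertSketch where

    class Collection coll a where
        empty  :: coll a
        single :: a -> coll a
        (<+>)  :: coll a -> coll a -> coll a
        fold   :: b -> (a -> b) -> (b -> b -> b) -> coll a -> b

    -- Built-in lists as one instance.
    instance Collection [] a where
        empty      = []
        single x   = [x]
        (<+>)      = (++)
        fold e s j = foldr (\x acc -> s x `j` acc) e

    -- Conversion between any two collection types, exactly as above:
    -- rebuild the target collection from empty, single and (<+>).
    convert :: (Collection coll a, Collection coll' a) => coll a -> coll' a
    convert = fold empty single (<+>)

    -- e.g.  convert "abc" :: String  rebuilds the list via fold.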

Robert Will wrote:
I've already preferred DData over Edison http://www.haskell.org/ghc/docs/edison/ also because DData is Haskell 98 compliant (without Glasgow extensions), and thus "portable" (in terms of the Haddock documentation of the library). Viewing maps as collections of pairs indeed suggests changing some names (like member and filter) and some types, i.e. using a pair as input instead of curried arguments. (E.g. "member" could become "isDefined" when maps are viewed as finite functions.) I would not mind if a collection class/interface is provided on top of the given DData modules/types (that one may or may not use) using Glasgow extensions. And I would also not mind if there are other bag, set and map implementations (e.g. based on a common tree implementation), so that a simple change of the import gives you a different performance behaviour (e.g. modules {Bag, Set, Map}{ByOrderedSeq,ByBalancedTree}.hs). However, the (non-portable) instances for a big collection class should be in separate modules.

Christian

Christian Maeder wrote:
I would not mind, if a collection class/interface is provided on top of the given DData modules/types (that one may or may not use) using glasgow extensions.
I would even be glad to have such a class, if only to ensure consistent naming.

On Fri, 19 Mar 2004, Christian Maeder wrote:
And I would also not mind if there are other bag, set and map implementations (e.g. based on a common tree implementation), so that a simple change of the import gives you a different performance behaviour (e.g. modules {Bag, Set, Map}{ByOrderedSeq,ByBalancedTree}.hs)
Hehe, since any balanced tree is a canonical instance of Sequence, ByOrderedSeq and ByBalancedTree are in fact the same. Due to the abstract balanced trees used in Dessy, there are already as many implementations of Bag, Set, and Map as there are balanced trees (about five at the moment). Additionally there are Patricia trees, which are Ordered and Balanced by the same mechanism.

Robert

G'day all.
Quoting Robert Will
- a formal specification for a collection hierarchy (based on Abstract Algebraic Data Structures (TM)), but also
I'm not happy with the "kind" of a collection. I personally think that the type of the object in the collection is better connected with the type of the collection with a functional dependency. Let me explain. The proposed class interface starts like this:

    class (Eq (coll a), Eq a) => Collection coll a where
        empty  :: coll a
        single :: a -> coll a
        {- etc -}

Here's how it would look with fundeps:

    class (Eq coll, Eq a) => Collection coll a | coll -> a where
        empty  :: coll
        single :: a -> coll
        {- etc -}

Consider, for example, Patricia tries, which can only store integer keys. Without fundeps, you need to use some kind of phantom type:

    data PatriciaTrie a {- a must be Int -} = {- whatever -}

    instance Collection PatriciaTrie Int where {- etc -}

The comment gives a constraint that the compiler can't check. With fundeps, the constraint is checked:

    data PatriciaTrie = {- whatever -}

    instance Collection PatriciaTrie Int where {- etc -}

Another example is radix structures, which can only take keys which are themselves sequences (for the purposes of this example, lists). Here's how you could do it with fundeps:

    data RadixCollection a = {- whatever -}

    instance Collection (RadixCollection a) [a] where {- etc -}

Consider how you'd do this without fundeps.

Another question that you might want to consider is how collections like Data.HashTable, which is embedded in a monad, might fit into the scheme.

Cheers,
Andrew Bromage
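A compilable rendering of the fundep sketch above, for anyone who wants to load it into GHC. The '{- whatever -}' bodies become stub representations and 'insert' stands in for the elided '{- etc -}' members; the stubs are not real Patricia or radix structures:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
    module FundepSketch where

    class (Eq coll, Eq a) => Collection coll a | coll -> a where
        empty  :: coll
        single :: a -> coll
        insert :: a -> coll -> coll

    -- A collection fixed to Int elements: thanks to the fundep, the
    -- compiler both checks and uses the fact that elements are Ints.
    newtype PatriciaTrie = PatriciaTrie [Int] deriving Eq

    instance Collection PatriciaTrie Int where
        empty                      = PatriciaTrie []
        single x                   = PatriciaTrie [x]
        insert x (PatriciaTrie xs) = PatriciaTrie (x : xs)

    -- A radix-style structure whose keys must themselves be sequences.
    newtype RadixCollection a = RadixCollection [[a]] deriving Eq

    instance Eq a => Collection (RadixCollection a) [a] where
        empty                         = RadixCollection []
        single k                      = RadixCollection [k]
        insert k (RadixCollection ks) = RadixCollection (k : ks)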

It looks interesting and I'm still looking at it, although I think many of the language extensions need to be better thought out. But it exhibits the "creeping Eq" problem: your hierarchy starts

    class (Eq (coll a), Eq a) => Collection coll a where ...

If this is to replace lists, this is unacceptable: I can't have lists of (say) functions? Collection classes should not require Eq instances on the members, except when necessary!

Peace,
Dylan

Another comment is that it looks too complicated. Your basic Collection class has 30 members, and some of them are clearly excessive: do you really need all of has, elem, (#), not_elem, and (/#) in the class (rather than defined as auxiliary functions, possibly optimised with fusion)? (Of course, these particular functions should be in a subclass that has equality on the arguments...) For another example, how could zip possibly be given a more efficient implementation than the one you provide? (And furthermore, a function 'zip' obviously doesn't make sense on general collections. Even for sequences, it can't be "democratic", so probably doesn't belong in the class.) Please try to simplify the interface more! Peace, Dylan

hi all, On Sun, 21 Mar 2004 ajb@spamcop.net wrote:
I'm not happy with the "kind" of a collection. I personally think that the type of the object in the collection is better connected with the type of the collection with a functional dependency.
Yeah, that may be a good thing to consider, but for now I think it's simpler without. FunDeps are a language extension, and I also think they're rather complicated. On Mon, 22 Mar 2004, Dylan Thurston wrote:
Collection classes should not require Eq instances on the members, except when necessary!
Problem known, solution designed, not yet implemented: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/principles.html#not_here On Mon, 22 Mar 2004, Dylan Thurston wrote:
do you really need all of has, elem, (#), not_elem, and (/#) in the class (rather than defined as auxiliary functions, possibly optimised with fusion)?
You're right, I'll do that at once.
For another example, how could zip possibly be given a more efficient implementation than the one you provide?
The one I provide is O(N*log N) on balanced structures (N times 'but_first'), and the new default implementation is O(N) (but not on all collections, e.g. not on Queues).
'zip' obviously doesn't make sense on general collections.
Well, you can 'zip' Sets... but as with so many things, we can just as well put it into the Sequence class and require an explicit conversion. (Which we hope is eliminated by the compiler.)
Please try to simplify the interface more!
I already made efforts! Anyway, a user doesn't care, since the functions have the same interface whether they are in the class or not. And the implementor doesn't care if there are good default implementations.

Robert

Robert Will
On Sun, 21 Mar 2004 ajb@spamcop.net wrote:
I'm not happy with the "kind" of a collection. I personally think that the type of the object in the collection is better connected with the type of the collection with a functional dependency.
Yeah, that may be a good thing to consider, but for now I think it's simpler without. FunDeps are a language extension, and I also think they're rather complicated.
Multi-parameter type classes, as used in your proposal, are also a language extension. Every implementation that supports MPTC also supports fundeps. Regards, Malcolm

On Tue, 23 Mar 2004, Malcolm Wallace wrote:
Multi-parameter type classes, as used in your proposal, are also a language extension. Every implementation that supports MPTC also supports fundeps.
Hair-splitting reply: MPTC are not an extension, but the suppression of a restriction. Not being restricted to one TC parameter is also a simplification in some sense... FunDeps, otoh, are a completely new concept.

Honest reply: I don't fully understand FunDeps and I don't want to learn them (now), since the Collections should also be understandable for people less intelligent than me. (This is a design criterion used throughout, and I think my not-too-high intelligence has made this one of the Collections' success factors!) Well, that also clarifies some other issues. Nevertheless I didn't dare to write that in the "official rationale". Perhaps I should.

Oh, and I also think of people who are more intelligent than me, but who have better things to do than learn a lot of new language concepts when all they want is to solve their application problem. (This is an official design criterion.)

Robert

On Friday, 26 March 2004, 09:19, Robert Will wrote:
On Tue, 23 Mar 2004, Malcolm Wallace wrote:
Multi-parameter type classes, as used in your proposal, are also a language extension. Every implementation that supports MPTC also supports fundeps.
Hair-splitting reply: MPTC are not an extension, but the suppression of a restriction.
Then every so-called language extension is the suppression of a restriction since it allows Haskell programs which were incorrect before. ;-)
Not being restricted to one TC parameter is also a simplification in some sense... FunDeps, otoh, are a completely new concept.
But they are often needed or at least useful when MPTCs are used.
Honest reply: I don't fully understand FunDeps and I don't want to learn (now), since the Collections should also be understandable for people less intelligent than me. (This is a design criterion used throughout, and I think my not-too-high intelligence has made this one of the Collection's success factors!)
Fundeps basics are not so difficult to understand IMHO, at least if you have some database background. A one-parameter type class defines a predicate on the types, a multi-parameter type class defines a relation, and a MPTC with fundeps defines a relation with functional dependencies just like in relational database technology.
[...]
Robert
Wolfgang
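A tiny sketch of this analogy, with hypothetical class names (none of these are from DData or Dessy): a one-parameter class is a predicate on types, a plain multi-parameter class is a relation between types, and a fundep says that one column of the relation determines another:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
    module FundepAnalogy where

    -- one parameter: a predicate on types ("t can be rendered")
    class Printable t where
        render :: t -> String

    -- two parameters: a relation between types ("a converts to b")
    class Convertible a b where
        convertTo :: a -> b

    -- two parameters plus a fundep: a relation whose first column
    -- determines the second, just like "SSN -> (name, address)"
    class Container c e | c -> e where
        elements :: c -> [e]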

On Fri, Mar 26, 2004 at 09:19:36AM +0100, Robert Will wrote:
Honest reply: I don't fully understand FunDeps and I don't want to learn (now), since the Collections should also be understandable for people less intelligent than me. (This is a design criterion used throughout, and I think my not-too-high intelligence has made this one of the Collection's success factors!)
Really, they're not so complicated: you already make exactly the same distinction in your Map and Relation classes! Also, they're vitally necessary for certain algorithms and for speed (since not all ways of making containers can contain all items).

Peace,
Dylan

hi, On Fri, 26 Mar 2004, Robert Will wrote:
Honest reply: I don't fully understand FunDeps and I don't want to learn
Well, in fact I couldn't resist the temptation to look at it, only to find that they are really simpler than my earlier information suggested. I also made two small observations: (a) that non-circular FunDeps with a single variable on the left side can be one-to-one emulated via Constructor classes, and (b) that parametric type classes correspond exactly to FunDeps with only one constraint. (Both of these special cases occur rather often.) Anyway, it consolidates my opinion regarding the Abstract Collections and FunDeps: no need to make a redesign. (Full text at the bottom.)

The only thing that bothers me at the moment is that I couldn't find the "many other applications of FunDeps" that are mentioned but not given in Jones' paper. Perhaps one should make a list.

On Fri, 26 Mar 2004, Dylan Thurston wrote:
Also, they're vitally necessary for certain algorithms and for speed (since not all ways of making containers can contain all items).
If you mean restricting some Containers to certain element types (Patricia trees and so on), this is perfectly possible without FunDeps. If you don't mean that, I don't understand it (even with my new knowledge of FunDeps ;-).

Robert

-------------------- 250 lines

In section 0, I explain FunDeps for people like me. Section 1 shows that a certain kind of FunDeps can always be expressed with Constructor classes. Section 2 compares those approaches, and Section 3 makes the relationship between FunDeps and Parametric Type Classes more precise. (Although I don't consider this relevant in practice.) Finally I draw some conclusions for the Abstract Collections and further research.

I'm referring to FunDeps as described in Mark P. Jones: "Type Classes with Functional Dependencies", Proc. ESOP 2000, Berlin, Springer-Verlag LNCS 1782.

People wondering why I need to formulate things differently in order to consider them correct may read about mental universes in Stefan Kahrs, 1999: A formalist's perspective on mathematics, http://www.cs.ukc.ac.uk/people/staff/smk/manifesto.ps

0. FunDeps are restrictions on the set of possible instance declarations.

We consider Jones' way to define a class for collection types:

    class Collects e ce where
        -- empty :: ce                -- ambiguous
        single :: e -> ce
        -- (<+>) :: ce -> ce -> ce    -- ambiguous
        add    :: ce -> e -> ce
        first  :: ce -> e

And two examples:

a) Intuitively the following binding should be ill-typed, but instead it gets a type:
    g :: (Collects String ce, Collects Int ce) => ce
    g = (single "Jones") `add` 1
b) And the following should have type "Int" instead:
    f :: (Collects Int ce, Collects e ce) => e
    f = first (single 1)
Indeed, if we had some multi-collection type "MultiColl" with
    instance Collects String MultiColl where ...
    instance Collects Int MultiColl where ...
Then the above types could be specialised to

    g :: MultiColl
    f :: String

Given that MultiColl is easily implementable in Haskell (as a tuple of collections of String, Int, and possibly others; or as a collection of a sum type...), we must admit that the above polymorphic types are indeed correct. But what made us assume the contrary? Well, obviously we thought that there would never be any type that could have both instances of Collects. More generally, we would assume that any type 't' will only have one single instance of Collects: one single element type for each collection type. If the compiler knew that too, it could simplify the above type expressions to yield an error for 'g' (since no 'ce' can comply with both constraints) and "Int" for 'f' (since 'e' and 'Int' as element types of the same collection type must be the same).

The relation to relational databases: a database query like "give me the name and address of the patient with social security number (SSN) such_and_such" is ambiguous in the general case: there might be more than one patient with SSN such_and_such. The solution is to declare a functional dependency "SSN -> (name, address)", which is just a declaration that for each SSN there is at most one 'name' and 'address' in all entries where that SSN appears. (There might be several such entries, e.g. if the SSN is connected with multiple insurance cases.)

In our problem, we can do just the same: in a class declaration "class C t" (where 't' are the type variables of the class) a functional dependency is written "a -> b" (where 'a' and 'b' are subsets of 't'). Its semantics (modulo polymorphic instances) is: a FunDep "a -> b" restricts the set of legal instance declarations such that for every pair of instances where the variables 'a' are replaced by the same types, 'b' must also be replaced by the same types. As a consequence, if 'a' and 'b' are the only type variables of a class, this means that there can be at most one instance declaration for any substitution for 'a'.

Since instances can also contain type variables (they can be polymorphic), the characterisation "same substitution of types" is not general enough. But we can "lift" the above definition if we consider any polymorphic instance declaration to represent the (possibly infinite) set of all its non-polymorphic substitution instances. Then, for example, the declaration
    instance Collects x [y]
stands in place for "instance Collects Int [Int]", "instance Collects Int [Bool]", "instance Collects Char [Int]", and so on. Two of these substitution instances, Collects Int [Int] and Collects Char [Int], give the same collection type [Int] two different element types, so the declaration clearly violates the restriction. In section 6.1 of his paper Jones formulates the restriction using unification, so that polymorphic instances are covered, and in section 6.2 he derives simplification rules for the constraints, which make ambiguous variables disappear (because they are unified with others).

1. Left-Single FunDeps are Syntactic Sugar for Constructor Classes

Haskell distinguishes between types (like "Int", "a -> b") and type constructors (like [], (->)). Unlike in the value world, where functions and data constructors represent "normal" values, type constructors are not normal types: they can be used as parameters to other type constructors, but in a legal type all constructors must be completely applied. AFAICS, in some earlier version of Haskell the parameter(s) of a type class could only represent a type, not partially applied or unapplied type constructors. Type classes that work with the latter came later as an extension and are called "Constructor Classes".

For simple FunDeps of the form "a -> b", where 'a' and 'b' are single variables rather than sets of variables, we can do a simple transformation: in the declaration "class C a b ... where" we replace 'a' with 't' (a variable standing for a unary type constructor), and in the class context (the part before the "=>") and the body (the declaration of the member functions) we replace each 'a' with 't b'. Then we can leave out the FunDep and still have exactly the same semantics. (Note that having non-variable constructors in the class context is a language extension, even a dangerous one, but this is not required if 'a' doesn't appear in the context, or if we can change the superclass in the same way.)

This transformation can be used to eliminate all FunDeps that (1) have only one variable on the left-hand side (but possibly more on the right-hand side), and that (2) are not cyclic, i.e. there is no dependency chain like a -> b, b -> c, c -> .. a. If we think of constructor classes as the "more basic" construct (which is reasonable, since their semantic rules are not much different from those of ordinary type classes), we can thus consider left-single non-cyclic FunDeps as syntactic sugar for the use of constructor classes.

2. Comparison

Here is a small example that compares the FunDep and the Constructor Class versions of a type:
From the Abstract Collections (read (<:) as 'add_first', formerly known as 'cons'):

    (1<:) :: ( Sequence seq Int ) => seq Int -> seq Int
using FunDeps this would be:
    (1<:) :: ( Sequence seq Int ) => seq -> seq
As you can see, the first version is more informative, saying explicitly "seq Int". Of course, in the FunDep version we could rename 'seq' to 'seq_Int' to make it more readable. But the constructor version still provides us automatically with that structuring: every collection type consists of a collection constructor and an element type, just as if we used concrete types: "seq Int" vs "Seq Int" (or "(WeightBalanced LeafTree) Int", which would be "WeightBalanced (LeafTree Int)" with nullary collection types, by the way -- AFAICS without having tried it, Dessy's building-set approach would also work with nullary Collection classes with just this change in parentheses).

The following example reveals a key point ('apply' being the function formerly known as 'map'):
    apply :: ( Collection coll a, Collection coll b ) => (a -> b) -> coll a -> coll b
with FunDeps:
    apply :: ( Collects a ca, Collects b cb ) => (a -> b) -> ca -> cb
The first type expresses that the type constructor of 'apply''s argument and result must be the same, while the nullary version can't express that. (This is not tooooo important in the abstract collections, perhaps even a bit questionable, but (a) it helps us avoid intermediate ambiguous types (even with FunDeps, 'cb' wouldn't be unified with anything more concrete), and (b) it also helps make simpler implementations, since 'apply' can be lifted to use the representation of higher-order type constructors, such as 'WeightBalanced'.)

Short summary: Constructor classes give slightly better documentation and are slightly more expressive.

3. Parametric Type Classes are Normalised FunDeps

Here is another small observation: in relational databases we want to transform our tables in such a way that there are as few dependencies as possible (since they incur redundancy). Thus we bring them into a normal form, with only one functional constraint of the form "a -> b" (here 'a' and 'b' again denote sets of columns). The 'a' part is called the "key" of an entry in the database. Classes with only one FunDep are necessarily in normal form. I don't want to consider transforming class declarations to normal form (no idea what that would mean...), just this: if we use a parametric type class of the form "class a \elem C b where" ('a' and 'b' sets of variables), then this corresponds exactly to a FunDep in normal form. Non-normal-form FunDeps can't be expressed with Parametric Type Classes.

4. Conclusions

Two conclusions for further research:

1. Every time one advocates FunDeps, one should mention whether the application needs left-multiple FunDeps, or whether left-single FunDeps suffice.

2. Every time one advocates left-single FunDeps, one should compare the solution with Constructor Classes and consider all the different possibilities to encode the FunDeps with constructor classes. Which one is the most simple, most intuitive? Which one allows simpler formal reasoning, simpler implementation?

Three conclusions for the Abstract Collections:

1. Since they are currently built on Constructor Classes, their current design can be expressed with left-single FunDeps only; I don't see any application for left-multiple FunDeps. (ByMaps have yet to be examined!)

2. Structures that only work on a certain element type (e.g. Patricia trees) must nevertheless have a type parameter. This may be counterintuitive, and someone here claimed that it leads to late detection of type errors, but this approach at least appears very simple in my intuition: every Collection type is made up of a unary type constructor and an element type. The fact that certain concrete data structures only work on certain element types can conveniently be expressed in the instance declaration:
    instance Collection Patricia Int where ...
Furthermore, we use newtypes anyway to protect the invariants of our structure implementations, so this is not a problem either.

3. If we should nevertheless decide some day that FunDeps are The Better Thing, we know that the current design of the Collections has a straightforward translation to FunDeps. If a transition has to be made some day, it can happen semi-automatically, without the need to rethink or redesign things -- neither for implementors of data structures, nor for their users.
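To see Section 1's transformation spelled out, here is a small compilable sketch of both versions of Jones' Collects class. The FD/CC suffixes on the method names are only there so that both versions can live in one module, and the list instances are purely illustrative:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
    module LeftSingleSketch where

    -- FunDep version: the collection type determines the element type.
    class CollectsFD e ce | ce -> e where
        emptyFD  :: ce
        singleFD :: e -> ce
        addFD    :: ce -> e -> ce

    -- Constructor-class version: 'ce' is replaced everywhere by an
    -- application 'coll e' of a unary type constructor, and the FunDep
    -- is dropped; this is the shape used by the Abstract Collections.
    class CollectsCC coll e where
        emptyCC  :: coll e
        singleCC :: e -> coll e
        addCC    :: coll e -> e -> coll e

    -- Lists are an instance of both versions.
    instance CollectsFD e [e] where
        emptyFD    = []
        singleFD x = [x]
        addFD xs x = x : xs

    instance CollectsCC [] e where
        emptyCC    = []
        singleCC x = [x]
        addCC xs x = x : xs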

hi, Robert Will wrote:
hi,
On Fri, 26 Mar 2004, Robert Will wrote:
Honest reply: I don't fully understand FunDeps and I don't want to learn
Well, in fact I couldn't resist the temptation to look at it, only to find that they are really simpler than my earlier information suggested. I also made two small observations: (a) that non-circular FunDeps with a single variable on the left side can be one-to-one emulated via Constructor classes, and (b) that parametric type classes correspond exactly to FunDeps with only one constraint. (Both of these special cases occur rather often.) Anyway, it consolidates my opinion regarding the Abstract Collections and FunDeps: no need to make a redesign. (Full text at the bottom.)
The only thing that bothers me at the moment, is that I couldn't find the "many other applications of FunDeps" that are mentioned but not given in Jones' paper. Perhaps one should make a list.
Functional dependencies are very convenient when you work with multi-parameter classes. Whether you find multi-parameter classes useful is a different question, but most people will agree that at least sometimes they are quite useful. To see how they are useful, try to design a nontrivial class hierarchy using multi-parameter classes without functional dependencies -- and don't forget to try to use it. Before I learned about functional dependencies I constantly fell into that trap: I would create a cool hierarchy of classes, and then as soon as I tried to use them, I would get silly ambiguities and unresolved overloadings.

bye
iavor

On Fri, Apr 02, 2004 at 11:21:05AM +0200, Robert Will wrote:
On Fri, 26 Mar 2004, Dylan Thurston wrote:
Also, they're vitally necessary for certain algorithms and for speed (since not all ways of making containers can contain all items).
If you mean restricting some Containers to certain element types (Patricia trees and so on), this is perfectly possible without FunDeps. ...
I did mean that, and thanks for explaining how I was being silly! I am now more convinced than before that the 'Eq' and 'Ord' constraints on your classes don't belong; if I want to implement the ridiculously slow list-based implementation of Sets currently in the Prelude, which only needs equality and nothing else, why can't I? Peace, Dylan
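For concreteness, a minimal sketch of the kind of structure meant here: an unordered, list-backed set whose operations need nothing beyond Eq. The names are illustrative and not part of any proposed class:

    module EqOnlySet (EqSet, emptySet, insertSet, memberSet, deleteSet) where

    import Data.List (delete)

    newtype EqSet a = EqSet [a]

    emptySet :: EqSet a
    emptySet = EqSet []

    memberSet :: Eq a => a -> EqSet a -> Bool
    memberSet x (EqSet xs) = x `elem` xs

    -- insertion keeps the list duplicate-free, using only (==)
    insertSet :: Eq a => a -> EqSet a -> EqSet a
    insertSet x s@(EqSet xs)
        | memberSet x s = s
        | otherwise     = EqSet (x : xs)

    deleteSet :: Eq a => a -> EqSet a -> EqSet a
    deleteSet x (EqSet xs) = EqSet (delete x xs)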

On Fri, 2 Apr 2004, Dylan Thurston wrote:
I am now more convinced than before that the 'Eq' and 'Ord' constraints on your classes don't belong;
The 'Eq' constraint is gone now. And I think the 'Ord' is no problem, since Sets/Maps need 'Eq' anyway.
if I want to implement the ridiculously slow list-based implementation of Sets currently in the Prelude, which only needs equality and nothing else, why can't I?
You could, if the Abstract Collections had a class of unordered Sets, but I left that one out because it is of little use. (A really long rationale is in the documentation.)

Robert

On Fri, Apr 02, 2004 at 11:21:05AM +0200, Robert Will wrote:
1. Left-Single FunDeps are Syntactic Sugar for Constructor Classes
I think you should try writing programs a little before making such broad claims... I was convinced when I read this, but then someone in haskell-cafe (http://www.haskell.org//pipermail/haskell-cafe/2004-April/006014.html) tried actually doing it, and ran into some trouble; see the second half of the message. I don't know how to make the "Constructor Class" version of the program work without '-fallow-undecidable-instances'. Do you? Also:
3. Parametric Type Classes are Normalised FunDeps
... Classes with only one FunDep are necessarily in normal form. I don't want to consider transforming class declarations to normal form (no idea what that would mean...), just this: if we use a parametric type class of the form "class a \elem C b where" ('a' and 'b' sets of variables) then this corresponds just to a FunDep in normal form. Non-normal-form FunDeps can't be expressed with Parametric Type Classes.
I have no idea what you mean here. "class a \elem C b where" doesn't look like Haskell to me. What is a parametric type class? Could you expand a little? (And maybe we should move this discussion to haskell-cafe, since it's very off-topic for the libraries list and other people might be interested. Maybe you could repost the parent there as an introduction?) Peace, Dylan

In a message of Fri, 02-04-2004, 11:21 +0200, Robert Will wrote:
In section 0, I explain FunDeps for people like me. Section 1 shows that a certain kind of FunDeps can always be expressed with Constructor classes.
It forces types with parameters where they might not be desirable. For example, if we have a "Collection c e" class instead of "Collection ce e", we can't make the PackedString type its instance. We must give it a dummy parameter, always instantiated to Char.

I once tried to design collection classes and failed.

--
Marcin 'Qrczak' Kowalczyk   qrczak@knm.org.pl   http://qrnik.knm.org.pl/~qrczak/

On Sat, 10 Apr 2004, Marcin 'Qrczak' Kowalczyk wrote:
For example if we have "Collection c e" class instead of "Collectio ce e", we can't make the PackedString type its instance. We must give it a dummy parameter, always instantiated to Char.
So we can do it, though it's not nice.
I once tried to design collection classes and failed.
To succeed we have to accept that not all things will be nice, but we can still get a lot of advantages out of it. I hope my proposal can still be made nicer, but at the moment I don't see any way to do that without taking away many of its advantages.

Robert
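A small sketch of the dummy-parameter workaround being discussed here, using a plain String newtype to stand in for PackedString and the same cut-down Collection class as in the earlier sketch (not Dessy's actual class):

    {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
    module PhantomSketch where

    class Collection coll a where
        empty  :: coll a
        single :: a -> coll a
        (<+>)  :: coll a -> coll a -> coll a

    -- The real structure is a monomorphic String; the parameter 'a'
    -- is a phantom that every instance pins to Char.
    newtype Packed a = Packed String

    instance Collection Packed Char where
        empty                   = Packed ""
        single c                = Packed [c]
        Packed xs <+> Packed ys = Packed (xs ++ ys)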

On Fri, 19 Mar 2004 08:56:36 +0100, Robert Will
Curiously, I have finished the promised draft standard for the Abstract Collections just at a time when many are busily working on DData.
At first impression, Dessy seems a very wide-ranging framework for working with abstract collections. I am impressed by the amount of work you have put into it, and also a bit intimidated by its size :-) It would help me if you could use Haddock to show the interface in HTML format (documentation is not necessary for now, just the type signatures would be great). Could you maybe also tell us something about the relation to Edison? That framework seems to have the same goals as Dessy? Why is it not suitable for your purposes?
I'm making this discussion hot now, since my Dessy library implements the proposed collection standard and features all of the concrete structures that are also in DData, and some more,
Please be aware that the goal of DData is just to provide a range of generally useful *concrete* data structures. Currently, JP Bernardy is doing an excellent job of publicly discussing the design and naming schemes to make DData part of the hierarchical libraries. We are *not* trying to create a library for abstract collections like Dessy and Edison. It is important at this stage, though, that DData can be used to provide concrete implementations for an abstract collection library like Dessy. Are there any important design issues/restrictions that have to be fulfilled to make DData implement some of the collections (like needing Eq on the elements)?

All the best,
Daan.

Old? Edison: http://sourceforge.net/projects/hfl -- I think that Andrew Bromage is maintaining this?

On Tue, 23 Mar 2004, Daan Leijen wrote:
It would help me if you could use Haddock to show the interface in html format (documentation is not necessary for now, just the type signatures would be great).
I can only do that later, so if no one wants to jump in, you'll have to wait some weeks. (Alternatively, use 'grep ::' on the code.)
Could you maybe also tell something about the relation with Edison? This framework seemed to have the same goals as Dessy? Why is it not suitable for your purposes?
Since this comparison is rather spread out in my design documents, here is a summary:

First of all, my Collections are more model-centered, while Edison is more algorithm-centered. Okasaki's (good!) choice to have all concrete structures implement all operations of a class already makes this difference small, but Edison still centers around algorithms (and data structures for use in algorithms), while I center on specifications and data structures for use in day-to-day non-algorithmic programming. A discussion of this is found in the material for (imperative) Dessy: http://www.stud.tu-ilmenau.de/~robertw/dessy

Next, I'm using a brand-new algebraic approach with inheritance as refinement to specify the collections and to build up the hierarchy. Thereby many things become simpler (and more different from the list-centered traditional functions). This is explained at length in http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/principles.htm

As a consequence of that approach, functions that work on all of the Collections get a clear (and useful) semantics. Programmers won't need to learn Sequence.fold, Map.fold, Set.fold, and so on -- they just learn to fold (along the right lines). This is what makes the approach challenge the traditional "list" data type, and with it large parts of current programming (formal derivation) methodology.

Lastly, the differences in more concrete design decisions, most notably:
- I have no unordered structures.
- I always view (==) as equality.
http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/principles.htm#not_here
Please be aware that the goal of DData is just to provide a range of generally useful *concrete* data structures. Currently, JP Bernardy is doing an excellent job of publicly discussing the design and naming schemes to make DData part of the hierarchical libraries.
Yes, of course, DData is a good (even indispensable) thing. However, the long-term goal of the abstract collections is to make all such autonomous libraries obsolete, so there's a natural conflict. We have to manage a smooth transition.

I think some of JP's changes to DData are already very good steps in that transition, notably the view of (==) as equality and the consequent removal of bias. This is another confirmation of my prejudice that people in the FP community are much more ready to accept change for The Right Thing than "imperative folks". That's one of the reasons that Functional Dessy got out so fast, while the imperative version is older and has still not seen the light of day. (Another reason is, of course, that FP is simpler: we have only algebraic Collections, no mutable ones. Imperative Dessy will be the first library that has both...)
Are there any important design issues/restrictions that have to be fulfilled to make DData implement some of the collections?
Simple answer (from my advertising dept.): "To add a new data structure one just needs to implement two (in numbers: 2) functions, and the structure will inherit (about) 30 functions, and hundreds of other polymorphic functions can be used with it."

But of course you don't just want to implement 'split' and 'unsplit', since then most of DData's (optimised) code will not be used. So there are some real compromises to make (hard thinking required). And since Dessy already provides all of DData's algorithms (re-implemented, sharing much code with other algorithms) with a Collections interface, I can't see a benefit of such a port for the time being.

I think at the moment the focus should be on collecting more experience with the Collections, setting up Design by Contract and a (performance and correctness) test framework. After that we can think about implementations. However, we can already see that the "markets" of DData and Dessy are clearly distinct: one optimised for performance (at the expense of code duplication) and the other a tool-box for algorithms (with code that longs to be read by others, e.g. students). The synergetic effect will not only be provided by a common interface, but also by the exchange of algorithmic ideas and "practical issues of optimisation". Brave new world. :-)

Robert
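As an aside on the mechanism behind "implement two functions and inherit the rest": in Haskell this is just class methods with default definitions written in terms of a small kernel. The kernel below (a fold plus an insert) is hypothetical and chosen only to show the pattern; Dessy's actual kernel is 'split' and 'unsplit':

    module DefaultsSketch where

    class Container f where
        -- kernel: an instance must define at least these two
        cfold   :: (a -> b -> b) -> b -> f a -> b
        cinsert :: a -> f a -> f a

        -- everything below is inherited for free
        ctoList :: f a -> [a]
        ctoList = cfold (:) []

        csize :: f a -> Int
        csize = cfold (\_ n -> n + 1) 0

        cnull :: f a -> Bool
        cnull = cfold (\_ _ -> False) True

        cmember :: Eq a => a -> f a -> Bool
        cmember x = cfold (\y found -> x == y || found) False

    -- A new structure only has to supply the kernel:
    instance Container [] where
        cfold   = foldr
        cinsert = (:)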

Hi Robert,
On Fri, 26 Mar 2004 10:31:57 +0100, Robert Will
On Tue, 23 Mar 2004, Daan Leijen wrote:
It would help me if you could use Haddock to show the interface in html format (documentation is not necessary for now, just the type signatures would be great).
I can only do that later, so if no one wants to jump in, you'll have to wait some weeks. (Alternatively, use 'grep ::' on the code.)
Ha, never thought about "grep", I'll use that for now.
Could you maybe also tell something about the relation with Edison? This framework seemed to have the same goals as Dessy? Why is it not suitable for your purposes?
Since this comparison is rather spread out in my design documents, here is a summary: [snip]
Thanks for the extensive clarification. I'll look into it some more. -- Daan.
participants (9)
- ajb@spamcop.net
- Christian Maeder
- Daan Leijen
- dpt@lotus.bostoncoop.net
- Iavor S. Diatchki
- Malcolm Wallace
- Marcin 'Qrczak' Kowalczyk
- Robert Will
- Wolfgang Jeltsch