
Apologies, I sent this to the wrong list. It should go to Libraries... Robert ---------- Forwarded message ---------- hi Ross and others, On Wed, 7 Apr 2004, Robert Will wrote:
On Mon, 5 Apr 2004 ross@soi.city.ac.uk wrote:
To have something concrete to discuss, I've placed a structure based on Edison at docs: http://www.soi.city.ac.uk/~ross/seq/ code: http://www.soi.city.ac.uk/~ross/seq.tar.gz
Really, since you have read the comparison of my Abstract Collections with Edison, you should have said one word, why you choose Edison as a basis and not mine. Well, I gues the reason is just that you don't want MPTC and that you're perhaps afraid of all my new terminology while Edisons names sound so familiar. So here is a short argument in favour of my choices: 1. The class "OperationalSequence" in the Abstract Collections is an MPTC that makes no restrictions on the element type. I dubbed it "Operational", because without an equality over Sequences we can't give executable algebraic specifications. It uses MPTC for a debatable reason: to be part of the framework of Abstract Collections. (The source file explains why we can't mix SPTC and MPTC code.) But is that really so bad? MPTC work on all major compilers and no-one doubts they will be in a future standard. Furthermore, since it uses Constructor Classes (just like Edison) there are no ambiguity problems that would involve FunDeps! 2. Also the names are choosen not to be compatible with the past, but to be combatible with the future: the names of the Sequence operations fit well with those of all other Collections. Many names are shared and have the same semantics on all Collections.
PS: Dessy also started with (list-compatible) Sequences and than added the rest. We should certainly not make that detour on the large scale.
with "list-compatible" I mean starting with class members (cons head tail foldr foldl...) and then add many more. Really one should be looking for the common base of all sequence implementations and what's more all sequence applications. The absurdness of the "list-compatible" approach shows in the ridiculous default running times for 'cons' O(1) and 'snoc' O(n), really what's the big difference between those operations? The names alone are a nightmare: (cons snoc init tail head last left right) (as in lview, rview) and many functions that only work from the front. How many different names should we have for the front (with the first element) and the back (with the last element)? My proposal shows that those four terms (first last front back) suffice to name all useful functions on lists. Surely the naming approach taken for my Abstract Collections is no fundamental breaking with conventions: all the names and operators are either tradition in Haskell or tradition in Scheme or some consistent extension of this (e.g. (<:) and (>:) for 'add_first' and 'add_last'). Clearly my new names are easier to learn for novices, but I also claim that learning the whole library (i.e. the "compatible" functions and the new ones) is easier for the Abstract Collections than for Edison. Just because the names are more consistent, and still many names are like in standard Haskell. So I'd suggest, that a standard Sequence class should be based on the Abstract Collections. (And by the way, why not also think about Sets and Maps, you can use them now: "import Prelude (); import Collections" is not so hard to write!) By the way I think that the documentation for the Sequence instances in Ross' proposal is too redundant. The only thing we should need is a specification of the performance and of additional operations. I would propose the following time bounds: Defaults for Sequences: O(logN) for all operations, except O(1) for "atomic queries" (is_empty, first, last, size (or length)) This is the simplest to remember and a "democratic" implementation will just have those bounds. (And balanced tree does, Dessy has five of them.) Next thing needed are Deques: O(1) for (empty single <: >: first last but_first but_last is_empty) O(N) for the rest (JoinLists are like that with additional O(1) (++), so they're perhaps better called 'AppendableDeques'. Likewise CatenableLists are really AppendableStacks.) Write-only Sequences: (OrdList) O(1) for (empty single <: >: ++) O(N) for the rest One-side flexible Arrays or Random Access lists: are another very "algorithmic" structure, which support efficient 'update', a function that my Sequences don't even specify. This reminds me, that I haven't really seen any applications of Sequence data structures and that I decided anyway to get some examples before fixing the details of the class. By the way, there are already some examples of Maps and Sets at work: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/examples/ The only examples for Sequences is Okasakis Tree Enumeration using Deques and that doesn't look nice without views :-( Robert _______________________________________________ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell

Ross Paterson wrote:
To have something concrete to discuss, I've placed a structure based on Edison at docs: http://www.soi.city.ac.uk/~ross/seq/ code: http://www.soi.city.ac.uk/~ross/seq.tar.gz
Looks nice. It's nice and simple, and I could start using it quickly, without breaking too much of my code. Some naming nitpicks: * reduce1 and reducel are nice names, but they look almost indistinguishable in some fonts. Luckily, I don't use such a font for programming, but my browser does for displaying the docs. * indexM: I feel that 'M' stands for 'Monad', not 'Maybe'. What about 'Mb' instead? The main difference to Robert's approach, aside from his anti-list propaganda ;-), seems to be that it uses a Haskell98-compatible single-parameter type class. This means that concrete Sequence instances can't add an Ord context later - Are we sure that there are no useful implementations of the Sequence class that require their members to be instances of Ord? I definitely can't think of any, but perhaps someone else can? Also, we might really want to add an abstract collection framework later; that will have to use MPTCs, and we will probably want Sequence to be a subclass of that. Also, sooner or later we will move empty and isEmpty to the general Collection class, won't we? Are there many people who will want to use the new framework but stay with plain Haskell98? Robert Will wrote:
So here is a short argument in favour of my choices:
1. The class "OperationalSequence" in the Abstract Collections is an MPTC that makes no restrictions on the element type. I dubbed it "Operational", because without an equality over Sequences we can't give executable algebraic specifications.
Hmmm... so why should that fact be recorded in the name? Let's keep the names simple. I'd rather rename your 'Collection' class to something like CollectionEq, because
2. Also the names are choosen not to be compatible with the past, but to be combatible with the future: the names of the Sequence operations fit well with those of all other Collections. Many names are shared and have the same semantics on all Collections.
Who says the future _shouldn't_ be compatible with the past? This is one reason why I wouldn't want your libraries as they are now to be "the future"; With Ross's proposal, I can start using it in the few places in my programs where lists are not appropriate, without breaking much of the rest. To use Dessy, I'd either have to change a lot of my program, or I'd have to import it qualified :-(. Another problem I have with the names are those names_with_underscores. All the Haskell libraries, and all other Haskell code I've worked with, has always used namesWithoutUnderscores. If you want to replace part of the Prelude with something that used names_with_underscores, prepare for a religious war ;-).
The absurdness of the "list-compatible" approach shows in the ridiculous default running times for 'cons' O(1) and 'snoc' O(n), really what's the big difference between those operations?
readFile "foo" >>= print . sum . map (product . words) . lines Then lists are *exactly* what I want. Incidentally, code like this accounts for a large part of my use of lists. I often think of a list as a ForwardIterator from C++'s STL. I agree
Absurdness? Ridiculous? Your language is very strong. Or should I say It's ridiculously strong and you are absurdly sure of your own opinion ;-) ? (Sorry, couldn't resist to use the same language on you...). Let me say something in favour of lists: a) They have simple built-in syntax. Changing the [1,2,3] syntax to use a type class will make types even more problematic for beginners... Anyway, it won't happen soon. b) They have a very low constant factor. c) They are perfectly suited for many tasks. When I write something like this: that a singly-linked list is often a bad idea as a data structure for a sequence, but we use lists much more often that non-Haskell programmers use sequences. Most of our lists are just iterators or streams. About your Stream module: I don't see the point - I thought you already had an instance for Prelude.[] in your Collections module?
I would propose the following time bounds:
Defaults for Sequences: O(logN) for all operations, except O(1) for "atomic queries" (is_empty, first, last, size (or length))
Why are we arguing about "default running times"? It's not like they have any observable effect outside the documentation. Lists have been used a lot in Haskell, and they will continue to be used a lot. Lists are the most "normal" sequences in Haskell, so why shouldn't the running time of operations on other sequences be defined relative to lists? In fact, I'd say that I should get the default running times "for free" and different (in many cases, asymptotically better) running times if I am willing to pay a constant factor for that. Sorry for the long post, but I've followed the discussion long enough, and it's always tempting to add my 0.02 euros, so I finally couldn't resist anymore... Cheers, Wolfgang

On Fri, Apr 09, 2004 at 03:04:16PM +0200, Wolfgang Thaller wrote:
Ross Paterson wrote:
To have something concrete to discuss, I've placed a structure based on Edison at docs: http://www.soi.city.ac.uk/~ross/seq/ code: http://www.soi.city.ac.uk/~ross/seq.tar.gz
Looks nice. It's nice and simple, and I could start using it quickly, without breaking too much of my code. Some naming nitpicks:
Noted, but the names are mostly from Edison, and are just placeholders. The issue for the moment is whether the commonality between sequence implementations should be expressed by a Haskell 98 Sequence class or by a family of sequence modules that all export the same names.
The main difference to Robert's approach, aside from his anti-list propaganda ;-), seems to be that it uses a Haskell98-compatible single-parameter type class. This means that concrete Sequence instances can't add an Ord context later - Are we sure that there are no useful implementations of the Sequence class that require their members to be instances of Ord? I definitely can't think of any, but perhaps someone else can?
Yes, it's assumed that sequences are functors, which seems reasonable to me. It rules out treating unboxed arrays as sequences, but that doesn't seem a priority.
Also, we might really want to add an abstract collection framework later; that will have to use MPTCs, and we will probably want Sequence to be a subclass of that. Also, sooner or later we will move empty and isEmpty to the general Collection class, won't we? Are there many people who will want to use the new framework but stay with plain Haskell98?
The difficulty with trying to be compatible with a future abstract collection framework is that there is no consensus on what that framework should look like. Which is why the recent effort has focussed on concrete data structures. We might as well pick the interface that works best for sequences considered by themselves. It should still be possible to make these things instances of the classes of a Grand Unified Collection Framework whenever it arrives.

On Tue, 13 Apr 2004, Ross Paterson wrote:
The difficulty with trying to be compatible with a future abstract collection framework is that there is no consensus on what that framework should look like.
We just need to agree that the framework will use MPTC. (I think there's no other choice, since Ordered Collections have to put their "Ord a" context somewhere.) Then we can easily make the Sequence a subclass of class Collection later, lifting some functions up. No problem here.
The issue for the moment is whether the commonality between sequence implementations should be expressed by a Haskell 98 Sequence class or by a family of sequence modules that all export the same names.
I add the question: if we have the 'Sequence' class, should we also have a superclass 'Collection' with some operations that are indepent of the explicit element ordering provided by Sequences?
Grand Unified Collection Framework whenever it arrives.
Since we already have a concrete proposal with prototype implementation, that "whenever" may not be so far away. Especially I think that the uncertainities of the Ordered Collections (see section on ByMaps in the documentation), are independent of the design for the Sequence and Collection classes. So I do think that we can already rely on those classes being a good approximate of a future standard, if there will be one. Most notably my design rationale argues that some disadvantages of my proposal (long type signatures, MPTC, restricted laziness...) are necessary to any sensible Collections Framework. So it we don't accept those, we should agree that there will _never_ be such a Framework. Or perhaps anyone can refute my argumentation? Robert

On Fri, 9 Apr 2004, Wolfgang Thaller wrote:
Hmmm... so why should that fact be recorded in the name? Let's keep the names simple. I'd rather rename your 'Collection' class to something like CollectionEq, because
I decided to do that Yesterday. If we could have local constraints à la class Collection coll a where has :: (Eq a) => coll a -> a -> Bool The class wouldn't even be necessary. That would really be simple, but we'll have to see whether it's possible in a "probable sucessor to Haskell'98".
Who says the future _shouldn't_ be compatible with the past?
Of course, it should, but sometimes it can't if it should also be a nice future. We can make better implementations without changing the interface, but we cannot make better interfaces without changing the interface. It's like the change from ML to Haskell, for example.
To use Dessy, I'd either have to change a lot of my program, or I'd have to import it qualified :-(.
First, you can use it on new Modules only. Then you can convert existant Modules one-by-one or in groups (when changing their interface to use Collections). Lastly, most Modules can be converted by simple renaming 'head' to 'first' and so on.
Another problem I have with the names are those names_with_underscores. All the Haskell libraries, and all other Haskell code I've worked with, has always used namesWithoutUnderscores. If you want to replace part of the Prelude with something that used names_with_underscores, prepare for a religious war ;-).
I'm ready to fight it, even ready to loose it, but since underscores are just the better thing (by psychological studies), I'll try to push the change.
Absurdness? Ridiculous? Your language is very strong.
Yes. If I wouldn't think the advantages of a democratic approach are very big, I couldn't defend such a big change. Programmers that are used to all the intricacies and non-consistencies in current tools and libraries don't usually notice how much time they spend on small errors and problems due to this. I think, that _much_ time is lost, and sometimes it even makes big problems because all the small problems distract from the big ones. Of course, the problems are even bigger for beginners who have to learn all this...
Let me say something in favour of lists:
a) They have simple built-in syntax. Changing the [1,2,3] syntax to use a type class will make types even more problematic for beginners...
I think that the "default" mechanism, well-implemented (especially with good error-messages), can compensate for this. You'll have
[1,2,3] :: Seq Int for the beginner and [1,2,3] :: (Sequence seq a, Num a) => seq a for the advanced.
b) [lists] have a very low constant factor[s].
Yes, but other implementations can have that, too. And of course, the "Stream" data type preserves all the good (laziness) properties of lists.
c) They are perfectly suited for many tasks. When I write something like this:
readFile "foo" >>= print . sum . map (product . words) . lines Then lists are *exactly* what I want.
About your Stream module: I don't see the point - I thought you already had an instance for Prelude.[] in your Collections module?
Streams are exactly like the old "lists", just with a more fitting name and without exceptional syntax: Tree a, Set a, Stream a -- that's consistent. Your example makes explicitly use of the "Stream" property of lists, so Streams it what it should use. I explain in section "Transition towards ubiquitous Collections" in the Collections manifest, lists are used for many, many different things, and the new approach separates out all these. Of course that has the disadvantage that one has to think a little bit --what do I really want?-- before choosing an appropriate Collection. But the advantage is better documentation. For a function returning [a] I don't know whether that result is produced lazily or not (i.e. what is the running time of "head" on the result), with abstract Collections (well-used) the result is lazy exactly if it has type "Stream a". (Of course we can still use the Collection's functions without worrying about the concrete type, that decision is taken later -- as it should with separation of concerns.(
Defaults for Sequences: O(logN) for all operations, except O(1) for "atomic queries" (is_empty, first, last, size (or length))
Why are we arguing about "default running times"? It's not like they have any observable effect outside the documentation.
They are very important, because they give developers a feel for what is "expensive" and what is "cheap". When deriving programs from executable specifications, all you do is make them more efficient. That's the essence of programming from a formal point of view. To have the right feel for expensive versus cheap, the model must be as simple as possible so that people think less about performance and more about functionality.
... why shouldn't the running time of operations on other sequences be defined relative to lists?
If every programmer has to learn that 'last' takes time O(N) before he learns that better implementations exist, this will certainly be no good.
In fact, I'd say that I should get the default running times "for free" and different (in many cases, asymptotically better) running times if I am willing to pay a constant factor for that.
Yes, that's one of my design principles. But a more important one is that of the least surprise: the default is the smallest possible worst-case asymptotical running time. Making some operations faster, requires making others slower. This chould be done explicitly so that a user can make sure he doesn't use the more expensive operations in critical places. That's why neither Deques nor Streams are my default.
Sorry for the long post,
I think that such a difficult topic cannot be treated briefly. Your post contained nothing superfluous, so it had to be so long. I hope we can maintain this high standard, even when we get a "religious war"... (I'll try to put all arguments pro and con on the web-page, so we don't need to repeat ourselves...) Robert

I think I'm reluctant to accept Dessy as "the" collection framework mainly because you're trying to sell a big package of many changes at the same time - maybe that's also the reason why no one(?) else publicly said "I want Dessy". Dessy seems to include several new ideas, and I don't want to get them all in one package. At a first glance, Dessy seems to include (forgive me if I forgot something important): 1) a hierarchy of Collection classes, based on MPTC without FunDeps 2) Implementations to go along with it 3) Lots of "propaganda" against lists and a proposal to change the language to remove it's bias towards lists 4) A "Stream" type whose only purpose seems to be to avoid using list syntax when the semantics of singly-linked lists are required 5) Some obvious naming changes like head --> first 6) A blasphemous tendency to use underscores in names :-) People aren't quite done arguing whether the collection classes should rather use fundeps, or whether we should have a single-parameter Sequence class, etc. I guess no one has yet even taken a serious look at your implementations - but if there was anything wrong with the implementations, that could be fixed later. Number 3 is bound to spark some long discussions; and as it's a major (and incompatible) change to the language rather than the libraries, it's not going to happen anytime soon. We need a useful sequence framework that is used by many people first, only then will you be able to convince people that [1,2,3] should denote "a sequence" rather than "a list". Number 4 doesn't make much sense without Number 3. Number 5 definitely makes sense (although I haven't had a look at all of your names yet). It will be painful, though, and there will definitely be many people who just write modules with declarations like "head xs = first xs". Number 6: Veto. (OK, I don't have the right to veto anything, but you know what I mean).
I'm ready to fight it, even ready to loose it, but since underscores are just the better thing (by psychological studies), I'll try to push the change.
Aha, proof by appeal to authority ;-). What psychological studies? Who did they use as test subjects? UNIX C programmers? Windows C programmers? Java programmers? People who have never programmed before? For me, code without underscores is definitely more readable, so it might just depend on what you are accustomed to. Also, when you try to push that change, think about how big a change you are advocating; it's not about but_first versus butFirst only. If the new collection framework is the only place where underscores are used in names, then that's bad, because inconsistency is definitely a bad thing. We'd have to change the other libraries to match (will all the other library maintainers agree to do this?). And I'd have to change my code to match. After I've changed my library calls, my own function names would look out-of-place. Ugh. I think you simply can't "change" the naming style used in Haskell. You can only either "submit" to the predominant style that has brought us relative "harmony" (compared to C), or you can defend the right to use your own style in addition to the predominant style. I like the fact that almost all Haskell code I come accross uses the same naming style, but maybe that's just because it happens to agree with my own preference.
... why shouldn't the running time of operations on other sequences be defined relative to lists?
If every programmer has to learn that 'last' takes time O(N) before he learns that better implementations exist, this will certainly be no good.
I'd agree soon as point 3) above is "accepted" and the "list bias" in the language is removed. As long as we have a language where lists are the standard sequence type that appears everywhere, we can safely assume that every programmer *will* learn how to use lists before he learns about other collection types.
Absurdness? Ridiculous? Your language is very strong.
Yes. If I wouldn't think the advantages of a democratic approach are very big, I couldn't defend such a big change.
Mhm. We seem to have a small clash of cultures here. Down here in old-fashioned Austria, calling somebody's opinion both absurd and ridiculous in the same sentence is sometimes considered equivalent to a personal insult. The mere fact that sensible people have written programs that use lots of singly-linked lists proves that it's neither absurd nor ridiculous. The "democratic" approach might be a lot better, but that doesn't make the old approach ridiculous or absurd. Sorry for complaining about this, I know I shouldn't be that narrow-minded... Cheers, Wolfgang

hi, On Thu, 15 Apr 2004, Wolfgang Thaller wrote:
I think I'm reluctant to accept Dessy as "the" collection framework mainly because you're trying to sell a big package of many changes at the same time
Actually I tried to make clear in the documentation, that you just _don't_ have to accept all the changes at the same time. Expecially, no language changes are obligated, since you can already run it now, in plain old Haskell98 with MPTC. In fact a try to offer at the same time: - a future-combatible approach, ensuring that we don't have to redo it from scratch when we want to add another feature. - a migration path from the present to the future, with the option to fully design the future on the way (e.g. things like the underscores can of course be changed). Let's untangle the aspects:
1) a hierarchy of Collection classes, based on MPTC without FunDeps
The proposed interface standard is independent from implementations (although I tried of course to make many sensible implementations possible).
People aren't quite done arguing whether the collection classes should rather use fundeps, or whether we should have a single-parameter Sequence class, etc.
I think that the argumentation has stopped without finding a solution. The recent proposals (Ross and me) provide new strong arguments. Ordered Structures without MPTC are impossible to do in a practical way (my conjecture, awaiting refutation), so we don't really have a choice. For FunDeps we have the choice and for the moment it is good to know that we can quite well do without. It is to be shown whether we can do better with FunDeps, but then the addition is possible in a compatible way. Sequences can be done without MPTC, but not in the same hierarchy as the Ordered Structures. (Again, my conjecture, the argument is that the implmentation of an Ordered Structure will need an Ord context already when becoming instance of Collection. Like this: instance ( ... Ord a ... ) => Collection SomeSet where (Well, it would work if we didn't use Constructure Classes and use FunDeps, does perhaps anyone want to rewrite the library to check that approach...?))
2) Implementations to go along with it
Actually I reserve the name "Dessy" for the implementations. And due to their special design goals, they are certainly not sufficent for all uses, maybe they shouldn't even be the default in the future.
3) Lots of "propaganda" against lists and a proposal to change the language to remove it's bias towards lists
4) A "Stream" type whose only purpose seems to be to avoid using list syntax when the semantics of singly-linked lists are required
At the moment, these facilities allow users of Collections in a list-less way, which is also combatible with a possible future "list-less Haskell". Of course, "list-less Haskell" is not for today or tomorrow, but it is a vision that one can keep in mind. But again, you needn't buy that vision to use the Collections standard or my implementation.
5) Some obvious naming changes like head --> first
It is not the only truth to dub this a "naming change", since the new functions are quite different (since much more general) from the traditional ones. When speaking of [] lists I still think, it's appropriate to use head, tail, and so on. (Just that I don't think, one should use [] lists in new programs.)
Number 5 definitely makes sense (although I haven't had a look at all of your names yet). It will be painful, though, and there will definitely be many people who just write modules with declarations like "head xs = first xs".
[underscores are better]
What psychological studies?
I need some time to find the references. This gives you a chance to find the studies proving the contrary.
Mhm. We seem to have a small clash of cultures here. Down here in old-fashioned Austria, calling somebody's opinion both absurd and ridiculous in the same sentence is sometimes considered equivalent to a personal insult.
Well, I didn't say that someone's opinion is absurd. Here is a more careful wording of my (still personal and very emotional) belief: "From a viewpoint of abstraction, ignoring all the historic achievments of LISP and its successor languages, the insymmetry of performance of list operations is absurd, and the inconsistent function names are ridiculous." Of course the traditional names are sensible if one knows their history. And of course lists used as streams are important in many applications. But I think anyone that wants to transpose the traditional name and semantics of lists to Abstract Collections has just not thought long enough about it.
Sorry for complaining about this, I know I shouldn't be that narrow-minded...
You aren't. Yet. Haha! Robert
participants (3)
-
Robert Will
-
Ross Paterson
-
Wolfgang Thaller