Proposal: Bounded instance for IntSet (ticket #1953)

I propose to add a Bounded instance to IntSet.hs. IntSet is in Ord, and there are only finitely many instances of IntSet. Therefore there is a min IntSet and a max IntSet. It turns out these bounds are very simple: instance Bounded IntSet where minBound = empty maxBound = singleton maxBound Suggested deadline: December 16, 2007.

On Sun, Dec 02, 2007 at 03:45:40PM -0800, David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
IntSet is in Ord, and there are only finitely many instances of IntSet. Therefore there is a min IntSet and a max IntSet. It turns out these bounds are very simple:
instance Bounded IntSet where minBound = empty maxBound = singleton maxBound
These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?

On 12/2/07, Ross Paterson
These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
The order on IntSet is the well-known lexicographic order (see http://en.wikipedia.org/wiki/Lexicographical_order). If this proposal is accepted, I intend to propose the Bounded instance for Set next (Bounded a => Bounded (Set a)).

On Sun, Dec 02, 2007 at 04:01:05PM -0800, David Benbennick wrote:
On 12/2/07, Ross Paterson
wrote: These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
The order on IntSet is the well-known lexicographic order (see http://en.wikipedia.org/wiki/Lexicographical_order).
Yes, but when does one use that ordering on sets?

On Sun, Dec 02, 2007 at 04:01:05PM -0800, David Benbennick wrote:
On 12/2/07, Ross Paterson
wrote: These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
The order on IntSet is the well-known lexicographic order (see http://en.wikipedia.org/wiki/Lexicographical_order). If this proposal is accepted, I intend to propose the Bounded instance for Set next (Bounded a => Bounded (Set a)).
This contradicts your code, for the maximal element in the lexicographic order is *not* singleton maxBound, but rather fromList [minBound .. maxBound]. (The libraries agree with your code but not your explanation...) Stefan

On Sun, Dec 02, 2007 at 04:23:11PM -0800, Stefan O'Rear wrote:
On Sun, Dec 02, 2007 at 04:01:05PM -0800, David Benbennick wrote:
On 12/2/07, Ross Paterson
wrote: These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
The order on IntSet is the well-known lexicographic order (see http://en.wikipedia.org/wiki/Lexicographical_order). If this proposal is accepted, I intend to propose the Bounded instance for Set next (Bounded a => Bounded (Set a)).
This contradicts your code, for the maximal element in the lexicographic order is *not* singleton maxBound, but rather fromList [minBound .. maxBound].
I think he means lexicographic order of toAscList set. I agree that lexicographic order of toDescList makes more intuitive sense to me, though. Thanks Ian

On Sun, 2 Dec 2007, at 23:53:36 +0000, Ross Paterson wrote:
On Sun, Dec 02, 2007 at 03:45:40PM -0800, David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
IntSet is in Ord, and there are only finitely many instances of IntSet. Therefore there is a min IntSet and a max IntSet. It turns out these bounds are very simple:
instance Bounded IntSet where minBound = empty maxBound = singleton maxBound
These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
In my opinion, the class ``Ord'' is not particularly heavy on intuition; it just provides an interface to ``the standard linear ordering'' for each of its types, so that these types can be used for keys in Data.Map's, etc. Not many types are naturally understood as linearly ordered, but most of those types should still be usable for keys in maps. Bounded happens to be defined over Ord, so such instances make sense for whatever uses people want to put Bounded to. Wolfram

Fax +49 - 345 - 55 27033 On Mon, 3 Dec 2007 kahl@cas.mcmaster.ca wrote:
On Sun, 2 Dec 2007, at 23:53:36 +0000, Ross Paterson wrote:
On Sun, Dec 02, 2007 at 03:45:40PM -0800, David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
IntSet is in Ord, and there are only finitely many instances of IntSet. Therefore there is a min IntSet and a max IntSet. It turns out these bounds are very simple:
instance Bounded IntSet where minBound = empty maxBound = singleton maxBound
These are the minimum and maximum under the Ord instance (also for Set), but what is the intuition behind that ordering?
In my opinion, the class ``Ord'' is not particularly heavy on intuition; it just provides an interface to ``the standard linear ordering'' for each of its types, so that these types can be used for keys in Data.Map's, etc.
Not many types are naturally understood as linearly ordered, but most of those types should still be usable for keys in maps.
I already told about by scepticism about using Ord for keys of maps and sets: http://www.haskell.org/pipermail/libraries/2007-April/007411.html

David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
I am opposed to this proposal, unless someone comes up with some important use cases. Unfortunately, there is still no way to control export of instances. So libraries should avoid defining instances unless there is a compelling reason to do so. We need to be reasonably certain that the usefulness of the instance will overwhelm any unforeseen namespace pollution problems that it may cause. In this case, the Ord instance is not really natural; it is defined for technical reasons for use by the library itself (and its friends). The library has no need for a Bounded instance, so why should we prevent people from defining one for some other purpose? -Yitz

On Mon, 3 Dec 2007, Yitzchak Gale wrote:
David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
I am opposed to this proposal, unless someone comes up with some important use cases.
Unfortunately, there is still no way to control export of instances.
You can put instances into a separate module (and get GHC warnings about that :-) But it is not sensible to have different instances for the same type and class, because they will collide sooner or later.
So libraries should avoid defining instances unless there is a compelling reason to do so. We need to be reasonably certain that the usefulness of the instance will overwhelm any unforeseen namespace pollution problems that it may cause.
I think there is no much sense in defining instances privately in code that uses a class definition from a library, because the custom instance in turn may break other modules. In the past it happened for me at each GHC upgrade, that instances that I defined privately (like Show for FiniteMap) collide with new instances defined in the imported standard module.

Henning Thielemann wrote:
...it is not sensible to have different instances for the same type and class, because they will collide sooner or later.
True. That is why libraries should not define an instance at all, unless they are quite certain that it is by far the most important instance that anyone will ever want to use. Here is an example: Control.Monad.Error defines a Monad instance for Either. I understand why that seemed sensible at the time. But the Either type is useful for many other things, too, and now it can *only* be used as an exception type in a monadic setting. If I had to pick just one usage for Either as a monad, it would be as an exit monad. I would use a different name for the exception monad, not the other way around. But now I'm stuck - if I want to use Control.Monad.Error at all, I have to use its crippled monad instance for Either.
I think there is no much sense in defining instances privately in code that uses a class definition from a library, because the custom instance in turn may break other modules. In the past it happened for me at each GHC upgrade, that instances that I defined privately (like Show for FiniteMap) collide with new instances defined in the imported standard module.
OK, so not only should libraries avoid defining instances - users should also think carefully before defining them. Because of this current limitation in Haskell's module system, instances are like the magic in the Sorcerer's Apprentice - very powerful, and unstoppable, for good and for bad. (This is ticket #19 in Haskell', marked as "maybe". Vote for issue 19!) Regards, Yitz

On Mon, 3 Dec 2007, Yitzchak Gale wrote:
Henning Thielemann wrote:
...it is not sensible to have different instances for the same type and class, because they will collide sooner or later.
True. That is why libraries should not define an instance at all, unless they are quite certain that it is by far the most important instance that anyone will ever want to use.
Here is an example: Control.Monad.Error defines a Monad instance for Either. I understand why that seemed sensible at the time. But the Either type is useful for many other things, too, and now it can *only* be used as an exception type in a monadic setting. If I had to pick just one usage for Either as a monad, it would be as an exit monad. I would use a different name for the exception monad, not the other way around. But now I'm stuck - if I want to use Control.Monad.Error at all, I have to use its crippled monad instance for Either.
I think the problem here is, that library designers wanted to save work by (ab)using Either for errors. They should have defined an Error type which shares similarities with Either but is a distinct type and its occurence tells the reader that it is not about arbitrary alternatives but about error handling. What concerns instances - after I managed to understand the "orphan instance" warnings of GHC I learnt to like GHC's rule of warning about instances: Instances should be defined in the module where the class or the instance type is defined. If I import a standard type and a standard class with a natural instance declaration I expect that this instance already exists. E.g. in GHCi-6.4.1 Prelude> Test.QuickCheck.test (\xl yl -> let x = Data.Set.fromList (xl::[Int]); y = Data.Set.fromList yl in Data.Set.union x y == Data.Set.union y x) Loading package QuickCheck-1.0 ... linking ... done. OK, passed 100 tests. This small example already uses several predefined instance: Ord instance for Int, Eq instance for Data.Set.Set, Arbitrary for Int and List, Show instances for them. The first two are rather straightforward, the second two are a somehow arbitrary, although useful. Of course I like to write Prelude> Test.QuickCheck.test (\x y -> Data.Set.union x y == Data.Set.union y (x::Data.Set.Set Int)) but <interactive>:1:0: No instance for (Test.QuickCheck.Arbitrary (Data.Set.Set Int)) arising from use of `Test.QuickCheck.test' at <interactive>:1:0-19 Probable fix: add an instance declaration for (Test.QuickCheck.Arbitrary (Data.Set.Set Int)) In the definition of `it': it = Test.QuickCheck.test (\ x y -> (Data.Set.union x y) == (Data.Set.union y (x :: Data.Set.Set Int))) But I think, since List is an instance of Arbitrary, Set can also be one, based on the List instance.

On Mon, Dec 03, 2007 at 03:32:52PM +0200, Yitzchak Gale wrote:
Henning Thielemann wrote:
...it is not sensible to have different instances for the same type and class, because they will collide sooner or later.
True. That is why libraries should not define an instance at all, unless they are quite certain that it is by far the most important instance that anyone will ever want to use.
I would draw the opposite conclusion from the same data: if a sensible instance can be identified, it should accompany either the class or the type constructor. If people define orphan instances, they will eventually collide, even they are identical. (There are a few orphans in Control.Monad.Instances, but that is required to preserve compatibility with Haskell 98.) But in the case at issue, the proposed Bounded instance is counter-intuitive because the underlying Ord instance is. That Ord instance is an arbitrary choice that is accepted because it allows IntSets to be used as search keys; it makes no sense on its own. A Bounded instance would attach unwarranted significance to this arbitrary ordering. It might be different if there were a use in sight for the Bounded instance.

On 12/3/07, Ross Paterson
But in the case at issue, the proposed Bounded instance is counter-intuitive because the underlying Ord instance is. That Ord instance is an arbitrary choice that is accepted because it allows IntSets to be used as search keys; it makes no sense on its own.
I don't find that to be the case. If you had asked me to independently come up with an ordering on IntSets, the existing ordering is exactly what I would have invented. As I said earlier, lexicographic order is very well known. It's arbitrary, but it's a universally-agreed arbitrary. It's how words are ordered in paper dictionaries, for example. (To be precise, to compare two IntSets, you convert them to lists with toList, then compare the lists with lexicographic order.)

On Mon, Dec 03, 2007 at 05:56:50PM -0800, David Benbennick wrote:
On 12/3/07, Ross Paterson
wrote: But in the case at issue, the proposed Bounded instance is counter-intuitive because the underlying Ord instance is. That Ord instance is an arbitrary choice that is accepted because it allows IntSets to be used as search keys; it makes no sense on its own.
I don't find that to be the case. If you had asked me to independently come up with an ordering on IntSets, the existing ordering is exactly what I would have invented. As I said earlier, lexicographic order is very well known. It's arbitrary, but it's a universally-agreed arbitrary. It's how words are ordered in paper dictionaries, for example.
(To be precise, to compare two IntSets, you convert them to lists with toList, then compare the lists with lexicographic order.)
I would have used descending order; so it's not *completely* universal. Stefan

On 12/3/07, Stefan O'Rear
I would have used descending order; so it's not *completely* universal.
Well, using ascending order agrees with "show" and "toList". But you do have a point. If Ord used descending order, then (isSubsetOf a b) would imply (a <= b). With the existing Ord instance, (isSubsetOf a b) implies nothing about (compare a b).

On 2007-12-04, Ross Paterson
On Mon, Dec 03, 2007 at 03:32:52PM +0200, Yitzchak Gale wrote:
Henning Thielemann wrote:
...it is not sensible to have different instances for the same type and class, because they will collide sooner or later.
True. That is why libraries should not define an instance at all, unless they are quite certain that it is by far the most important instance that anyone will ever want to use.
I would draw the opposite conclusion from the same data: if a sensible instance can be identified, it should accompany either the class or the type constructor. If people define orphan instances, they will eventually collide, even they are identical. (There are a few orphans in Control.Monad.Instances, but that is required to preserve compatibility with Haskell 98.)
I strongly concur. -- Aaron Denney -><-

On 12/3/07, Yitzchak Gale
In this case, the Ord instance is not really natural; it is defined for technical reasons for use by the library itself (and its friends). The library has no need for a Bounded instance, so why should we prevent people from defining one for some other purpose?
Note that in one sense, people are already prevented from defining a different Bounded instance. Any Bounded instance other than the one suggested in this proposal would fail to obey the axioms of the Bounded class. In other words, there is a unique largest element, and a unique smallest element, an no one can legitimately define a different Bounded instance. Given that the Bounded instance is uniquely defined, I think it is more convenient to have it defined by the library than to have to manually define it in every program that uses it.

Yitzchak Gale wrote:
In this case, the Ord instance is not really natural; it is defined for technical reasons for use by the library itself (and its friends). The library has no need for a Bounded instance, so why should we prevent people from defining one for some other purpose?
David Benbennick wrote:
Note that in one sense, people are already prevented from defining a different Bounded instance. Any Bounded instance other than the one suggested in this proposal would fail to obey the axioms of the Bounded class. In other words, there is a unique largest element, and a unique smallest element, an no one can legitimately define a different Bounded instance.
Where are these axioms? I only see the Haddocks in the Prelude: "Ord is not a superclass of Bounded since types that are not totally ordered may also have upper and lower bounds." I think that is the case here. There is no unique notion of order for this type. The Ord instance only exists so that you can insert these things into containers. So Ord is used up now for this type, a pity. Why should the fact that we want to be able to use containers force us to clobber the Bounded class as well? I think that Henning's point is well-taken. This use of Ord in the Data.<container> series has effectively changed the semantics of Ord. If a type has an Ord instance, it might mean that the type is ordered, but it also might mean only that the type has been enabled for convenient use with containers. If we allow the use of an alternative class for key indexing for types that are not naturally ordered, as Henning suggests, then we will be able to impose a relationship between Bounded and Ord as you suggest. Regards, Yitz

On 12/3/07, Yitzchak Gale
David Benbennick wrote:
Note that in one sense, people are already prevented from defining a different Bounded instance. Any Bounded instance other than the one suggested in this proposal would fail to obey the axioms of the Bounded class. In other words, there is a unique largest element, and a unique smallest element, an no one can legitimately define a different Bounded instance.
Where are these axioms? I only see the Haddocks in the Prelude:
"Ord is not a superclass of Bounded since types that are not totally ordered may also have upper and lower bounds."
See http://haskell.org/ghc/docs/latest/html/libraries/base-3.0.0.0/Prelude.html#... The first sentence there says "The Bounded class is used to name the upper and lower limits of a type". In other words, you can't just pick any old values for minValue and maxValue.

Yitzchak Gale wrote:
Unfortunately, there is still no way to control export of instances.
I have some modules that do nothing but export instances --- I even make the empty export list explicit...
In this case, the Ord instance is not really natural; it is defined for technical reasons for use by the library itself (and its friends). The library has no need for a Bounded instance, so why should we prevent people from defining one for some other purpose?
Whereever this Ord instance is in scope, a Bounded instance has to conform with that. It even makes sense to define the Bounded instance for the explicit purpose of preventing people to roll their own semantically inconsistent Bounded instances, especially since the correct instance is obviously somewhat non-obvious ;-) One could of course consider to put both instances into a separate module Data.IntMap.Ord, so you can choose not to import that and roll your own instead. Wolfram

On Sun, Dec 02, 2007 at 03:45:40PM -0800, David Benbennick wrote:
I propose to add a Bounded instance to IntSet.hs.
IntSet is in Ord, and there are only finitely many instances of IntSet. Therefore there is a min IntSet and a max IntSet. It turns out these bounds are very simple:
instance Bounded IntSet where minBound = empty maxBound = singleton maxBound
This seems fairly unintuitive me. the natural choices of minimum and maximum bounds for a set would seem to be singleton minBound vs singleton maxBound or empty vs universal (fromList [minBound .. maxBound]) set the odd combination of the two proposed just feels off to me. John -- John Meacham - ⑆repetae.net⑆john⑈

On 12/2/07, Ross Paterson
Yes, but when does one use that ordering on sets?
IntSet uses that ordering.
On 12/2/07, John Meacham
This seems fairly unintuitive me. the natural choices of minimum and maximum bounds for a set would seem to be
singleton minBound vs singleton maxBound
or
empty vs universal (fromList [minBound .. maxBound]) set
the odd combination of the two proposed just feels off to me.
There is no choice. Given the existing ordering on IntSets, there is a unique smallest element, and a unique largest element. There is a legitimate discussion to be had about what ordering to use on IntSets, but that discussion is not relevant to this proposal. No matter what ordering you pick, IntSet will be bounded, so should be in class Bounded. To repeat: *** This proposal is not about changing the ordering of IntSets. It is only about adding the Bounded instance that is determined by the existing ordering. ***
participants (9)
-
Aaron Denney
-
David Benbennick
-
Henning Thielemann
-
Ian Lynagh
-
John Meacham
-
kahl@cas.mcmaster.ca
-
Ross Paterson
-
Stefan O'Rear
-
Yitzchak Gale