Re: Who needs Ord for Sets and Maps anyway?

Adrian Hey wrote:
Well, speaking of "non-trivial comparisons" reminds me of something that's been worrying me about this whole approach to implementing Sets and Maps. I.E. Requiring that element/key types are instances of Ord and assuming that lookup etc is achieved by performing some kind of binary search.
This seems wrong in general. [...]
and
This also highlights another reason why IMO specifying "biasing" is bad, because doing this implicitly assumes that Sets/Maps somehow contain concrete structural representations of elements/keys. This is not necessarily so. It's not true with any implementation like that described above (or even simple tries).
You've just highlighted why the collections hierarchy in Edison was a lattice of 8 classes. Basically, there are two choices in each of three different dimensions: 1. The set/map distinction 2. Require Ord or don't (your first point above) 3. "Observable" or not (your second point above) I think the last point is the one that caused the most confusion, but it is exactly the issue you highlight above -- some structures contain the actual elements/keys, but some (such as tries or sets based on bitmaps) don't. The non-observable parts of the hierarchy are there precisely to support implementations like tries and bitmaps where it is not easy to get your hands on the actual elements. -- Chris

Hello Chris,
I think the last point is the one that caused the most confusion, but it is exactly the issue you highlight above -- some structures contain the actual elements/keys, but some (such as tries or sets based on bitmaps) don't. The non-observable parts of the hierarchy are there precisely to support implementations like tries and bitmaps where it is not easy to get your hands on the actual elements.
Yes, the more I think about this the more difficult it seems to produce abstract, polymorphic and efficent Sets/Maps. The Ord/Binary tree approach really does seem a bit naive to me now (even if it does use AVL trees :-) But realistic alternatives seem to require the use of language features that are non-standard (and mostly beyond my grasp at the moment anyway, but I've never really looked at them seriously). We can produce specialised implementations for particular types and include them in standard libs. But this still doesn't address the real problem (well I think it's the real problem) of how users can easily produce implementations for their own (non-standard) types. Just deriving Ord won't be good enough. Is Edison still being maintained, BTW? I wasn't even aware that is was distributed (with GHC least) until quite recently. (It seems a pity to have it languishing in obscurity under hslibs.) Regards -- Adrian Hey

G'day all.
Quoting Adrian Hey
Is Edison still being maintained, BTW? I wasn't even aware that is was distributed (with GHC least) until quite recently. (It seems a pity to have it languishing in obscurity under hslibs.)
I adopted it for a while, but stopped actively maintaining it because discussion on this list seemed to indicate that nobody really wanted it. Some day I'm going to cabalise it at least, so that should at least make it less obscure. Cheers, Andrew Bromage

On 1/19/06, ajb@spamcop.net
G'day all.
Quoting Adrian Hey
: Is Edison still being maintained, BTW? I wasn't even aware that is was distributed (with GHC least) until quite recently. (It seems a pity to have it languishing in obscurity under hslibs.)
I adopted it for a while, but stopped actively maintaining it because discussion on this list seemed to indicate that nobody really wanted it.
Some day I'm going to cabalise it at least, so that should at least make it less obscure.
Well, for the record. I think the lack of good "standardised" data structures is one of the main problems with Haskell (and one of the easiest to fix). So if you do have the incliniation and time to work on it, I know I'd sure apreciate it! A good "standard" class hierarchy for collections and several implementations is very much needed, IMO. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

G'day all.
Quoting Sebastian Sylvan
A good "standard" class hierarchy for collections and several implementations is very much needed, IMO.
For the record, the consensus seemed to be that we needed something, but Edison wasn't it. One problem that I got bogged down in was the idea that some data structures most sensibly reside in monads (e.g. IOArrays) and some do not. A consistent interface for both proved elusive. Cheers, Andrew Bromage

Hello ajb, Thursday, January 19, 2006, 4:47:28 AM, you wrote:
A good "standard" class hierarchy for collections and several implementations is very much needed, IMO.
asn> One problem that I got bogged down in was the idea that some data structures asn> most sensibly reside in monads (e.g. IOArrays) and some do not. A consistent asn> interface for both proved elusive. at least, arrays library have two completely different interfaces for immutable and mutable arrays, although these interfaces has much in common and moreover MArray in IO monad can be used to create "differential" IArray. i think that it is the very good design that just need to be entirely copied for general collection classes -- Best regards, Bulat mailto:bulatz@HotPOP.com

Quoting Sebastian Sylvan
: A good "standard" class hierarchy for collections and several implementations is very much needed, IMO.
Yes, yes, and yes. I think the Java collections framework http://java.sun.com/docs/books/tutorial/collections/index.html is quite brilliant (regarding both the implementations and the interface design) and I wonder what keeps us from copying it as literally as possible. This is not a rhetorical question. They solve the problem of what comparison method to use for the elements by providing constructors that have a Comparator<E> object as argument. What's wrong with that? (Does it solve the problems that were discussed here recently?) I fear (or I hope) that the average well-trained Java programmer is way ahead of the average Haskell programmer when using data structures *and* hiding them behind interfaces. I've seen too many Haskell sources (including my own) that use concrete collection/map types (Data.List, Data.Map) all over the place where in fact interfaces (see Collection<E>, List<E> etc.) would be the right thing. In Haskell, we would need existential types to express that a function returns "an object of *some* type that implements the (hypothetical) Set interface". I guess the notational extra work for that is the main reason (at least for me) for wrongly preferring concrete datatypes over abstract types in Haskell. PS (I know this is heresy but - can we please in Haskell-0X rename "class" to "interface" so that the non-Haskell world knows what we're talking about. You know if the mountain's not gonna walk to the prophet, then ...) (note that I'm currently not proposing to rename "data" to "class". But looking at GADT data definitions still makes me wonder...) -- -- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 -- ---- http://www.imn.htwk-leipzig.de/~waldmann/ -------

On 1/19/06, Johannes Waldmann
Quoting Sebastian Sylvan
: A good "standard" class hierarchy for collections and several implementations is very much needed, IMO.
Yes, yes, and yes. I think the Java collections framework http://java.sun.com/docs/books/tutorial/collections/index.html is quite brilliant (regarding both the implementations and the interface design) and I wonder what keeps us from copying it as literally as possible. This is not a rhetorical question.
A lot of things differ between haskell and java. The type system, the undelying paradigms are different. We can use the java collections as a source of inspiration, but merely copying is impractical.
They solve the problem of what comparison method to use for the elements by providing constructors that have a Comparator<E> object as argument. What's wrong with that? (Does it solve the problems that were discussed here recently?)
What if we do the union of two Sets constructed with different comparators ?
I fear (or I hope) that the average well-trained Java programmer is way ahead of the average Haskell programmer when using data structures *and* hiding them behind interfaces. I've seen too many Haskell sources (including my own) that use concrete collection/map types (Data.List, Data.Map) all over the place where in fact interfaces (see Collection<E>, List<E> etc.) would be the right thing.
Haskell has a long tradition of using concrete lists. Breaking from that tradition involves quite a lot of problems. If you wish, I suggest to look at Robert Will's "Dessy", which implements something close to that.
In Haskell, we would need existential types to express that a function returns "an object of *some* type that implements the (hypothetical) Set interface". I guess the notational extra work for that is the main reason (at least for me) for wrongly preferring concrete datatypes over abstract types in Haskell.
If I understand correctly, those are unrelated issues. We can parameterize over the collection type using the usual Haskell mechanisms (no need for subtyping).
PS (I know this is heresy but - can we please in Haskell-0X rename "class" to "interface" so that the non-Haskell world knows what we're talking about. You know if the mountain's not gonna walk to the prophet, then ...) (note that I'm currently not proposing to rename "data" to "class". But looking at GADT data definitions still makes me wonder...)
Haskell is just different; if one can't get beyond lexical issues, there's IMHO no hope to make the paradigm shift needed to write good haskell. Cheers, JP.

Jean-Phillipe Bernardy wrote:
What if we do the union of two Sets constructed with different comparators ?
The union of two ordered sets with different specifactions for their (externally visible) structure (ordering) is not defined. Or rather, it is up to the user to specify explicitely what he wants (i. e. name the comparator for the result). Or do you mean that an implementation of "union" could be more efficient if both sets have the same internal representation (e. g. balanced tree). A Java implementation would probably use "instanceof" to check for that. With the current framework and Sun's implementation, the TreeSet implementation (Red-Black trees) does not have a specialiced implementation of "addAll", as far as I could see. The underlying question is: what should be the type of "union". Java: interface Collection<E> { addAll(Collection extends E> c) } naive Haskell: class Collection c where union :: c e -> c e -> c e The Java thing is existential: each implementation of addAll has to accept *any* collection as an argument, while the Haskell implementation knows that both arguments have identical representation. So the Java version is more flexible for the user of the library. One could try class Collection c where union :: Collection d => c e -> d e -> c e and use a specialized implementation for c e -> c e -> c e. This works for arguments of functions, but what about a function that by design returns "any" collection, without telling the caller about the implementation. Then we sure have to wrap it up in an existential type? My point is that hiding the implementation is actually the recommended coding style (not just for Java) but it is syntactically inconvenient in Haskell. But perhaps I missed something. Best regards, -- -- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 -- ---- http://www.imn.htwk-leipzig.de/~waldmann/ -------

On 1/19/06, Johannes Waldmann
The union of two ordered sets with different specifactions for their (externally visible) structure (ordering) is not defined. Or rather, it is up to the user to specify explicitely what he wants (i. e. name the comparator for the result).
Or do you mean that an implementation of "union" could be more efficient if both sets have the same internal representation (e. g. balanced tree). A Java implementation would probably use "instanceof" to check for that. With the current framework and Sun's implementation, the TreeSet implementation (Red-Black trees) does not have a specialiced implementation of "addAll", as far as I could see.
The underlying question is: what should be the type of "union". Java: interface Collection<E> { addAll(Collection extends E> c) } naive Haskell: class Collection c where union :: c e -> c e -> c e The Java thing is existential: each implementation of addAll has to accept *any* collection as an argument, while the Haskell implementation knows that both arguments have identical representation. So the Java version is more flexible for the user of the library. One could try class Collection c where union :: Collection d => c e -> d e -> c e and use a specialized implementation for c e -> c e -> c e.
I see your point now. However, I suspect the homogeneous typing is less surprising for the user. It is also much more inline with the haskell style. See for example, Num class: (+) :: a -> a -> a
This works for arguments of functions, but what about a function that by design returns "any" collection, without telling the caller about the implementation. Then we sure have to wrap it up in an existential type? My point is that hiding the implementation is actually the recommended coding style (not just for Java) but it is syntactically inconvenient in Haskell. But perhaps I missed something.
Subtyping and binary methods is not something so obvious... And certainly I do not want to emulate the java solution with existential types :) In any case, please don't confuse implementation hiding and polymorphism. Those are mixed in most OO languages, and indeed in java, but separated in haskell. For example, Data.Set is "monomorphic", but still an abstract data type (one cannot observe the internal structure, and indeed we are considering changing it) Cheers, JP.

G'day all.
Quoting Jean-Philippe Bernardy
It is also much more inline with the haskell style. See for example, Num class: (+) :: a -> a -> a
A lot of people don't like this type, especially for multiplication. It makes implementing types based on vector spaces much more complicated, for example. Cheers, Andrew Bromage

On 19/01/06, ajb@spamcop.net
G'day all.
Quoting Jean-Philippe Bernardy
: It is also much more inline with the haskell style. See for example, Num class: (+) :: a -> a -> a
A lot of people don't like this type, especially for multiplication. It makes implementing types based on vector spaces much more complicated, for example.
Well, obviously it would be nice to be able to do that, but has anyone figured out *how* yet? This sounds kinda like how we haven't got a collections framework yet because no one has figured out how to type it nicely... wait, isn't that what we were talking about to start with?

Hello ajb, Friday, January 20, 2006, 2:59:47 AM, you wrote:
It is also much more inline with the haskell style. See for example, Num class: (+) :: a -> a -> a
asn> A lot of people don't like this type, especially for multiplication. It and for pointer arithmetics too :) -- Best regards, Bulat mailto:bulatz@HotPOP.com

Jean-Phillipe Bernardy wrote:
... For example, Data.Set is "monomorphic", but still an abstract data type (one cannot observe the internal structure, and indeed we are considering changing it)
but I cannot easily choose another representation because I would have to change my program (replace Set by AVLSet or something) everywhere. I sometimes (reluctantly) use a detour by declaring a wrapper type http://141.57.11.163/cgi-bin/cvsweb/lib/Autolib/FiniteMap.hs?rev=1.12 but this is an ugly workaround and the root of the problem is that there is no defined interface for Set implementations. Cale Gibbard wrote:
[in Java] classes implement some interfaces, but only one interface at a time can be mentioned when it comes to using them.
Right. Although you can always write "interface A extends B, C {}" and then use A for "B and C". But I agree that this is inconvenient.
interfaces in Java are not permitted to provide default implementations of methods which they declare
If you want that, then you want an "abstract class", e. g. http://java.sun.com/j2se/1.5.0/docs/api/java/util/AbstractSet.html I think the sole purpose of an interface is to specify a contract. Obviously the Java design choice was to not mix this with providing help when fulfilling the contract. Best regards, -- -- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 -- ---- http://www.imn.htwk-leipzig.de/~waldmann/ -------

On 19/01/06, Johannes Waldmann
PS (I know this is heresy but - can we please in Haskell-0X rename "class" to "interface" so that the non-Haskell world knows what we're talking about. You know if the mountain's not gonna walk to the prophet, then ...) (note that I'm currently not proposing to rename "data" to "class". But looking at GADT data definitions still makes me wonder...)
Typeclasses are about as much unlike Java interfaces as they are unlike Java classes. Typeclasses are not types, like interfaces are in Java (at least, when I learned Java, I'll admit I haven't been following progress too closely). Typeclasses (at least in modern Haskell), are allowed to have arbitrarily many type parameters as well as functional dependencies between those parameters, whereas interfaces in Java are something that each class is permitted to implement, but they do not tie multiple classes together. Further, functions in Haskell are permitted to require that some types involved satisfy any number of classes, whereas with interfaces, classes implement some interfaces, but only one interface at a time can be mentioned when it comes to using them. For example, in Haskell, I can write: f :: (Ord a, Read a) => String -> String -> a f xs ys = max (read xs) (read ys) Whereas in Java, this kind of requirement that a type implement multiple interfaces seems harder to specify (correct me if I'm wrong, it's been a few years since I wrote any Java code). If I recall correctly, interfaces in Java are not permitted to provide default implementations of methods which they declare (and that's one of the major things separating them from classes), whereas in Haskell, we are permitted to define default method implementations in classes, which can be overruled in the instances. In any event, I think it's dangerous when coming to a new language to expect that the terms mean the same thing at all. "Class" means something different to a Haskell programmer than a Java programmer, and that's fine. So do most other words. :) Haskell isn't claiming to be an OO language, so we don't have to align with OO terminology. That said, 'interface' or perhaps 'predicate' or 'relation' seem like slightly better words than 'class' here, purely from an English language perspective, but the term 'class' does have a justification, and I think it's already blatantly obvious to new Haskell programmers that Haskell classes are not too much like OO classes, even if the terminology is similar. Is the small benefit really worth the trouble of altering all the literature and code to change the name? - Cale

On 19/01/06, Cale Gibbard
That said, 'interface' or perhaps 'predicate' or 'relation' seem like slightly better words than 'class' here, purely from an English language perspective, but the term 'class' does have a justification, and I think it's already blatantly obvious to new Haskell programmers that Haskell classes are not too much like OO classes, even if the terminology is similar. Is the small benefit really worth the trouble of altering all the literature and code to change the name?
Even worse, we'd also have to change all the jokes and t-shirt designs!

Hello Cale, Thursday, January 19, 2006, 6:56:42 PM, you wrote:
PS (I know this is heresy but - can we please in Haskell-0X rename "class" to "interface" so that the non-Haskell world knows what we're
CG> terminology is similar. Is the small benefit really worth the trouble CG> of altering all the literature and code to change the name? on the other side, it will allow us to sell all those books second time ;) -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Wed, Jan 18, 2006 at 09:16:00AM +0000, Adrian Hey wrote:
We can produce specialised implementations for particular types and include them in standard libs. But this still doesn't address the real problem (well I think it's the real problem) of how users can easily produce implementations for their own (non-standard) types. Just deriving Ord won't be good enough.
It would be interesting to use DrIFT to produce generalized Tries from arbitrary abstract data types. It seems like it should be possible. is there any literature on this sort of thing? It can probably be done (less efficiently) with Data.Generics too. John -- John Meacham - ⑆repetae.net⑆john⑈

Hello John, Thursday, January 19, 2006, 4:12:00 AM, you wrote: JM> It would be interesting to use DrIFT to produce generalized Tries from JM> arbitrary abstract data types. It seems like it should be possible. is JM> there any literature on this sort of thing? It can probably be done JM> (less efficiently) with Data.Generics too. and also with Template Haskell. when i was interested in generic programmimg with Haskell, i found 7 projects, which can be used for it: drift -fgenerics TH generic haskell polyp SYB strafunski -- Best regards, Bulat mailto:bulatz@HotPOP.com

Hello On Thursday 19 Jan 2006 1:12 am, John Meacham wrote:
It would be interesting to use DrIFT to produce generalized Tries from arbitrary abstract data types.
I'm not sure what DrIFT actually does, but it sounds like what you're suggesting should be interesting. I was thinking it wouldn't be to hard to write a program that took any Haskell data type definition (call it MyType) and automatically emitted MyTypeSet and MyTypeMap generalised trie types and associated functions and class instances. Of course the problem comes in that having created these types it should go on to create MyTypeSetSet,MyTypeSetMap,MyTypeMapSet, MyTypeMapMap..Hmm :-) Regards -- Adrian Hey

Hello Adrian, I've found some time to try out (part of) your library (below). And it speeded up my whole application for about 5%! I've used the following "interface" to simulate Data.Map and was a bit uncertain if my construction of COrdings were all correct. (It's easy to exchange variables being compared, isn't it.) Cheers Christian import Data.Tree.AVL as AVL import Data.COrdering import Prelude hiding (null, lookup) type Map a b = AVL (a, b) empty :: Map a b empty = AVL.empty insert :: Ord a => a -> b -> Map a b -> Map a b insert k v m = AVL.pushWith (\ (a, _) (c, _) -> case compare a c of LT -> Lt EQ -> Eq (k, v) GT -> Gt) (k, v) m null :: Map a b -> Bool null = AVL.isEmpty lookup :: Ord a => a -> Map a b -> Maybe b lookup k m = AVL.genTryRead m (\ (a, b) -> case compare k a of LT -> Lt EQ -> Eq b GT -> Gt) findWithDefault :: Ord a => b -> a -> Map a b -> b findWithDefault d k = maybe d id . lookup k insertWith :: Ord a => (b -> b -> b) -> a -> b -> Map a b -> Map a b insertWith f k v m = AVL.pushWith (\ (a, b) (c, d) -> case compare a c of LT -> Lt EQ -> Eq (k, f b d) GT -> Gt) (k, v) m elems :: Ord a => Map a b -> [b] elems m = map snd $ AVL.asListL m

Hello Christian On Thursday 19 Jan 2006 9:38 pm, Christian Maeder wrote:
I've found some time to try out (part of) your library (below). And it speeded up my whole application for about 5%!
Thanks. The current AVL implementation isn't specialised for Maps (you get an extra indirection overhead by using pairs), but AVL does seem to more or less hold it's own against Data.Map despite this handicap for basic operations like insert and lookup. Where I think you should see significant improvement is in tree balancing if you end up growing a tree from a sorted list (that you don't know is sorted) and in Set operations (union etc). The Hedge algorithm seems to require many more comparisons. Of course how much this contributes to overall program speedup depends on a lot of things (what else program is doing, how much comparison costs..)
insert :: Ord a => a -> b -> Map a b -> Map a b insert k v m = AVL.pushWith (\ (a, _) (c, _) -> case compare a c of LT -> Lt EQ -> Eq (k, v) GT -> Gt) (k, v) m
I deprecated pushWith. The Data.Map clone I started (but haven't got around to finishing) has insert defined thus.. newtype Map k a = Map (AVL.AVL (k,a)) insert :: Ord k => k -> a -> Map k a -> Map k a insert k a (Map t) = Map (AVL.genPush cc (k,a) t) where cc (k',_) = case compare k k' of LT -> COrdering.Lt EQ -> COrdering.Eq (k',a) GT -> COrdering.Gt Here's my versions of some of the other functions (not that I think there's anything wrong with yours).
lookup :: Ord a => a -> Map a b -> Maybe b lookup k m = AVL.genTryRead m (\ (a, b) -> case compare k a of LT -> Lt EQ -> Eq b GT -> Gt)
findWithDefault :: Ord a => b -> a -> Map a b -> b findWithDefault d k = maybe d id . lookup k
readValCC :: Ord k => k -> (k,a) -> COrdering.COrdering a readValCC k (k',a) = case compare k k' of LT -> COrdering.Lt EQ -> COrdering.Eq a GT -> COrdering.Gt lookup :: (Monad m,Ord k) => k -> Map k a -> m a lookup k (Map t) = case AVL.genTryRead t (readValCC k) of Just a -> return a Nothing -> fail "AvlMap.lookup: Key not found" findWithDefault :: Ord k => a -> k -> Map k a -> a findWithDefault def k (Map t) = AVL.genDefaultRead def t (readValCC k)
insertWith :: Ord a => (b -> b -> b) -> a -> b -> Map a b -> Map a b insertWith f k v m = AVL.pushWith (\ (a, b) (c, d) -> case compare a c of LT -> Lt EQ -> Eq (k, f b d) GT -> Gt) (k, v) m
insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a insertWith f k a (Map t) = Map (AVL.genPush cc (k,a) t) where cc (k',a') = case compare k k' of LT -> COrdering.Lt EQ -> COrdering.Eq (k',f a a') GT -> COrdering.Gt Regards -- Adrian Hey
participants (11)
-
Adrian Hey
-
ajb@spamcop.net
-
Bulat Ziganshin
-
Cale Gibbard
-
Christian Maeder
-
Jean-Philippe Bernardy
-
Johannes Waldmann
-
John Meacham
-
Okasaki, C. DR EECS
-
Samuel Bronson
-
Sebastian Sylvan