
Hello apfelmus, Thanks for taking the time to look at this. As what Jamie has posted is largely based on my own initial efforts I can offer some insight about what's going on here and why the class API looks like it does. The first thing to note is that what Jamie has posted is the proposed class methods *only*. It's not the complete user level map API. There's also a whole lot of more convenient and sensible looking functions that are just regular overloaded functions. Unfortunately these are not visible in the API posted. Among them are things like this.. size :: GT map k => map a -> Int size mp = ASINT(addSize mp L(0)) (ASINT and L are a cpp macros) elemsAscending :: GT map k => map a -> [a] elemsAscending mp = foldrElemsAscending (:) mp [] assocsAscending :: GT map k => map a -> [(k,a)] assocsAscending mp = foldrAssocsAscending (\k a assocs -> (k,a):assocs) mp [] keysAscending :: GT map k => map a -> [k] keysAscending mp = foldrKeysAscending (:) mp [] The second thing to note is that the class API has been designed with the *implementation* of generalised tries in mind. It's not necessary that any instance is actually implemented as any kind of trie, but it is necessary that the resulting API contains what's needed to enable its use in other instances that are based on generalised tries. So the actual class methods chosen are designed to be convenience for generalised trie implementation, not typical map users. The types are chosen to reflect how they will likely be used in other generalised trie implementations and the functionality provided is what it seems is actually needed to do this efficiently. What the class API contains has kind of evolved empirically with stuff being added as and when it was discovered it was needed (by implementing a few common instancance by hand). Here's my thoughtS about your specific observations.. apfelmus wrote:
1) Some terminology seems queer to me. In particular, I don't like the terms "assoc" and "association" in the documentation. The Data.Map documentation uses "key/value pair" or just "key" and "value"/"element" instead, which I think is much better. For instance:
"Insert a new association in the map" :( "Insert a new key and value in the map." :)
Also, I like fromList better than fromAssocs .
Well this is one of those bike shed arguments :-) I'm easy about it so if Jamie agrees with you that's fine. Use of the term "association" seems quite common. I've seen numereous uses of the term "association list", never seen anyone talk about "key/value pair list".
2) Don't use Int# , looks like a premature optimization to me. Furthermore, I'd change the queer addSize to simply
size :: map a -> Int
This does exist (see above), but it's not a class method. One could argue that is was unnecessary to make unboxing explicit. It's something I got into the habit of doing because it's way easier to do that than is inspecting ghc's output to make it's done it on its own (and figuring out what to do about it if it hasn't). Also, because of the nested nature generalised tries addSize is more convenient for implementors than size IMO.
3) insert is strange. Why not use the convention in Data.Map, name it insertWith and have
insert :: k -> a -> map a -> map a instead?
Yes, this this should probably be added to the user API. As a choice of primitive class method, the current form seems more appropriate (though perhaps not it's name). All the common user level variants can easily be implemented with this primitive.
Similar for union .
Actually I don't like Data.Maps union much. It would be deprecated if I ruled the world. I think users should be always be made to specify how overlapping values are to be combined (or discarded or whatever). So the name union is now free we don't need to qualify the function that is explicit about this using a "With" suffix.
4) Most functions, in particular the ..Maybe variants have fixed default definitions. I wouldn't include them in the class. How about a minimal class interface along the lines of
class GMap map k | map -> k where empty :: map a null :: map a -> Bool
lookup :: k -> map a -> Maybe a alter :: (Maybe a -> Maybe a) -> k -> map a -> map a merge :: (k -> Maybe a -> Maybe b -> Maybe c) -> map a -> map b -> map c
fromList :: [(k,a)] -> map a toList :: map a -> [(k,a)]
instance Functor map where ...
and implementing the various variants as normal polymorphic functions? Like
insert k a = alter (const $ Just a) k singleton k a = insert k a empty
IMHO, having a few "flag" functions like
difference :: k -> Maybe a -> Maybe b -> Maybe b union :: k -> Maybe a -> Maybe a -> Maybe a intersect :: k -> Maybe a -> Maybe a -> Maybe a
that can be plugged into merge is much nicer than having to remember/lookup all the ..Maybe and whatnot'' variants.
I've never been very keen on alter myself (on efficiency grounds) and was wondering whether of not to include it. If the "altering" results in an unchanged map it would be nice to just return the unchanged map (rather than duplicate all nodes on the search path). There are other possible alternatives to alter that are more efficient in this respect. The merge function looks like an interesting idea, but it's not clear to me that it can always (or ever even :-) be implemented as efficiently as the more specialised versions. Maybe as and experiment we could implement it and if it turns out that union,intersection etc can be implemeted using it with liitle or no extra cost then we could put them in the convinience API instead (not as class methods).
5) I think the following functions are queer
lookupCont f k m = lookup k m >>= f
Or perhaps.. lookup = lookupCont Just Yes, there's probably unwanted duplication in the class methods there. My vote would be to keep lookupCont and a class method and have lookup as regular overloaded function.
pair k1 k2 = if k1 == k2 then Nothing else Just $ \a1 a2 -> insert k1 a1 (insert k2 a2 empty)
What use is pair ? I'd definitively put them outside the class if not remove them entirely.
Again, pair is one of those things that a typical Map user wouldn't use, but is definitely needed to implement an efficient trie for Lists and probably product types in general. The reason for it's existance is that singleton maps need special treatment (you want to avoid making long chains of them). Unfortunately the above definition is inefficient. You've already done most of the work requred to evaluate pair in the first equality test. This would all have to be repeated in the second insertion. Have a look at the ListGT module to see how pair is defined and used for the gory details.
6) status is an interesting idea with a meaningless name. Don't you have a name that is more to the point, that expresses the nub (pun intended) of what status does?
If you're suggesting it should be called nub, that seems confusing considering that name is already used in Data.List to mean something quite different. Perhaps you can suggest some other name?
7) I'm overwhelmed by the many foldr...Asc/Descending functions. They all can be expressed in terms of toListAsc and toListDesc and I don't think that "marginally faster" is a reason not to do so.
..or vice-versa, but without the need for list deforestation to get the same efficiency. Also I think we need to distinguish between the variants that require key reconstruction from those that don't (keys generally aren't stored in a trie).
I'd throw away the from...L functions, too.
The L versions can improve performance in some map implementations where length is needed and is already known as by product of other stuff that's going on. In such cases it seems to make sense to pass it as an argument rather than incur the cost of evaluating it again from scratch. Anway, I don't think we should get to worried about the precise details of class methods right now. I think the main concern should be getting the class hierarchy right, I'm not sure that it at present. e.g. There may be specialised representations that support lookup very efficiently but not much else. Should we have a separate class for them? Should we have separate classes for implementations that store keys (hence no key reconstruction cost) and those that must reconstruct keys? Then there's the ordering can of worms too. This is the sort of thing that really needs to be got right from the start IMHO. Regards -- Adrian Hey