Data.Map, Data.IntMap documentation

Andriy Palamarchuk

11 Aug 2007 11 Aug '07

2:42 a.m.

Improved, fixed documentation for Data.Map, Data.IntMap. Moved time complexity Big-O values to the end of function descriptions. Added examples. For the initial review I submitted only changes for Data.Map. I'll update IntMap for the final review. Deadline - August 27 (two weeks). Ticket: http://hackage.haskell.org/trac/ghc/ticket/1611 Haddock output: http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?for... The module source and diff are also attached to the ticket. Andriy ____________________________________________________________________________________ Pinpoint customers who are looking for what you sell. http://searchmarketing.yahoo.com/

Show replies by date

ross＠soi.city.ac.uk

11 Aug 11 Aug

11:57 a.m.

Quoting Andriy Palamarchuk :

...

Improved, fixed documentation for Data.Map, Data.IntMap. Moved time complexity Big-O values to the end of function descriptions. Added examples.

I rather liked having the complexity at the start of the description: it allows one to find this important information at a glance. Personally I prefer precise specifications of the behaviour to examples (and think that examples leave less room on my screen for specs of other functions), but clearly tastes vary. But I think it's important that examples be an optional addition to the main description: one should be able to determine the behaviour from the description alone. The examples are there to help, but if one has to use them guess what the function will do in some case then the description is faulty. So I'd favour setting off the examples at the end of each function description, and making them more concise by presenting them as equations. My ideal would be if Haddock could present each examples part as initially hidden but expandable with a click. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.

Andriy Palamarchuk

15 Aug 15 Aug

7:26 p.m.

Ross, thanks a lot for the feedback. Sorry I'm late with the response. --- ross@soi.city.ac.uk wrote:

...

I rather liked having the complexity at the start of the description: it allows one to find this important information at a glance.

My rationale was that the complexity information, while important, is probably one of the last things most of the people are looking for. I doubt anybody would e.g. search for all the O(log n) operations ;-) I'll keep the complexity information at the end of description unless there are more votes against this.

...

Personally I prefer precise specifications of the behavior to examples (and think that examples leave less room on my screen for specs of other functions), but clearly tastes vary.

Agree that the description spec should be precise on its own and about differences in tastes.

...

But I think it's important that examples be an optional addition to the main description: one should be able to determine the behavior from the description alone. ... The examples are there to help, but if one has to use them guess what the function will do in some case then the description is faulty.

Completely agree. In a few cases I improved descriptions too. My goal was not to provide missing information in the examples, but have complete functions specification in both - examples and text descriptions, so whatever you just said would still hold true if you swapped "examples" with "description" ;-) I consider examples to be complimentary to a description, but in general of the same importance. Each of these forms of behavior specification has its own advantages depending on a situation. The very fact of having two forms of specifications can help in understanding. This is like having a description of a programming language and its formal grammar. One can be used to debug and understand another.

...

So I'd favour setting off the examples at the end of each function description,

What do you mean? Isn't this how it is already? http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?for...

...

and making them more concise by presenting them as equations.

I considered this. I did not make the examples in form of equations because in this particular case a map programmatic specification fromList [(5,"a"), (3,"b")] is more noisy and less readable than its string presentation {3:="b",5:="a"} Example - in my documentation we have this: let m = fromList [(5,"a"), (3,"b")] m {3:="b",5:="a"} map (++ "x") m {3:="bx",5:="ax"} The equation form would be (map (++ "x") (fromList [(5,"a"), (3,"b")])) == (fromList [(5,"ax"), (3,"bx")]) IMHO in the first case it is easier to understand.

...

My ideal would be if Haddock could present each examples part as initially hidden but expandable with a click.

My personal preference is to keep them together. What do you think? Andriy ____________________________________________________________________________________ Boardwalk for $500? In 2007? Ha! Play Monopoly Here and Now (it's updated for today's economy) at Yahoo! Games. http://get.games.yahoo.com/proddesc?gamekey=monopolyherenow

Sebastian Sylvan

7:32 p.m.

On 15/08/07, Andriy Palamarchuk wrote:

...

Ross, thanks a lot for the feedback. Sorry I'm late with the response.

--- ross@soi.city.ac.uk wrote:

...
I rather liked having the complexity at the start of the description: it allows one to find this important information at a glance.

My rationale was that the complexity information, while important, is probably one of the last things most of the people are looking for. I doubt anybody would e.g. search for all the O(log n) operations ;-)

I'll keep the complexity information at the end of description unless there are more votes against this.

...
Personally I prefer precise specifications of the behavior to examples (and think that examples leave less room on my screen for specs of other functions), but clearly tastes vary.

Agree that the description spec should be precise on its own and about differences in tastes.

...
But I think it's important that examples be an optional addition to the main description: one should be able to determine the behavior from the description alone. ... The examples are there to help, but if one has to use them guess what the function will do in some case then the description is faulty.

Completely agree. In a few cases I improved descriptions too.

My goal was not to provide missing information in the examples, but have complete functions specification in both - examples and text descriptions, so whatever you just said would still hold true if you swapped "examples" with "description" ;-)

I consider examples to be complimentary to a description, but in general of the same importance. Each of these forms of behavior specification has its own advantages depending on a situation. The very fact of having two forms of specifications can help in understanding.

This is like having a description of a programming language and its formal grammar. One can be used to debug and understand another.

...
So I'd favour setting off the examples at the end of each function description,

What do you mean? Isn't this how it is already? http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?for...

...
and making them more concise by presenting them as equations.

I considered this. I did not make the examples in form of equations because in this particular case a map programmatic specification

fromList [(5,"a"), (3,"b")]

is more noisy and less readable than its string presentation

{3:="b",5:="a"}

Example - in my documentation we have this:

let m = fromList [(5,"a"), (3,"b")] m {3:="b",5:="a"} map (++ "x") m {3:="bx",5:="ax"}

The equation form would be

(map (++ "x") (fromList [(5,"a"), (3,"b")])) == (fromList [(5,"ax"), (3,"bx")])

IMHO in the first case it is easier to understand.

...
My ideal would be if Haddock could present each examples part as initially hidden but expandable with a click.

My personal preference is to keep them together.

What do you think?

Examples take more real-estate than descriptions (most often) and are also less precise (most often). I'd also prefer to have it hidden if possible (e.g. some java script collapsible frame or something). Most of the time I just look at the name, read the description, and together with the type I have everything I need, so there's no reason to have to scroll through lots of examples that are only occasionally required. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

Andriy Palamarchuk

16 Aug 16 Aug

6:25 p.m.

Thanks everybody for the feedback. I'm responding to all the comments at once. Please let me know if I missed something. The consensus is to have the complexity information in the beginning of descriptions. I'll return it back. --- Sebastian Sylvan wrote:

...

Examples take more real-estate than descriptions (most often) and are also less precise (most often). I'd also prefer to have it hidden if possible (e.g. some java script collapsible frame or something).

I like this suggestion, but such haddock improvements are out of scope for this change. I can, however, remove printing of the initial data to make the examples more compact, so instead of: let m = fromList [(5,'a'), (3,'b'), (7,'c')] m {3:='b',5:='a',7:='c'} ... it will be: let m = fromList [(5,'a'), (3,'b'), (7,'c')] ... The examples are too short to put them in separate sections at the end of the document, as I did with the Control.Monad examples. --- apfelmus wrote:

...

I prefer to know the laws that hold. In other words, I want an infinite amount of examples at once. For instance, the laws

lookup k' (insert k x m) = Just x if k ==> k' lookup k' (insert k x m) = lookup k' m if k /=> k'

uniquely define insert, they are everything you _can_ know about insert (from the denotational point of view). And they are even shorter than concrete examples (= their special cases). ...

Such formulas can definitely be part of the description. I'm relatively new to Haskell. While I trust that the way you described it are accurate, it takes much more effort to understand it (especially the parts with "homomorphisms", "mplus", etc) than to understand what function does. This is probably not because these topics are complex, but because I don't know terminology, not yet fluent with the concepts. Concrete examples are more useful for a person with "practical programming" background learning Haskell libraries, who just wants to start using them. --- apfelmus wrote:

...

...
I did not make the examples in form of equations because in this particular case a map programmatic specification

fromList [(5,"a"), (3,"b")]

is more noisy and less readable than its string presentation

{3:="b",5:="a"}

The joy of Haskell is that the laws themselves can be expressed in Haskell. So I prefer

fromList [(5,"a"), (3,"b")]

over

{3:="b",5:="a"}

since the latter is not a Haskell expression. Besides, the haddock neither defines it nor is there at least a show function or similar that outputs this string representation.

{3:="b",5:="a"} is actually the output of "show" function from the older GHC version I used when I started to work on the Data.Map documentation :-) In the middle of the changes I upgraded to the new version of Ubuntu, got a newer compiler, which just prints "fromList [(3,"b"),(5,"a")]". I want to collect feedback on your suggestion. Do you guys prefer this format: let m = fromList [(5,'a'), (3,'b')] insert 5 'x' m == fromList [(5,'x'), (3,'b')] insert 10 'x' m == fromList [(5,'a'), (3,'b'), (10,'x')] insert 5 'x' empty == fromList [(5,'x')] over this: let m = fromList [(5,'a'), (3,'b')] insert 5 'x' m {3:='b',5:='x'} insert 10 'x' m {3:='b',5:='a',10:='x'} insert 5 'x' empty {5:='x'} apfelmus, the first form is what you want? Can you suggest any other improvements? --- Adrian Hey wrote:

...

But I really think this discussion is somewhat moot as IMO the entire Data.Map module should be deprecated in favour of this..

http://darcs.haskell.org/packages/collections-ghc6.6/Data/Map/AVL.hs

...

and ultimately this..

http://darcs.haskell.org/packages/collections-ghc6.6/Data.Trie.General/Data/...

...

not that I'm in any way biased :-)

I hope that the new Map implementation will at least have the same level of documentation ;-) --- Adrian Hey wrote:

...

I found hardest thing about writing the clone was figuring out precisely what many of the functions did. (In many cases either the documentation was ambiguous, or it was OK but the implementation was not consistent with the docs.)

When writing these docs we found and fixed a couple of bugs and documentation inconsistencies. Andriy ____________________________________________________________________________________ Luggage? GPS? Comic books? Check out fitting gifts for grads at Yahoo! Search http://search.yahoo.com/search?fr=oni_on_mail&p=graduation+gifts&cs=bz

Adrian Hey

10:07 p.m.

Andriy Palamarchuk wrote:

...

When writing these docs we found and fixed a couple of bugs and documentation inconsistencies.

There's an non-obvious semantic anomaly with the updateLookupWithKey function too (a bug IMO :-). But it's behaviour should be documented in any case. http://www.haskell.org/pipermail/libraries/2007-March/007304.html Regards -- Adrian Hey

apfelmus

18 Aug 18 Aug

12:22 p.m.

Andriy Palamarchuk wrote:

...

apfelmus wrote:

...
I prefer to know the laws that hold. In other words, I want an infinite amount of examples at once. For instance, the laws

lookup k' (insert k x m) = Just x if k ==> k' lookup k' (insert k x m) = lookup k' m if k /=> k' uniquely define insert, they are everything you _can_ know about insert (from the denotational point of view). And they are even shorter than concrete examples (= their special cases). ....

Such formulas can definitely be part of the description. I'm relatively new to Haskell. While I trust that the way you described it are accurate, it takes much more effort to understand it (especially the parts with "homomorphisms", "mplus", etc) than to understand what function does. This is probably not because these topics are complex, but because I don't know terminology, not yet fluent with the concepts. Concrete examples are more useful for a person with "practical programming" background learning Haskell libraries, who just wants to start using them. [...]

...

I want to collect feedback on your suggestion. Do you guys prefer this format:

let m = fromList [(5,'a'), (3,'b')] insert 5 'x' m == fromList [(5,'x'), (3,'b')] insert 10 'x' m == fromList [(5,'a'), (3,'b'), (10,'x')] insert 5 'x' empty == fromList [(5,'x')]

over this:

let m = fromList [(5,'a'), (3,'b')] insert 5 'x' m {3:='b',5:='x'} insert 10 'x' m {3:='b',5:='a',10:='x'} insert 5 'x' empty {5:='x'}

apfelmus, the first form is what you want?

Yes. Perhaps aligning the equality signs. And there's the question whether to use == or = to express equality. Strictly speaking, the latter would be wrong since the binary tree may be balanced differently (which bite us in functions like showTree ) but the former is not capable of expressing properties like fromList [(undefined, 'a')] = undefined that convey information about strictness. Personally, I think I'd use = since it feels "more equal" :) The ultra-correct way would be to only use toAscList to observe equality, i.e. to state toAscList (insert 5 'x' m) = [(3,'b'), (5,'x')] but not insert 5 'x' m = fromList [(3,'b'), (5,'x')] since we are not guaranteed that the latter trees on each side of the equation are balanced the same way. This is related to the question "now, but what _is_ a finite map from keys to values?" that I tried to hint at in my previous post. Your examples are based - whether intentionally or not - on the answer "such a map _is_ a list of (key,value) pairs." In other words, to explain what an operation like insert does with Map k a , you explain it in terms of the two functions fromList and toAscList. Here's a reformulation of your example: toAscList (insert 5 'x' (fromList [(3,'b'), (5,'a')])) = [(3,'b'), (5,'x')] We can go further and say that this is = insertList 5 'x' [(3,'b'), (5,'a')] where insertList k x [] = [(k,x)] insertList k x ((k',y):ms) = if k == k' then (k,x):ms else (k',y):insertList k x ms In other words, the description of insert is based on knowing how to insert a key-value pair into a list of key-value pairs. Removing the concrete example list, we get toAscList . insert 5 'x' . fromList = insertList 5 'x' or toAscList . insert k x = insertList k x . toAscList for all keys k and values x.

...

(especially the parts with "homomorphisms", "mplus", etc)

(Due to the equation above, mathematicians would call toAscList a "homomorphism", but you can safely ignore that word. In the last post, I tried to show that lookup can take the role of toAscList , so that a finite map may also seen as being a function k -> Maybe a instead of being a list [(k,a)] . The mplus would be an equivalent of unionList with a strange name.)

...

...
I prefer to know the laws that hold. Such formulas can definitely be part of the description.

The discussion with lists of key-value pairs and insert may seem like nitpicking since to insert a key-value pair into a list, we humans can "look at it" and "just do it". Of course, we can only do so with a concrete examples. So, your examples are indeed best to show how the function "just does it". With examples only, we can't "do it in general" though. "Doing it in general" also means to be able to formally _prove_ that a program you write is correct. Only equational laws like toAscList . insert k x = insertList k x . toAscList or lookup k' (insert k x m) = Just x if k == k' lookup k' (insert k x m) = lookup k' m if k /= k' allow you to do that. With some training and an intuition about what the functions are supposed to do, succinctly formulated laws like that are almost easier to read than examples. In fact, it's a good idea to think up specifying laws before starting to code since in purely functional languages, you can systematically obtain and implementation from the laws. The classic paper on deriving implementations from their specification is Paul Hudak. The Design of a Pretty-printing Library. http://citeseer.ist.psu.edu/hughes95design.html with a follow-up Philip Wadler. A prettier printer. http://homepages.inf.ed.ac.uk/wadler/topics/ language-design.html#prettier The man who derives all his programs from specification is Richard Bird. You may want to have a look at his recent sudoku solver Richard Bird. A program to solve Sudoku Slides: http://icfp06.cs.uchicago.edu/bird-talk.pdf where he starts with an apparently correct but hopelessly slow specification and transforms it into a blazingly fast one. His introduction to Haskell Richard Bird. Introduction to Functional Programming using Haskel (2nd edition). http://www.amazon.com/ Introduction-Functional-Programming-using-Haskell/dp/0134843460 emphasizes this style, too.

...

Can you suggest any other improvements?

Maybe some examples are a bit arbitrary, for instance for the ..WithKey functions. I mean functionality like \k a -> Just ((show k) ++ ":" ++ a) probably won't show up in a real program. I wonder whether there are examples that are useful on their own. Regards, apfelmus

Andriy Palamarchuk

27 Oct 27 Oct

3:26 a.m.

New subject: Data.Map, Data.IntMap documentation (1611)

Thanks everybody for the corrections and suggestions. Finally I got back to this work. Sorry it took this long to respond. Log comment: Improved, fixed documentation for Data.Map, Data.IntMap. Added examples. Changes since the first submission: * Moved time complexity Big-O values back to the beginning of function descriptions. * Converted examples from interpreter session-like format to more compact equation-like format. * As per http://www.haskell.org/pipermail/libraries/2007-March/007304.html added following information to the description of updateLookupWithKey: The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted. Ticket: http://hackage.haskell.org/trac/ghc/ticket/1611 Haddock output: http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?for... Comments, suggestions, critique? Did I miss anything? I'll update IntMap for the final review. Andriy __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Johan Tibell

8:57 a.m.

New subject: Data.Map, Data.IntMap documentation (1611)

On 10/27/07, Andriy Palamarchuk wrote:

...

Thanks everybody for the corrections and suggestions. Finally I got back to this work. Sorry it took this long to respond.

Thanks for doing this! More documentation make me happy. -- Johan

Andriy Palamarchuk

1 Nov 1 Nov

6:37 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

I started to update documentation for Data.IntMap. I found a discrepancy between the modules' updateLookupWithKey behavior right in the area brought to my attention earlier: --- Andriy Palamarchuk wrote:

...

* As per

http://www.haskell.org/pipermail/libraries/2007-March/007304.html

...

added following information to the description of updateLookupWithKey:

The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted.

The test case:

...

:m Data.Map let f k x = if x == "a" then Just ((show k) ++ ":new a") else Nothing updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "5:new a",fromList [(3,"b"),(5,"5:new a")])

...

:m Data.IntMap updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "a",fromList [(3,"b"),(5,"5:new a")]) The problem here is that Data.Map.updateLookupWithKey returns the *updated* value, but Data.IntMap.updateLookupWithKey returns the value *before* update. Please agree on the behavior, fix the bug and let me know what to write in the docs. Thanks, Andriy __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Don Stewart

6:38 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

And sounds like we need some QuickCheck properties too! apa3a:

...

I started to update documentation for Data.IntMap. I found a discrepancy between the modules' updateLookupWithKey behavior right in the area brought to my attention earlier:

--- Andriy Palamarchuk wrote:

...
* As per

http://www.haskell.org/pipermail/libraries/2007-March/007304.html

...
added following information to the description of updateLookupWithKey:

The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted.

The test case:

...
:m Data.Map let f k x = if x == "a" then Just ((show k) ++ ":new a") else Nothing updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "5:new a",fromList [(3,"b"),(5,"5:new a")])

...
:m Data.IntMap updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "a",fromList [(3,"b"),(5,"5:new a")])

The problem here is that Data.Map.updateLookupWithKey returns the *updated* value, but Data.IntMap.updateLookupWithKey returns the value *before* update.

Please agree on the behavior, fix the bug and let me know what to write in the docs. Thanks, Andriy

__________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Adrian Hey

2 Nov 2 Nov

8:49 a.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Actually I think there are several functions in Data.Map that should be deprecated, including this one (as I did in the clone I wrote). When it comes to complex hybrid operations like this there are just to many possible variations, none of which are obviously "correct". So unless you're going to provide them all and think of sensible names for them all it's probably best to provide none of them. Instead, it's trivially easy to implement a really cheap (but limited) zipper like thing, as is done with the OMap type in the AVL clone. Using this, the function in question (or anything similar) can easily be implemented by users themselves.. There's special about AVL trees, this can be done with any binary tree. It's just a simple node indexing scheme. And it is really cheap. Even with Ints as keys an OMap based insertion only takes about 10% longer than a regular insertion, and you get the added advantage that if the the tree is unmodified you really do get exactly the same tree (you don't pointlessly burn a lot of heap duplicating all the nodes on the search path). Regards -- Adrian Hey Don Stewart wrote:

...

And sounds like we need some QuickCheck properties too!

apa3a:

...
I started to update documentation for Data.IntMap. I found a discrepancy between the modules' updateLookupWithKey behavior right in the area brought to my attention earlier:

--- Andriy Palamarchuk wrote:

...
* As per

http://www.haskell.org/pipermail/libraries/2007-March/007304.html

...
added following information to the description of updateLookupWithKey:

The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted.

The test case:

...
:m Data.Map let f k x = if x == "a" then Just ((show k) ++ ":new a") else Nothing updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "5:new a",fromList [(3,"b"),(5,"5:new a")])

...
:m Data.IntMap updateLookupWithKey f 5 (fromList [(5,"a"), (3,"b")])

(Just "a",fromList [(3,"b"),(5,"5:new a")])

The problem here is that Data.Map.updateLookupWithKey returns the *updated* value, but Data.IntMap.updateLookupWithKey returns the value *before* update.

Please agree on the behavior, fix the bug and let me know what to write in the docs. Thanks, Andriy

__________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Adrian Hey

9:06 a.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Adrian Hey wrote:

...

There's special about AVL trees, Oops, insert the word "nothing" in the obvious place.

Andriy Palamarchuk

12:36 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Don, I'll leave QuickCheck out of scope for the documentation change. Adrian, question whether to deprecate updateLookupWithKey is worth discussing, but we need to fix the function anyway. All my documentation changes are ready. I'm holding the patch only for this problem. What if I leave the function description for IntMap the same as for Map and submit a bug report for the problem? Then the person resolving the bug will either fix the docs or the documentation. Any thoughts? Andriy __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Christian Maeder

1:47 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

The original documentation only said "lookup and update". I think it makes more sense to return the old value (i.e. the lookup result in the original input map.), because the new value can be recomputed using the update function. The opposite (recomputation of the old value given the new one) is not possible. But I agree too, that deprecating this function is the right thing (in the future). Cheers Christian Andriy Palamarchuk wrote:

...

Don, I'll leave QuickCheck out of scope for the documentation change.

Adrian, question whether to deprecate updateLookupWithKey is worth discussing, but we need to fix the function anyway.

All my documentation changes are ready. I'm holding the patch only for this problem.

What if I leave the function description for IntMap the same as for Map and submit a bug report for the problem? Then the person resolving the bug will either fix the docs or the documentation.

Any thoughts? Andriy

__________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Adrian Hey

1:59 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Andriy Palamarchuk wrote:

...

Adrian, question whether to deprecate updateLookupWithKey is worth discussing, but we need to fix the function anyway.

I would say just document the code as it is, warts and all. Don't change the functions because maybe someone is using them (though I suspect this is not the case). But also add a DEPRECATE pragma to discourage further use. Maybe do this for both IntMap and Map, because even if they were consistent, I don't think they're very useful. An API that clearly separates the process of searching for the key from whatever you want to do when it's found (or not) is so much more flexible and easy to understand IMHO. BTW, the way I did this should also work for IntMap tries too AFAICT, but I'm not sure it would offer any performance advantage over a repeated lookup in this case. So maybe keep them for IntMap? My 2p.. Regards -- Adrian Hey

Iavor Diatchki

4:43 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Hi, I have always wondered about the usefulness of the "withKey" functions because the key is one of the arguments that the programmer is passing anyway, so they already know its value. The only situation that I can think of where that might not be the case is if the equality relation on the keys is actually an equivalence relation, so the key in the map might be different from the key that was passed by the programmer even though they are "equal", and the distinction between the two keys is important. I would be a bit suspect of code that relies on that though... Does anybody have other uses of the "withKey" functions? -Iavor PS: as for the bug, I would just fix it (i.e., send a patch) On 11/2/07, Adrian Hey wrote:

...

Andriy Palamarchuk wrote:

...
Adrian, question whether to deprecate updateLookupWithKey is worth discussing, but we need to fix the function anyway.

I would say just document the code as it is, warts and all. Don't change the functions because maybe someone is using them (though I suspect this is not the case).

But also add a DEPRECATE pragma to discourage further use.

Maybe do this for both IntMap and Map, because even if they were consistent, I don't think they're very useful. An API that clearly separates the process of searching for the key from whatever you want to do when it's found (or not) is so much more flexible and easy to understand IMHO.

BTW, the way I did this should also work for IntMap tries too AFAICT, but I'm not sure it would offer any performance advantage over a repeated lookup in this case. So maybe keep them for IntMap?

My 2p..

Regards -- Adrian Hey

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Christian Maeder

5:40 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Iavor Diatchki wrote:

...

Hi, I have always wondered about the usefulness of the "withKey" functions because the key is one of the arguments that the programmer is passing anyway, so they already know its value. [...] Does anybody have other uses of the "withKey" functions? -Iavor

The WithKey-variants are used where a new value is computed from an old one and when this function also needs to consider the key. At least filterWithKey and foldWithKey are very useful.

...

PS: as for the bug, I would just fix it (i.e., send a patch)

Which implementation do you consider buggy? IntMap or Map? (or may just document it as inconsistent) Cheers Christian

Andriy Palamarchuk

7:21 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

--- Christian Maeder wrote:

...

Which implementation do you consider buggy? IntMap or Map?

(or may just document it as inconsistent)

Map is used much wider, so a change to its behavior would be more disruptive. We should take into account what other Map implementations do. See, e.g. this message mentioned earlier when discussing the documentation change: http://www.haskell.org/pipermail/libraries/2007-March/007304.html Andriy __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Iavor Diatchki

8:38 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Hello, On 11/2/07, Christian Maeder wrote:

...

Iavor Diatchki wrote:

...
Hi, I have always wondered about the usefulness of the "withKey" functions because the key is one of the arguments that the programmer is passing anyway, so they already know its value. [...] Does anybody have other uses of the "withKey" functions? -Iavor

The WithKey-variants are used where a new value is computed from an old one and when this function also needs to consider the key.

At least filterWithKey and foldWithKey are very useful.

I was referring to the insert,update,adjust functions where the key is already one of the arguments to the function. For the various traversals having a WithKey version makes sense.

...

...
PS: as for the bug, I would just fix it (i.e., send a patch)

Which implementation do you consider buggy? IntMap or Map?

(or may just document it as inconsistent)

I think that the two should have the same behaviour. To me, the implementation that returns the _old_ value seems more useful. -Iavor

Andriy Palamarchuk

22 Nov 22 Nov

3:34 a.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Thanks everybody for the comment. Because there was no consensus about how to resolve the problem I ended up leaving Data.IntMap.updateLookupWithKey implementation as is and adding a documentation that its behavior differs from Data.Map.updateLookupWithKey. Also, this is the least disruptive alternative. Andriy ____________________________________________________________________________________ Get easy, one-click access to your favorites. Make Yahoo! your homepage. http://www.yahoo.com/r/hs

Iavor Diatchki

26 Nov 26 Nov

2:09 a.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Hello, On Nov 21, 2007 7:34 PM, Andriy Palamarchuk wrote:

...

Thanks everybody for the comment.

Because there was no consensus about how to resolve the problem I ended up leaving Data.IntMap.updateLookupWithKey implementation as is and adding a documentation that its behavior differs from Data.Map.updateLookupWithKey. Also, this is the least disruptive alternative.

Andriy

I think that we should fix this: Data.IntMap should be a more efficient implementation of Data.Map, for the special case when the keys are Ints. The methods should not behave differently. No one seemed to disagree with the behavior that Christian (and also I) suggested, namely, to return the old value, so why don't we jsut make both implementations do that? -Iavor

Christian Maeder

2:39 p.m.

New subject: updateLookupWithKey bug? Was: Data.Map, Data.IntMap documentation (1611)

Iavor Diatchki wrote:

...

On Nov 21, 2007 7:34 PM, Andriy Palamarchuk wrote:

...
Because there was no consensus about how to resolve the problem I ended up leaving Data.IntMap.updateLookupWithKey implementation as is and adding a documentation that its behavior differs from Data.Map.updateLookupWithKey. Also, this is the least disruptive alternative.

I think that we should fix this: Data.IntMap should be a more efficient implementation of Data.Map, for the special case when the keys are Ints. The methods should not behave differently. No one seemed to disagree with the behavior that Christian (and also I) suggested, namely, to return the old value, so why don't we jsut make both implementations do that?

It seems, nobody wants to take the responsibility for changing the implementation (or overtake maintainership for any collection types). (Another suggestion was deprecation, but correcting it as we suggested would be more in the spirit of Daan, who initially designed DData.) Just a further note to: "Data.IntMap should be a more efficient implementation of Data.Map" This is currently not the case (at least) for the "size" function. Christian

Andriy Palamarchuk

22 Nov 22 Nov

3:30 a.m.

New subject: Data.Map, Data.IntMap documentation (1611)

Thanks everybody for the corrections and suggestions. Sorry it took this long to respond. This is the final submission. Please commit the patch. Log comment: Improved, fixed documentation for Data.Map, Data.IntMap. Added examples. Changes since the previous submission: * Moved time complexity Big-O values back to the beginning of function descriptions. * Converted examples from interpreter session-like format to more compact equation-like format. * As per http://www.haskell.org/pipermail/libraries/2007-March/007304.html added following information to the description of updateLookupWithKey: The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted. * Made corresponding changes for Data.IntMap.updateLookupWithKey. Mentioned that it differs from the Data.Map version. Ticket: http://hackage.haskell.org/trac/ghc/ticket/1611 Andriy ____________________________________________________________________________________ Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now. http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ

Andriy Palamarchuk

1 Mar 1 Mar

3:36 a.m.

New subject: Data.Map, Data.IntMap documentation (1611)

Could somebody please commit the patch attached to this ticket? Sorry about a long delay before I finalized the patch. Thanks, Andriy --- Andriy Palamarchuk wrote:

...

Thanks everybody for the corrections and suggestions. Finally I got back to this work. Sorry it took this long to respond.

Log comment: Improved, fixed documentation for Data.Map, Data.IntMap. Added examples.

Changes since the first submission: * Moved time complexity Big-O values back to the beginning of function descriptions. * Converted examples from interpreter session-like format to more compact equation-like format. * As per

http://www.haskell.org/pipermail/libraries/2007-March/007304.html

...

added following information to the description of updateLookupWithKey:

The function returns changed value, if it is updated. Returns the original key value if the map entry is deleted.

Ticket: http://hackage.haskell.org/trac/ghc/ticket/1611

Haddock output:

http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?for...

...

Comments, suggestions, critique? Did I miss anything? I'll update IntMap for the final review. Andriy

__________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping

Ian Lynagh

19 Mar 19 Mar

4:45 p.m.

New subject: Data.Map, Data.IntMap documentation (1611)

On Fri, Feb 29, 2008 at 07:36:36PM -0800, Andriy Palamarchuk wrote:

...

Could somebody please commit the patch attached to this ticket?

Applied, thanks! Ian

Isaac Dupree

15 Aug 15 Aug

8:52 p.m.

Andriy Palamarchuk wrote:

...

Ross, thanks a lot for the feedback. Sorry I'm late with the response.

--- ross@soi.city.ac.uk wrote:

...
I rather liked having the complexity at the start of the description: it allows one to find this important information at a glance.

My rationale was that the complexity information, while important, is probably one of the last things most of the people are looking for. I doubt anybody would e.g. search for all the O(log n) operations ;-)

I'll keep the complexity information at the end of description unless there are more votes against this.

On the contrary, I have gone looking through entries solely to see their complexity (since I already know about what the functions _do_), especially when comparing two different collections or just looking. They are very short and I like them being the most introductory thing, myself. If you know anything about what complexities the data structure gives for various operations, it can actually suggest quite a bit about what the operation does! (e.g., an O(log(n)) operation CANNOT be a map...) Maybe it's because I have general data structures experience already from other languages, but... complexity is very important to me, it is often the first thing I want to see (it comes after the operation's name anyway!), O(n) lookup is almost as bad as no lookup, O(n^2) behavior, say, with (++)s is generally fatal, worthy of being called a bug in my program. (Practical speed observations like "IntMaps are fast even with not so large sizes" are very important too) Isaac

apfelmus

16 Aug 16 Aug

8:17 a.m.

Andriy Palamarchuk wrote:

...

...
I rather liked having the complexity at the start of the description: it allows one to find this important information at a glance.

My rationale was that the complexity information, while important, is probably one of the last things most of the people are looking for. I doubt anybody would e.g. search for all the O(log n) operations ;-)

I'll keep the complexity information at the end of description unless there are more votes against this.

I too think that the complexity should be at the beginning since that's what I'm interested in when choosing a finite map implementation.

...

...
So I'd favour setting off the examples at the end of each function description, and making them more concise by presenting them as equations.

Actually, while you did good work added the large amount of examples, I don't like examples at all, I prefer to know the laws that hold. In other words, I want an infinite amount of examples at once. For instance, the laws lookup k' (insert k x m) = Just x if k == k' lookup k' (insert k x m) = lookup k' m if k /= k' uniquely define insert, they are everything you _can_ know about insert (from the denotational point of view). And they are even shorter than concrete examples (= their special cases). In fact, a large group of operation on Data.Map k a can be specified with corresponding operations on k -> Maybe a . Put differently, the introductory sentence really is: "finite maps are an efficient implementation of maps from keys k to values a, i.e. of k -> Maybe a ". An example: lookup k (union m m') = lookup k m' `mplus` lookup k m In a sense, `mplus` is the union for the finite map type Maybe a . In other words, the specifications says that lookup is a homomorphism of maps, or rather that the type Data.Map k a is a map thanks to that homomorphism. Another example: lookup k (map f m) = f `fmap` (lookup k m) Again, fmap is the map for Maybe a . Note that not every operation can be expressed in terms of lookup . For example, fold f z = foldr f z . elems cannot be expressed with lookup. As an example for fold, the textual description gives elems = fold (:) [] With this, I think that the textual description for fold is superior to the description by examples, so that I even prefer them not to be present at all. It's not necessary to specify everything with external types like lists or k -> May a , combined functions can be expressed with the components, like updateLookupWithKey f k m = (lookup k m, updateWithKey f k m) This is much clearer than examples and doesn't need interpretation like the description "Lookup and update". Also, the link of the more general functions their less general cousins is very useful description insert k x m = insertWith (\new old -> new) k x m

...

I did not make the examples in form of equations because in this particular case a map programmatic specification

fromList [(5,"a"), (3,"b")]

is more noisy and less readable than its string presentation

{3:="b",5:="a"}

The joy of Haskell is that the laws themselves can be expressed in Haskell. So I prefer fromList [(5,"a"), (3,"b")] over {3:="b",5:="a"} since the latter is not a Haskell expression. Besides, the haddock neither defines it nor is there at least a show function or similar that outputs this string representation. Regards, apfelmus

Adrian Hey

10:52 a.m.

apfelmus wrote:

...

An example:

lookup k (union m m') = lookup k m' `mplus` lookup k m

Shouldn't that be.. lookup k (union m m') = lookup k m `mplus` lookup k m' But I get confused by this, which is why I think it might be a good idea to deprecate union for Data.Map altogether. A related problem.. http://hackage.haskell.org/trac/ghc/ticket/1460 http://www.haskell.org/pipermail/libraries/2007-May/007491.html But I really think this discussion is somewhat moot as IMO the entire Data.Map module should be deprecated in favour of this.. http://darcs.haskell.org/packages/collections-ghc6.6/Data/Map/AVL.hs and ultimately this.. http://darcs.haskell.org/packages/collections-ghc6.6/Data.Trie.General/Data/... not that I'm in any way biased :-) But if we must stick with Data.Map for some reason then what Andriy is doing is worth while, the documentation does need improving. I found hardest thing about writing the clone was figuring out precisely what many of the functions did. (In many cases either the documentation was ambiguous, or it was OK but the implementation was not consistent with the docs.) Regards -- Adrian Hey

Johannes Waldmann

11:12 a.m.

It feels strange reading "Data.Map should be replaced" because it had replaced Data.FiniteMap only recently. Are we going to play this game again soon? This is definitely the case for an interface (erm, type class) that will allow the programmer to choose between implementations. (PS: two step plan to dramatically increase Haskell's popularity: replace "class" with "interface", "data" with "class" :-) -- -- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 -- ---- http://www.imn.htwk-leipzig.de/~waldmann/ -------

Adrian Hey

10:05 p.m.

Johannes Waldmann wrote:

...

It feels strange reading "Data.Map should be replaced" because it had replaced Data.FiniteMap only recently.

Unfortunately AFAIK both the old Data.FiniteMap and the new Data.Map are based on the same underlying tree structure, which seems to do a not particularly good job of balancing rather expensively. Also, the efficient hedge algorithm seems to be a problem (not at all efficient in reality).

...

Are we going to play this game again soon?

Well I don't see any problem as the clone is intended to be just that. For most users it should be a case of just changing imports to use it. There are some API changes though, lots of stuff added and other stuff which probably should never have been in there in the first place deprecated or removed (like the indexing operations).

...

This is definitely the case for an interface (erm, type class) that will allow the programmer to choose between implementations.

Not sure what it is you're talking about that's "definitely the case". Ideally we should have just one class based API for multiple implementations of maps, which is what the collections package Map class aims to be. But there are a few contenders here. There are also the Edison classes. What is currently called the GT class in the collections package is another (it has some weird methods in the API that typical users wouldn't need but which are needed to implement other instances of GT). Regards -- Adrian Hey

apfelmus

11:26 a.m.

Adrian Hey schrieb:

...

apfelmus wrote:

...
An example:

lookup k (union m m') = lookup k m' `mplus` lookup k m

Shouldn't that be..

lookup k (union m m') = lookup k m `mplus` lookup k m'

Oh indeed, and I thought it would be right-biased :)

...

IMO the entire Data.Map module should be deprecated in favour of this..

http://darcs.haskell.org/packages/collections-ghc6.6/Data/Map/AVL.hs

and ultimately this..

http://darcs.haskell.org/packages/collections-ghc6.6/Data.Trie.General/Data/...

Sounds good. But I think that the GT class is not general enough. I'd wait for associated type synonyms since we can then write class Map map k where type Elem map k empty :: map insert :: k -> Elem map k -> map -> map lookup :: k -> map -> Maybe (Elem map k) ... The Ord-constraint is too limiting for tries. Also, the map type does not necessarily fix the key type, we can use one map with different key types. For instance, we have the trivial instance Map map () where type Elem map () = map for every map type. And we can compose maps instance Map (Data.Map k1 (Data.Map k2 a)) k1 where type Elem .. .. = Data.Map k2 a instance Map (Data.Map k1 (Data.Map k2 a)) (k1,k2) where type Elem .. .. = a

...

But if we must stick with Data.Map for some reason then what Andriy is doing is worth while, the documentation does need improving. I found hardest thing about writing the clone was figuring out precisely what many of the functions did. (In many cases either the documentation was ambiguous, or it was OK but the implementation was not consistent with the docs.)

Yes, the documentation is in dire need of sharpening and I advocate the Haskell way, i.e. equational laws for that. Regards, apfelmus

Adrian Hey

10:37 p.m.

apfelmus wrote:

...

The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)? It's extremely useful to be able to give meaningful orderings when converting between tries and lists. In the case of constructing tries from ordered association lists there's a significant performance advantage too.

...

Also, the map type does not necessarily fix the key type,

For instances of GT it does. The point is that eventually the Trie type (and corresponding GT instance) will be designed optimally and specifically for one particular key type (using DrIFT, Derive, template Haskell or something..)

...

we can use one map with different key types.

Not with GT. If (say) Data.Map was an instance then we would have something like.. instance Ord k => GT (Map k) k where .. I.E. The map type constructor in question is (Map k), not Map. (Map k does fix the key type as k) Regards -- Adrian Hey

Benjamin Franksen

11:39 p.m.

Adrian Hey wrote:

...

apfelmus wrote:

...
The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Data.Typeable.TypeRep Cheers Ben

Adrian Hey

17 Aug 17 Aug

6:57 a.m.

Benjamin Franksen wrote:

...

Adrian Hey wrote:

...
apfelmus wrote:

...
The Ord-constraint is too limiting for tries. Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Data.Typeable.TypeRep

Interesting problem, but I don't see any reason why this could not be an instance of Ord. Because of the weird Key hack that's there I guess you'd have to hand write this instance, not derive it. Then for GT you'd have to either hand write a matching GT instance or fall back on the less efficient but general purpose OrdGT.. http://darcs.haskell.org/packages/collections-ghc6.6/Data.Trie.General/Data/... Regards -- Adrian Hey

Benjamin Franksen

18 Aug 18 Aug

1:33 a.m.

Adrian Hey wrote:

...

Benjamin Franksen wrote:

...
Adrian Hey wrote:

...
Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Data.Typeable.TypeRep

Interesting problem, but I don't see any reason why this could not be an instance of Ord.

I asked this question once and got as answer (from one of the Simons, IIRC) that exposing the obvious Ord instance would break referential transparency: there is no guarantee that the integer keys and therefore the order among TypeReps is the same for different runs, even of the same program. This (counter-)example may be irrelevant for the discussion at hand or not. Anyway, you asked for one... Cheers Ben

Stefan O'Rear

1:59 a.m.

On Sat, Aug 18, 2007 at 03:33:19AM +0200, Benjamin Franksen wrote:

...

Adrian Hey wrote:

...
Benjamin Franksen wrote:

...
Adrian Hey wrote:

...
Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Data.Typeable.TypeRep

Interesting problem, but I don't see any reason why this could not be an instance of Ord.

I asked this question once and got as answer (from one of the Simons, IIRC) that exposing the obvious Ord instance would break referential transparency: there is no guarantee that the integer keys and therefore the order among TypeReps is the same for different runs, even of the same program.

This (counter-)example may be irrelevant for the discussion at hand or not. Anyway, you asked for one...

There's nothing stoping you from writing instance Ord TypeRep where compare = comparing show STRef could also be cheaply made Ord, but at the cost of a bit more work; the idea is to notice that GHC's compacting garbage collector never reorders "abstract heap addreses", a concept I just made up which starts at 0 and increases as new blocks are added to the heap; so one must create an inverse map from physical blocks (which are aligned, so finding metadata is just a bitmask) to abstract block numbers, and then use that information to implement a compareAddress# primop suitable for {ST,IO}Reff, TVar, and {ST,IO}{U,}Array (in the latter case, you would do address comparison for pinned arrays and arbitrarily declare all pinned arrays larger than all unpinned arrays). Stefan

Twan van Laarhoven

3:04 a.m.

Stefan O'Rear wrote:

...

...
...
...
Data.Typeable.TypeRep

Interesting problem, but I don't see any reason why this could not be an instance of Ord.

...

There's nothing stoping you from writing

instance Ord TypeRep where compare = comparing show

STRef could also be cheaply made Ord, ...

What we really need for this is a 'unsafe' PrimOrd class, sugesting that the comparison is not intended for human consumption.

...

class Eq a => PrimOrd a where primcompare :: a -> a -> Ordering class PrimOrd a => Ord a where -- needs something like class aliasses to be useble

Now the Data.Map functions can be based on PrimOrd, because we don't actually care about the ordering. Except for to/fromAscList, which should check the ordering is actually an Ord. Twan

Isaac Dupree

11:47 a.m.

Twan van Laarhoven wrote:

...

Now the Data.Map functions can be based on PrimOrd, because we don't actually care about the ordering. Except for to/fromAscList, which should check the ordering is actually an Ord.

and toList and fold, even for associative-commutative operations, because it can't know. Also it might be fair to lift the nondeterminism to IO, e.g. you could define your program's semantics to output these elements in an arbitrary unpredictable order. Maybe there should be a separate FloatOrd class if you want the fourth possibility of "false due to NaN" in your comparisons of floating-point numbers, so that Float/Double can be made proper members of PrimOrd. Isaac

Isaac Dupree

19 Aug 19 Aug

10:44 a.m.

I wrote:

...

Maybe there should be a separate FloatOrd class if you want the fourth possibility of "false due to NaN" in your comparisons of floating-point numbers, so that Float/Double can be made proper members of PrimOrd.

Then there are non-total orderings such as subset relations or many lattices... which often get demoted to using `lt` and `gt` symbols. Of course, Floats are not well-behaved even in Eq. Prelude> let x = 0/0 :: Double in x == x False And there is no simple operation based on their (<), (<=)... that fulfills even the requirements for a partial order... http://en.wikipedia.org/wiki/Partially_ordered_set A newtype of them that becomes _|_ upon isNaN would make a true total order AFAIK (the infinities -at least in IEEE arithmetic- are well-behaved with respect to ordering and equalling themselves). Isaac

apfelmus

17 Aug 17 Aug

8:06 a.m.

What I want to say is that a very general Map class can itself be useful as abstraction, i.e. the programmer can build his own map types when they capture the essence of his problem. David House and I once stumbled upon this. The task was to write a polymorphic delete/insert for a web-forum. I mean, you can delete individual posts or whole forums type Site = Data.Map ForumID Forum type Forum = Data.Map ForumID Post type Post = String but you'd have to write different delete functions for that. The insight was that the Site is a map of forums instance Map Site ForumID where type Elem Site ForumID = Forum while also being a map of Posts instance Map Site (ForumID, PostID) where type Elem Site (ForumID, PostID) = Post So, insert and delete work for both the forums and the posts. The path from the root of the hierarchy to the object in question says whether it's a forum or a post to operate on. Adrian Hey wrote:

...

apfelmus wrote:

...
The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

It's extremely useful to be able to give meaningful orderings when converting between tries and lists. In the case of constructing tries from ordered association lists there's a significant performance advantage too.

Well of course, every algebraic type can be made an instance of Ord and Eq. But neither is necessary for implementing Ralf Hinze. Generalizing generalized tries. http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz Those instances that want an Ord requirement can impose it but those that don't want it won't need to impose it.

...

...
Also, the map type does not necessarily fix the key type,

For instances of GT it does. The point is that eventually the Trie type (and corresponding GT instance) will be designed optimally and specifically for one particular key type (using DrIFT, Derive, template Haskell or something..)

...
we can use one map with different key types.

Not with GT. If (say) Data.Map was an instance then we would have something like..

instance Ord k => GT (Map k) k where ..

I.E. The map type constructor in question is (Map k), not Map. (Map k does fix the key type as k)

I don't see why a map should work for a single key type only, except that maps as type constructors support only one key type of course, since the element type is fixed then. Different key types may yield different element types. Regards, apfelmus

Adrian Hey

18 Aug 18 Aug

6:23 p.m.

apfelmus wrote:

...

Well of course, every algebraic type can be made an instance of Ord and Eq. But neither is necessary for implementing

Ralf Hinze. Generalizing generalized tries. http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz

Indeed, none of the current or anticipated future GT implementations ever actually compare two Keys, despite the Ord constraint (well except OrdGT itself). But ListGT does (and others possibly will) test for equality, so will need an Eq constraint. This is done for reasons of efficiency. Using a ListGT implementation like the one given in the paper you cite would remove the need for this, but would give poor performance and excessive heap burn rate for some common operations, so I'm not going to go that way.

...

Those instances that want an Ord requirement can impose it but those that don't want it won't need to impose it.

But I don't see any harm in it even for people who don't care about ordering, and it's pretty useful for those (the majority I believe) who do care. I also don't see the problem. To produce a trie type for a given key type you have to know something about the structure of the key, certainly enough to be able to test keys for equality (in principle, even if the eventual implemetation never actually does this). Same applies to ordering, except for a few vary rare "hack" cases like the examples that have been given (which are probably cureable, even if they haven't actually been cured at present). If say instead of assocsAscending we just had assocs (which didn't provide any guarantee about ordering), then anyone who wanted the list sorted would have to do a separate sort. Also in the absence of some other Ord respecting trie type they'd have to do this sort the slow way (by O(n.log n) comparisons). One thing possibly might be to move the Ord constraint from the class head and put in relevant methods, or maybe have a separate GTO class.. class (Ord k, GT map k) => GTO map k where.. I'll give that some thought, but I rather like knowing for sure that if I have (GT map k) then I also have (Ord k) and hence (Eq k) and I don't like complexifying the class hierarchy for no good reason (I'm not persuaded that this Ord issue really is an issue at all)

...

I don't see why a map should work for a single key type only, except that maps as type constructors support only one key type of course, since the element type is fixed then. Different key types may yield different element types.

I'm not sure I understand what you mean by single key type only. AFAICS you can only implement an *efficient* trie if it has been designed specifically for the key type in question, so of course it should come as no surprise that it will only work for the key type in question. But this is not a problem if GT derivation for any type you define is automated. But the keys can be polymorphic and you can still make GT instances provided you have GT constraints on the type variables. The ListGT instance will work for many key types for example, but they all have to be lists of something. Regards -- Adrian Hey

apfelmus

22 Aug 22 Aug

7:28 a.m.

Adrian Hey wrote:

...

...
I don't see why a map should work for a single key type only, except that maps as type constructors support only one key type of course, since the element type is fixed then. Different key types may yield different element types.

I'm not sure I understand what you mean by single key type only. AFAICS you can only implement an *efficient* trie if it has been designed specifically for the key type in question, so of course it should come as no surprise that it will only work for the key type in question.

What I want to say is that the functional dependency map -> k is too restrictive. A single map type can serve as map for different types of keys and values. So, the type Foo k1 k2 a = Data.Map k1 (Data.Map k2 a) can be seen as a map from keys (k1,k2) to values a instance Map (Foo k1 k2 a) (k1,k2) where type Elem .. .. = a but also as a map from keys k1 to values (Data.Map k2 a) instance Map (Data.Map k1 (Data.Map k2 a)) k1 where type Elem .. .. = Data.Map k2 a Regards, apfelmus

Arie Peterson

17 Aug 17 Aug

8:27 a.m.

Adrian Hey wrote:

...

...
The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Complex numbers, and trees, and probably many more. Some types do not have a natural total ordering. Sure, you can pick some arbitrary one, but that can be confusing, and it decreases the ability of the type system to catch semantic errors. I would much prefer to have a separate class for "arbitrary total orderings", with an automatic Ord-based instance. Kind regards, Arie

Adrian Hey

18 Aug 18 Aug

10:24 a.m.

Arie Peterson wrote:

...

Adrian Hey wrote:

...
...
The Ord-constraint is too limiting for tries. Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Complex numbers, and trees, and probably many more. Some types do not have a natural total ordering. Sure, you can pick some arbitrary one, but that can be confusing, and it decreases the ability of the type system to catch semantic errors.

True, but in practice it's hard to think of any Ordering that isn't completely arbitrary, other than reals perhaps. It isn't obvious to me that 'A' is less than 'B', or (1,2) is less than (2,1). But I don't see this as a big problem. It seems to me that there's no getting away from arbitraryness. Say the GT class abandoned the Ord constraint and just provided.. assocs :: map a -> [(k,a)] which returned the list of associations in some *arbitrary* (unspecified but still deterministic) order of keys. Is that any better? I would prefer to have some universal default ordering for any key type that was used consistently by various APIs, even if the ordering is arbitrary. Regards -- Adrian Hey

apfelmus

12:36 p.m.

Adrian Hey wrote:

...

It seems to me that there's no getting away from arbitraryness. Say the GT class abandoned the Ord constraint and just provided..

assocs :: map a -> [(k,a)]

which returned the list of associations in some *arbitrary* (unspecified but still deterministic) order of keys. Is that any better?

Yes, given that you don't mind the arbitrary order :) Like for printing the list. The point is just that the general finite map class shouldn't depend on an ordering and hence not provide toAscList :: Ord k => map k a -> [(k,a)] I'd use an extra class with Map as superclass for that. Regards, apfelmus

Arie Peterson

1:22 p.m.

Adrian Hey wrote:

...

True, but in practice it's hard to think of any Ordering that isn't completely arbitrary, other than reals perhaps. It isn't obvious to me that 'A' is less than 'B', or (1,2) is less than (2,1). But I don't see this as a big problem.

Well, we could debate the naturality of these orderings, but that is beside my point, because these instances are in the standard libraries. Some data types (both in libraries and "user defined") *deliberately* do not have Ord instances, and currently Data.Map does not play nice with this choice. For example, when I write 'someTree < otherTree' the compiler shouts at me and helps me realise that I actually meant 'height someTree < height otherTree'. This mechanism fails once I need trees to be keys in a map. My current solution (newtypes) is far from ideal.

...

It seems to me that there's no getting away from arbitraryness. Say the GT class abandoned the Ord constraint and just provided..

assocs :: map a -> [(k,a)]

which returned the list of associations in some *arbitrary* (unspecified but still deterministic) order of keys. Is that any better? I would prefer to have some universal default ordering for any key type that was used consistently by various APIs, even if the ordering is arbitrary.

I'm all for arbitrary orderings, as long as they don't "pollute" the Ord class :-). Greetings, Arie

Adrian Hey

5:39 p.m.

Arie Peterson wrote:

...

For example, when I write 'someTree < otherTree' the compiler shouts at me and helps me realise that I actually meant 'height someTree < height otherTree'. This mechanism fails once I need trees to be keys in a map. My current solution (newtypes) is far from ideal.

This seems a bit anal to me. If you really are worried about such things it's probably best to avoid using lists too, or you might end up writing (xs < ys) instead of (length xs < length ys) Regards -- Adrian Hey

Aaron Denney

19 Aug 19 Aug

3:40 a.m.

On 2007-08-18, Adrian Hey wrote:

...

Arie Peterson wrote:

...
Adrian Hey wrote:

...
...
The Ord-constraint is too limiting for tries. Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

Complex numbers, and trees, and probably many more. Some types do not have a natural total ordering. Sure, you can pick some arbitrary one, but that can be confusing, and it decreases the ability of the type system to catch semantic errors.

True, but in practice it's hard to think of any Ordering that isn't completely arbitrary, other than reals perhaps.

No? These are usually somewhat arbitrary, but not completely arbitrary.

...

It isn't obvious to me that 'A' is less than 'B', or (1,2) is less than (2,1).

Both depend on convention. For pairs, either convention make sense, so we pick one arbitrarily. For something like complex numbers, there are many things one could do, but none of them make sense, essentially because of the /topology/. There are degrees of arbitrariness. We can work with small amounts just fine. Large amounts are more troublesome. -- Aaron Denney -><-

Isaac Dupree

17 Aug 17 Aug

7:58 p.m.

Adrian Hey wrote:

...

apfelmus wrote:

...
The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

STRef and similar, I think are in Eq but would need performance penalties to be in Ord, since they essentially only contain a pointer to their contents, and garbage collection can move and re-order them and their contents. Isaac

Adrian Hey

18 Aug 18 Aug

10:39 a.m.

Isaac Dupree wrote:

...

Adrian Hey wrote:

...
apfelmus wrote:

...
The Ord-constraint is too limiting for tries.

Well it isn't going to disappear while I'm in charge of GT class :-) Why do you object to it? Ultimately we must be able to test keys for equality at least. Is there a type that is an instance of Eq, but not Ord (or could not reasonably be made an instance of Ord)?

STRef and similar, I think are in Eq but would need performance penalties to be in Ord, since they essentially only contain a pointer to their contents, and garbage collection can move and re-order them and their contents.

I guess anyone who wants an STRefMap will just have to suffer the performance penalty of O(n) lookup and make STRefMap an instance of some other map class (there are plenty to choose from :-) Regards -- Adrian Hey

6324

Age (days ago)

6545

Last active (days ago)

List overview

Download

50 comments

17 participants

participants (17)

Aaron Denney
Adrian Hey
Andriy Palamarchuk
apfelmus
Arie Peterson
Benjamin Franksen
Christian Maeder
Don Stewart
Ian Lynagh
Iavor Diatchki
Isaac Dupree
Johan Tibell
Johannes Waldmann
ross＠soi.city.ac.uk
Sebastian Sylvan
Stefan O'Rear
Twan van Laarhoven