Edison 1.2rc2

newer
Annoucing alternative to Text.Regex

Robert Dockins

3 Mar 2006 3 Mar '06

3:58 p.m.

Greetings! The second release candidate for Edison 1.2 is now ready for comments. The notable changes from RC1 are: * add strict variants of all folds and reduces * reverse argument orders to 'rcons' and 'lookup*' in Sequence * Add symbolic operators for lcons, rcons, append, and lookup from the Sequence API * Add symbolic operators for subsetEq, difference, intersection and union from the set API * rename 'single' to 'singleton' in all APIs * rename 'intersect' to 'intersection' in Collection and Associated Collection APIs * add 'adjustOrInsert' to the Associated Collection API (see http://www.haskell.org//pipermail/libraries/2006-February/ 004952.html for a use case) And the following are the remaining major issues I'd like to get feedback about: -- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set has operations with the same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am considering adoping the Data.Set names, but they are quite a bit longer. Also what about the meaning of "subset" vs "proper subset"? What do you find most intuitive? -- Data.Map defines 'isSubmapOfBy' and 'isProperSubmapBy'. Edison has no corresponding API methods. Are they important enough to add? -- What do you think about the symbolic operators? Is the fact that (++) clashes with the Prelude a problem? Should the operators be folded into the "Data.Edison" module instead? (see http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison- Sym.html) In addition, I'm interested in any API related feedback you might have. Code and docs can be found at the project website: http://www.eecs.tufts.edu/~rdocki01/edison.html Or you can go directly to the haddock docs: http://www.eecs.tufts.edu/~rdocki01/docs/edison/index.html Or the darcs repo: http://www.eecs.tufts.edu/~rdocki01/edison/ Thanks, Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Show replies by date

Benjamin Franksen

3 Mar 3 Mar

4:29 p.m.

...

-- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set has operations with the same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am considering adoping the Data.Set names, but they are quite a bit longer. Also what about the meaning of "subset" vs "proper subset"? What do you find most intuitive?

I prefer the mathematical convention: subset includes equality, proper subset excludes it. And proper subset can very easily defined by library user (just check isSubset and sizes differ) so is not essential. The 'is' prefix is good because it makes the meaning clearer. The 'Of' suffix is bad, because it doesn't and adds noise.

...

-- Data.Map defines 'isSubmapOfBy' and 'isProperSubmapBy'. Edison has no corresponding API methods. Are they important enough to add?

Can they be easily added by the user w/o loosing much efficiency? Then no. Else yes. Note that isSubmapOfBy and isProperSubmapOfBy are the only 'By'-suffix functions exported from Data.Map. Ben

David Menendez

4 Mar 4 Mar

1:28 a.m.

Robert Dockins writes:

...

The second release candidate for Edison 1.2 is now ready for comments. The notable changes from RC1 are:

* add strict variants of all folds and reduces

You know, I was looking through the code last night and thought, "what about foldl'?" Funny. That being said, eight variants of fold* (plus six variants of reduce*) feels excessive to me. I can understand the desire for generality, but I'm not sure it's possible to decide between, say, foldr and foldr' without knowing the sequence type. As I understand it, with regular lists you almost always[*] want either foldr or foldl', because foldl builds up a chain of unevaluated thunks and foldr' has linear call stack growth. (When I was learning ML, we were told to always prefer foldl over foldr for that reason.) With reverse lists (i.e., Rev []), the situation is reversed: foldr' is tail-recursive, foldl is fully lazy, and foldr and foldl' have linear growth. For the other sequences, there may not be clear winners. With SimpleQueue, for example, the choice between foldr and foldr' might depend on how the queue is arranged. Maybe it would be better to have each implementation default to the strictness that's most likely best, at least in the generic Seq interface. That way, [] would get strict foldl and non-strict foldr, but Rev [] would get strict foldr and non-strict foldl. Individual modules could provide extra variants as desired outside the class interface. [*] I guess you might want a non-strict foldl with lists if the foldee is expensive enough that building up a pile of thunks is worth it if they can potentially be skipped.

...

-- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set has operations with the same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am considering adoping the Data.Set names, but they are quite a bit longer. Also what about the meaning of "subset" vs "proper subset"? What do you find most intuitive?

I would go with subset and properSubset. The "isSubsetOf" form only makes sense if used in-line, but we have (|=) for that.

...

In addition, I'm interested in any API related feedback you might have.

Is there a reason why the TernaryTrie and the PatriciaLoMap don't implement OrdAssoc? For the Seq class, I'd rather have the O(n) stuff listed with the implementations, rather than having a default in the interface. Finally, thanks for reviving Edison. I had always been disappointed by its obscurity. -- David Menendez | "In this house, we obey the laws http://www.eyrie.org/~zednenem | of thermodynamics!"

Robert Dockins

2:34 a.m.

On Friday 03 March 2006 08:28 pm, you wrote:

...

Robert Dockins writes:

...
The second release candidate for Edison 1.2 is now ready for comments. The notable changes from RC1 are:

* add strict variants of all folds and reduces

You know, I was looking through the code last night and thought, "what about foldl'?" Funny.

That being said, eight variants of fold* (plus six variants of reduce*) feels excessive to me. I can understand the desire for generality, but I'm not sure it's possible to decide between, say, foldr and foldr' without knowing the sequence type.

Well, that's certainly an issue I considered when adding them to the API.

...

As I understand it, with regular lists you almost always[*] want either foldr or foldl', because foldl builds up a chain of unevaluated thunks and foldr' has linear call stack growth. (When I was learning ML, we were told to always prefer foldl over foldr for that reason.)

With reverse lists (i.e., Rev []), the situation is reversed: foldr' is tail-recursive, foldl is fully lazy, and foldr and foldl' have linear growth.

I toyed for a good while with only defining some of the strict folds, or doing some other halfway measure. The thing that eventually made be decide to throw in the whole kitchen sink (all the strict folds) was the realization that the API is specified at a purely semantic level; at that level, the only thing that matters about strict folds is that they have exactly the same semantics as the non-strict fold given a combining function which is strict in the correct argument(s) (otherwise they are guaranteed to be the same except that the strict fold may diverge in cases where the non-strict fold does not). When it comes time to implement, the strict folds give you more leway to force closures early BUT YOU DON'T HAVE TO! For the regular list example, I could define foldr' = foldr and it would be correct according to the specification. For now I have defined all the implementations to be as strict as possible in all cases, but I can make them lazier later if it turns out to have better space behavior; as you mention, foldr' on regular lists is problematic and it will probably be the first thing to be reverted to a lazy implementation.

...

For the other sequences, there may not be clear winners. With SimpleQueue, for example, the choice between foldr and foldr' might depend on how the queue is arranged.

Maybe it would be better to have each implementation default to the strictness that's most likely best, at least in the generic Seq interface. That way, [] would get strict foldl and non-strict foldr, but Rev [] would get strict foldr and non-strict foldl. Individual modules could provide extra variants as desired outside the class interface.

Both the collection and the associated collection have fold (and fold') which folds over elements in an unspecified order. Perhaps I could add that to the sequence API to give the "best behavior" choice to the sequence implementer. Then for lists you would have (lazy) fold = foldr and (strict) fold' = foldl'.

...

[*] I guess you might want a non-strict foldl with lists if the foldee is expensive enough that building up a pile of thunks is worth it if they can potentially be skipped.

...

...
-- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set has operations with the same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am considering adoping the Data.Set names, but they are quite a bit longer. Also what about the meaning of "subset" vs "proper subset"? What do you find most intuitive?

I would go with subset and properSubset. The "isSubsetOf" form only makes sense if used in-line, but we have (|=) for that.

I'm leaning in this direction as well; I think I've pretty well decided the names as they stand now have to go (they contradict standard usage), and I don't really like the Data.Map names.

...

...
In addition, I'm interested in any API related feedback you might have.

Is there a reason why the TernaryTrie and the PatriciaLoMap don't implement OrdAssoc?

Not really. When I inherited Edison, none of the associated collections implemented the Ord* classes (I don't know why). I added AssocList and StandardMap because they were easy. The others are significantly more complicated and I thought it was more important to work out the API issues and kick 1.2 out the door before tacking this (same reason the 'strict' folds for TernaryTrie are really just the non-strict folds).

...

For the Seq class, I'd rather have the O(n) stuff listed with the implementations, rather than having a default in the interface.

Noted. Its a dreadfully tedious cut-n-paste job which is why I haven't done it; patches and clever scripts to do this are welcome ;-) Of course, it would be nice to have time complexity for the Collection and Associated Collections as well...

...

Finally, thanks for reviving Edison. I had always been disappointed by its obscurity.

Indeed; it has always seemed to me like a great project that never got the attention it deserved. Thanks for your comments! Rob Dockins

Seth Kurtzberg

3:38 a.m.

I just noticed this email thread. I've wanted to use some things in Edison for a while so I'm glad to see that it is still alive. As far as the fold variations go, I can't see that it can possibly hurt anything to include more variations, especially if the work is already done. I would like to see the documentation constructed so that the ones that are most likely to be "correct" (or desirable) are listed first. That way I don't have to figure out exactly what the issues are with them in order to grab the "right" one. Especially since I usually look at the more subtle performance issues after I've built a working program. Seth On Fri, 3 Mar 2006 21:34:39 -0500 Robert Dockins wrote:

...

On Friday 03 March 2006 08:28 pm, you wrote:

...
Robert Dockins writes:

...
The second release candidate for Edison 1.2 is now ready for comments. The notable changes from RC1 are:

* add strict variants of all folds and reduces

You know, I was looking through the code last night and thought, "what about foldl'?" Funny.

That being said, eight variants of fold* (plus six variants of reduce*) feels excessive to me. I can understand the desire for generality, but I'm not sure it's possible to decide between, say, foldr and foldr' without knowing the sequence type.

Well, that's certainly an issue I considered when adding them to the API.

...
As I understand it, with regular lists you almost always[*] want either foldr or foldl', because foldl builds up a chain of unevaluated thunks and foldr' has linear call stack growth. (When I was learning ML, we were told to always prefer foldl over foldr for that reason.)

With reverse lists (i.e., Rev []), the situation is reversed: foldr' is tail-recursive, foldl is fully lazy, and foldr and foldl' have linear growth.

I toyed for a good while with only defining some of the strict folds, or doing some other halfway measure. The thing that eventually made be decide to throw in the whole kitchen sink (all the strict folds) was the realization that the API is specified at a purely semantic level; at that level, the only thing that matters about strict folds is that they have exactly the same semantics as the non-strict fold given a combining function which is strict in the correct argument(s) (otherwise they are guaranteed to be the same except that the strict fold may diverge in cases where the non-strict fold does not). When it comes time to implement, the strict folds give you more leway to force closures early BUT YOU DON'T HAVE TO! For the regular list example, I could define foldr' = foldr and it would be correct according to the specification. For now I have defined all the implementations to be as strict as possible in all cases, but I can make them lazier later if it turns out to have better space behavior; as you mention, foldr' on regular lists is problematic and it will probably be the first thing to be reverted to a lazy implementation.

...
For the other sequences, there may not be clear winners. With SimpleQueue, for example, the choice between foldr and foldr' might depend on how the queue is arranged.

Maybe it would be better to have each implementation default to the strictness that's most likely best, at least in the generic Seq interface. That way, [] would get strict foldl and non-strict foldr, but Rev [] would get strict foldr and non-strict foldl. Individual modules could provide extra variants as desired outside the class interface.

Both the collection and the associated collection have fold (and fold') which folds over elements in an unspecified order. Perhaps I could add that to the sequence API to give the "best behavior" choice to the sequence implementer. Then for lists you would have (lazy) fold = foldr and (strict) fold' = foldl'.

...
[*] I guess you might want a non-strict foldl with lists if the foldee is expensive enough that building up a pile of thunks is worth it if they can potentially be skipped.

...
...
-- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set has operations with the same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am considering adoping the Data.Set names, but they are quite a bit longer. Also what about the meaning of "subset" vs "proper subset"? What do you find most intuitive?

I would go with subset and properSubset. The "isSubsetOf" form only makes sense if used in-line, but we have (|=) for that.

I'm leaning in this direction as well; I think I've pretty well decided the names as they stand now have to go (they contradict standard usage), and I don't really like the Data.Map names.

...
...
In addition, I'm interested in any API related feedback you might have.

Is there a reason why the TernaryTrie and the PatriciaLoMap don't implement OrdAssoc?

Not really. When I inherited Edison, none of the associated collections implemented the Ord* classes (I don't know why). I added AssocList and StandardMap because they were easy. The others are significantly more complicated and I thought it was more important to work out the API issues and kick 1.2 out the door before tacking this (same reason the 'strict' folds for TernaryTrie are really just the non-strict folds).

...
For the Seq class, I'd rather have the O(n) stuff listed with the implementations, rather than having a default in the interface.

Noted. Its a dreadfully tedious cut-n-paste job which is why I haven't done it; patches and clever scripts to do this are welcome ;-) Of course, it would be nice to have time complexity for the Collection and Associated Collections as well...

...
Finally, thanks for reviving Edison. I had always been disappointed by its obscurity.

Indeed; it has always seemed to me like a great project that never got the attention it deserved.

Thanks for your comments! Rob Dockins _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Christian Maeder

6 Mar 6 Mar

1:27 p.m.

Robert Dockins wrote:

...

The second release candidate for Edison 1.2 is now ready for comments.

Will this make Jean-Philippe's work obsolete? http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework Being (currently) no enthusiast anyway, I would not even know which classes to choose now, since Edison has been improved.

...

-- What do you think about the symbolic operators?

I prefer alphanumeric identifiers.

...

In addition, I'm interested in any API related feedback you might have.

Data.Edison.Assoc.fold "Combine all the elements in the associative collection, given a combining function and an initial value. The elements are processed in an unspecified order. Note that fold ignores the keys." This fold function looks pretty unsafe to me, because (the common) "fold (:) []" might reveal implementation details that one wants to hide. Such a function should be called "structuralFold" (or something like that). Cheers Christian

Jean-Philippe Bernardy

2:04 p.m.

On 3/6/06, Christian Maeder wrote:

...

Robert Dockins wrote:

...
The second release candidate for Edison 1.2 is now ready for comments.

Will this make Jean-Philippe's work obsolete? http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

I intend to finish and publish that work... and of course make it the superior choice. :) In time I hope to retrofit edison concrete modules into my framework. Cheers, JP.

Adrian Hey

8:42 p.m.

On Monday 06 Mar 2006 2:04 pm, Jean-Philippe Bernardy wrote:

...

On 3/6/06, Christian Maeder wrote:

...
Robert Dockins wrote:

...
The second release candidate for Edison 1.2 is now ready for comments.

Will this make Jean-Philippe's work obsolete? http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

I intend to finish and publish that work... and of course make it the superior choice. :) In time I hope to retrofit edison concrete modules into my framework.

Hello Jean-Philippe, I haven't had much time to study this, but in the light of recent discussions about generalised tries I played about a little last night and did have a go at producing n-Tuple maps from instances of your classes. e.g. for pairs I ended up trying to define something like this.. instance (MapLike c1 k1 c2, MapLike c2 k2 a) => MapLike (Map2 c1) (k1,k2) a where.. I must admit I found it hard going, but maybe I'm just using it wrong and I didn't have much time to play about with alternatives. Perhaps you'd care to give it a shot. I'm not sure about the proper use of fundeps, but having the associated values dependent on the map type seemed a little awkward to me and also seems to make it hard to properly define intersectionWith for maps with the same key type but different associated value types (one of the infelicities you mention). Would something like this be possible (using "Map" instead of "MapLike")? class Ord k => Map map k | map -> k where (I think I'd like an Ord constraint on keys) then given.. newtype (Map2 map1 map2 a) = Map2 ( map1 (map2 a)) newtype (Map3 map1 map2 map3 a) = Map3 (Map2 map1 map2 (map3 a)) newtype (Map4 map1 map2 map3 map4 a) = Map4 (Map3 map1 map2 map3 (map4 a)) or perhaps.. newtype (Map2 map1 map2 a) = Map2 (map1 ( map2 a)) newtype (Map3 map1 map2 map3 a) = Map3 (map1 (Map2 map2 map3 a)) newtype (Map4 map1 map2 map3 map4 a) = Map4 (map1 (Map3 map2 map3 map4 a)) we could define.. instance (Map map1 k1, Map map2 k2) => Map (Map2 map1 map2) (k1,k2) where instance (Map map1 k1, Map map2 k2, Map map3 k3) => Map (Map3 map1 map2 map3) (k1,k2,k3) where instance (Map map1 k1, Map map2 k2, Map map3 k3, Map map4 k4) => Map (Map4 map1 map2 map3 map4) (k1,k2,k3,k4) where Implementing these instances efficiently would be a good test of the completeness of the class methods too (I haven't tried it yet :-). But maybe there's a better way. Anybody got any alternative suggestions? Regards -- Adrian Hey

Robert Dockins

3:11 p.m.

On Mar 6, 2006, at 8:27 AM, Christian Maeder wrote:

...

Robert Dockins wrote:

...
The second release candidate for Edison 1.2 is now ready for comments.

Will this make Jean-Philippe's work obsolete?

http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

Being (currently) no enthusiast anyway, I would not even know which classes to choose now, since Edison has been improved.

I personally have no particular stance on this. Jean-Philippe's design is quite different from Edison's and in the absence of focus groups and user studies we don't have any real information about what design appeals best to programmers. I suspect that, in the end, only one type-class data structure abstraction will gain widespread use, but I'm not going to make any predictions at this point. Obviously, standardization decisions can have a strong impact on this process. As for myself, I'll probably be using Edison, because I'm pretty familiar with it ;-)

...

...
-- What do you think about the symbolic operators?

I prefer alphanumeric identifiers.

So do I, as it happens :-) Which is why the symbols are only aliases.

...

...
In addition, I'm interested in any API related feedback you might have.

Data.Edison.Assoc.fold

"Combine all the elements in the associative collection, given a combining function and an initial value. The elements are processed in an unspecified order. Note that fold ignores the keys."

This fold function looks pretty unsafe to me, because (the common) "fold (:) []" might reveal implementation details that one wants to hide. Such a function should be called "structuralFold" (or something like that).

I'm not sure I understand; why do you think that revealing structure is unsafe? It's not like you can subvert the program with this additional knowledge. Obviously, if you rely on the elements being provided in some particular implementation-dependent order than your code is broken (perhaps this should be more explicit in the docs?). I'm hesitant to give "fold" a longer name because I think it should be the first fold a programmer reaches for. In the fairly common case where you have a commutative, associative function (eg (+)), you don't care in what order the function is applied to the elements, and you have fold f z === foldl f z === foldr f z. Why should you prefer "fold"? Because it gives data structure implementers the freedom to define a fold that is more efficient than foldr or foldl or whatever else (usually, as you surmised, by following the internal structure). In light of this and previous discussions about folds, I think I will be adding a section to the docs about how to chose an appropriate fold for your application.

...

Cheers Christian

Thanks for your comments! Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Christian Maeder

3:56 p.m.

Robert Dockins wrote:

...

I'm not sure I understand; why do you think that revealing structure is unsafe?

I don't want that i.e. equal maps yield different results, just because they are internally differently balanced.

...

I'm hesitant to give "fold" a longer name because I think it should be the first fold a programmer reaches for.

I don't aggree. I immediately made it wrong by my own "toList = fold (:) []" and did not even notice foldr or foldl. The fold for efficiency should be mainly used by library writers. But users I tell: correctness first, efficiency (when needed) later!

...

In the fairly common case where you have a commutative, associative function (eg (+)), you don't care in what order the function is applied to the elements

These properties are not garuanteed and difficult to track down in the few (but severe) cases it's made wrong.

...

, and you have fold f z === foldl f z === foldr f z.

I see, foldr and foldl are your safe versions! (I think Jean-Philippe only has a safe "fold" and misses an unsafe one.)

...

In light of this and previous discussions about folds, I think I will be adding a section to the docs about how to chose an appropriate fold for your application.

At least that plus a warning! Cheers Christian

Robert Dockins

5:26 p.m.

On Mar 6, 2006, at 10:56 AM, Christian Maeder wrote:

...

Robert Dockins wrote:

...
I'm not sure I understand; why do you think that revealing structure is unsafe?

I don't want that i.e. equal maps yield different results, just because they are internally differently balanced.

Well, this obviously depends on what you mean by "equal". That can get slippery pretty fast. I assume from your comments that you mean extensional equality (ie, viewing the map as a partial function with a known finite domain).

...

...
I'm hesitant to give "fold" a longer name because I think it should be the first fold a programmer reaches for.

I don't aggree. I immediately made it wrong by my own "toList = fold (:) []" and did not even notice foldr or foldl. The fold for efficiency should be mainly used by library writers. But users I tell: correctness first, efficiency (when needed) later!

I tend to agree with that philosophy. On the other hand, if you are bothering to use a dedicated data structure library, its probably because you are at least somewhat cognizant of performance issues; otherwise you'd just use lists and tuples for everything (a phenomenon one one can actually observe in a lot of projects).

...

...
In the fairly common case where you have a commutative, associative function (eg (+)), you don't care in what order the function is applied to the elements

These properties are not garuanteed and difficult to track down in the few (but severe) cases it's made wrong.

These properties _can't_ be guaranteed (at least not by Haskell compilers). It is true that you can shoot yourself in the foot with the Edison API (this is one of the stronger criticisms one can level at it, IMO). Some operations even allow you to violate internal data structure consistency if used incorrectly. Edison has already made the design choice to present the programmer with abuse-able operations. I think an arbitrary-order fold is pretty tame (on the other hand, it is a significantly more "visible" operation than the actually unsafe ones). Humm... I wonder if there is a case for extracting a subset of the API which only presents "safe" (ie "difficult/impossible to abuse") operations? Perhaps with a simplified class hierarchy as well.... What do you think? That way you could start by using the simplified API and move to the full API when/if you discover opportunities for performance enhancements from the less-safe or more esoteric operations.

...

...
, and you have fold f z === foldl f z === foldr f z.

I see, foldr and foldl are your safe versions! (I think Jean- Philippe only has a safe "fold" and misses an unsafe one.)

Well, its actually more complicated than safe vs unsafe. The base "AssocX" class (wherein fold appears) doesn't assume an ordering relation on keys (no "Ord k" context). What we do in the absence of a total ordering on keys is to present the elements in an "arbitrary" order. Because the order is "arbitrary" we can chose it to be advantageous (ie, follow the internal structure), but that's really just a side benefit of the relaxed semantics. The subclass "OrdAssocX" further assumes an ordering on keys, so it can provide foldr and friends (because now key ordering is defined). However, even with foldr, foldl, etc, there is a degree undefinedness if the data structure is a finite relation rather than a finite map (in what order should you present elements bound to equal keys?). The folds for bags have similar properties.

...

...
In light of this and previous discussions about folds, I think I will be adding a section to the docs about how to chose an appropriate fold for your application.

At least that plus a warning!

Indeed. Its seems folds are sufficiently complicated to warrant some special attention in the docs. In addition, I suppose I should track down each and every operations which can be observed to behave differently based on hidden state and mark them as "non-deteministic" or some such. What is a good term for an operation whose result is only partially defined? I really want to reserve the term "unsafe" for operations that can violate internal consistency. Thanks again! This is exactly the kind of discussion that helps produce good APIs. Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Christian Maeder

6:33 p.m.

Robert Dockins wrote:

...

Well, this obviously depends on what you mean by "equal". That can get slippery pretty fast. I assume from your comments that you mean extensional equality (ie, viewing the map as a partial function with a known finite domain).

Right, this corresponds to the Eq instance for the current Data.Map (with a safe fold) as an _abstract_ data type. I can hardly imagine a (finite) map implementation without Ord instance for the keys. With equality only you have no chance to yield the keys of equal maps in a fixed order (by a toList function). Thus an association list is not a (mathematical) map (and your fold function is fine for such lists).

...

These properties _can't_ be guaranteed (at least not by Haskell compilers).

I know, also the total-order-property of Ord instances can not be guaranteed (but "deriving" helps, of course).

...

However, even with foldr, foldl, etc, there is a degree undefinedness if the data structure is a finite relation rather than a finite map (in what order should you present elements bound to equal keys?). The folds for bags have similar properties.

Relations are (conceptually) sets of pairs and bags are multisets (or maps from elements to their occurrence counts). In both cases I'd expect to see an ascending ordered list when shown. Also for Tupels there are several possibilities for total orderings, but the type class Ord forces you to pick one. Christian

Robert Dockins

7:25 p.m.

On Mar 6, 2006, at 1:33 PM, Christian Maeder wrote:

...

Robert Dockins wrote:

...
Well, this obviously depends on what you mean by "equal". That can get slippery pretty fast. I assume from your comments that you mean extensional equality (ie, viewing the map as a partial function with a known finite domain).

Right, this corresponds to the Eq instance for the current Data.Map (with a safe fold) as an _abstract_ data type.

I can hardly imagine a (finite) map implementation without Ord instance for the keys.

http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-Mem- StableName.html#t%3AStableName No Ord instance. Would be very useful to be able to put it in a map; it hashes, so its not a total waste to do it. I recall seeing conversations on some haskell list in the past about how people wanted to put stable names in Data.Map but couldn't because of the need for an Ord instance. Really any data type which isn't totally ordered but can be hashed could be used in a map this way. (Not implemented in Edison currently, but it should be).

...

With equality only you have no chance to yield the keys of equal maps in a fixed order (by a toList function).

Right.

...

Thus an association list is not a (mathematical) map (and your fold function is fine for such lists).

I don't think I agree here. What definition are you using for map? Maps don't require a total ordering on their domain. http:// mathworld.wolfram.com/Map.html Or are you getting at something else?

...

...
These properties _can't_ be guaranteed (at least not by Haskell compilers).

I know, also the total-order-property of Ord instances can not be guaranteed (but "deriving" helps, of course).

...
However, even with foldr, foldl, etc, there is a degree undefinedness if the data structure is a finite relation rather than a finite map (in what order should you present elements bound to equal keys?). The folds for bags have similar properties.

Relations are (conceptually) sets of pairs and bags are multisets (or maps from elements to their occurrence counts). In both cases I'd expect to see an ascending ordered list when shown.

You'd need a total ordering on the elements for that -- Edison doesn't assume anything about the elements of an associated collection (and neither does Data.Map). What if you wanted a relation/map between, say, integers and functions :: String -> Bool or between Strings and IO actions or something like that? Re: bags as maps to counts; this can be problematic, for reasons discussed here (http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data- Edison-Coll.html) See the section titled "Notes on Observability". If you assume a world where everything has an Eq instance representing decidable leibniz equality and Ord instances representing total orderings, a lot of things become easier. Unfortunately, we don't live in that world, which is why we have to worry about all this other stuff. The only thing we assume is that an Eq instance defines some equivalence relation and that an Ord instance (if it exists) defines a total ordering (and if those assumptions are violated, all is lost and we give up).

...

Also for Tupels there are several possibilities for total orderings, but the type class Ord forces you to pick one.

This is true, but orthogonal. I actually considered adding an API where you could specify orderings with arbitrary functions :: a -> a -

...

Ordering, but I decided it would be too hard to use (and a pain to write besides).

Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Christian Maeder

7 Mar 7 Mar

11:51 a.m.

Robert Dockins wrote:

...

...
Thus an association list is not a (mathematical) map (and your fold function is fine for such lists).

I don't think I agree here. What definition are you using for map? Maps don't require a total ordering on their domain. http:// mathworld.wolfram.com/Map.html

Well, "A map is a way of associating unique objects to every element in a given _set_"! The (strong or given) equality of [(a,b)] does not correspond to the (possibly uncomputable) equality of maps (where order does not matter). So association lists are no maps unless you hide (or ignore) all functions that allow you to observe differences (i.e, your fold and elements function). Also for hash maps you have no chance to order elements with equal hash keys. Maybe all computable and observable maps (and sets) require a total order? Of course all mentioned data structures are useful for certain purposes (and it's maybe only a matter of documentation). Cheers Christian

Robert Dockins

4:20 p.m.

I confess I've become a bit lost. I'm not longer sure I understand the points you're making; I'll make my best effort to answer, but if I reply to points other than the ones you've made, please forgive me. On Mar 7, 2006, at 6:51 AM, Christian Maeder wrote:

...

Robert Dockins wrote:

...
...
Thus an association list is not a (mathematical) map (and your fold function is fine for such lists). I don't think I agree here. What definition are you using for map? Maps don't require a total ordering on their domain. http:// mathworld.wolfram.com/Map.html

Well, "A map is a way of associating unique objects to every element in a given _set_"!

Clarification: "finite set" (for our purposes). I believe you are highlighting that sets have no intrinsic notion of order?

...

The (strong or given) equality of [(a,b)] does not correspond to the (possibly uncomputable) equality of maps (where order does not matter).

What I think you are saying: Let al :: [(a,b)] be a finite association list, where type 'a' has a decidable equality relation, and if (x,y) and (w,z) both appear in al such that x == w (equality), then y === z (identity, not necessarily decidable). Then let toMap1 and toMap2 be two different implementations of coercion from association lists to maps and: m1 = toMap1 al m2 = toMap2 al We must have that m1 is extensional equal to m2. However, if we have fromMap1 and fromMap2 that extract an association list from the different maps we might have: fromMap1 m1 <> fromMap2 (where tuple equality is defined using == for a and === for b) The problem is that there is an entire equivalence class of association lists each of which represents the map in question; each "fromMap*" has to make a particular concrete choice from this equivalence class, but they might make different choices. I think I have decided to call such operations "ambiguous" because they (conceptually) generate a set of possible results and arbitrarily choose one. Clearly, such behavior might be confusing to a programmer, so it should be carefully marked. The "safe" operations you prefer collapse the ambiguity as part of the function's contract by, eg, specifying that keys appear in order. I also suspect this is the primary source of our disagreement; I'm interested in allowing programmers access to ambiguous operations, and you think that the exposed API should only consist of well- defined (non-ambiguous) operations. Is that a fair characterization?

...

So association lists are no maps unless you hide (or ignore) all functions that allow you to observe differences (i.e, your fold and elements function).

If I were being particularly pedantic, I would instead say that association lists are not maps (at all). Rather, they represent and can be transformed into (finite) maps, in much the same way that, eg, source code represents and can be transformed into a particular abstract syntax tree. Parsing and pretty printing a source code file will not (in general) result in identical concrete syntax. Would you therefore say that concrete source code is not a program? You might (if you were being particularly pedantic), but I think most people would simply equate source code "up to parsing", at least when talking about the "program". I think we have the same situation here, were association lists can be informally identified with the maps they represent, but have to be thought about up to reordering.

...

Also for hash maps you have no chance to order elements with equal hash keys. Maybe all computable and observable maps (and sets) require a total order

What do you mean by computable in this context?

...

Of course all mentioned data structures are useful for certain purposes (and it's maybe only a matter of documentation).

Cheers Christian

Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

7062

Age (days ago)

7066

Last active (days ago)

List overview

Download

14 comments

7 participants

participants (7)

Adrian Hey
Benjamin Franksen
Christian Maeder
David Menendez
Jean-Philippe Bernardy
Robert Dockins
Seth Kurtzberg

Edison 1.2rc2

David Menendez

tags

participants (7)