
Hi, A revision of DData. http://users.skynet.be/jyp/DData/doc http://users.skynet.be/jyp/DData/ddata.tar.gz New sequences are added. I've tried to reflect settled issues in the code. If it is not so, tell me. I haven't done anything to argument ordering yet. I have nothing against the right bias, so, I can implement it, but it is easy to forget some cases... That change would need review. Cheers, JP. __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com

Hello, why isn't there an Seq.append? And why does Seq.concat have the type Seq (Seq a) -> Seq a instead of [Seq a] -> Seq a which would be more consistent with the other modules? Wolfgang

JP Bernardy wrote:
A revision of DData.
In Set, I find the name uncheckedMapMonotonic unnecessary long. Again, I suggest to omit toAscList (as it is the same as toList) and to rename fromDistinctAscList to fromAscList and provide no extra treatment for ascending list with duplicates. (Maybe a linear nub variant can be provided elsewhere: ... = map head . group) Thus "Asc" could indicate unchecked variants and always means strongly ascending. (Therefore I suggested "mapAsc") Christian

--- Christian Maeder
JP Bernardy wrote:
A revision of DData.
In Set, I find the name
uncheckedMapMonotonic
unnecessary long.
I'm tempted to drop the unchecked prefix. It is rather obvious that monotonicity of the function is not checked. mapAsc doesn't appeal to me, because it makes less explicit the monotonicity of the mapped function.
Again, I suggest to omit toAscList (as it is the same as toList)
Yet, it has been shown that "toDescList" might be needed, in a lazy setting. Thus, having the symmetric "toAscList" is logical; even if the semantic is the same a "toList".
and to rename fromDistinctAscList to fromAscList and provide no extra treatment for ascending list with duplicates. (Maybe a linear nub variant can be provided elsewhere: ... = map head . group) Thus "Asc" could indicate unchecked variants and always means strongly ascending. (Therefore I suggested "mapAsc")
This is quite a lot of changes wrt. the original. I'd wait for more comments o this. You also asked again if MultiSet should be renamed to Bag. My argument went as such: Daan's remark about the difference between bags and multisets is: " A multi set differs from a /bag/ in the sense that it is represented as a map from elements to occurrence counts instead of retaining all elements. This means that equality on elements should be defined as a /structural/ equality instead of an equivalence relation. If this is not the case, operations that observe the elements, like 'filter' and 'fold', should be used with care. " You argued that Eq is (nearly) always assumed to be defined by structural equality, so "Bag" would be just as good as "MultiSet". Yet, a true "Bag" type may be added later, so I did not bother to change the name. Cheers, JP. __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com

JP Bernardy
You argued that Eq is (nearly) always assumed to be defined by structural equality, so "Bag" would be just as good as "MultiSet".
I think you're saying we might need a bag to hold items where we want to group things according to (equivalence on) some arbitrary key? Perhaps we could have the bag datatype parametrised over a user-supplied ordering, so that Set a would be equivalent to a Bag compare a (with compare :: a -> a -> Ordering) The bag could then hold regular (Eq-based) Sets of items. Would this make sense? Would it be useful? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

--- Ketil Malde
I think you're saying we might need a bag to hold items where we want to group things according to (equivalence on) some arbitrary key? Perhaps we could have the bag datatype parametrised over a user-supplied ordering, so that
Set a
would be equivalent to a
Bag compare a (with compare :: a -> a -> Ordering)
The bag could then hold regular (Eq-based) Sets of items.
Would this make sense? Would it be useful?
I'm not sure that I understand either :) Let me give an example...
type MyString = MyString String
instance Eq MyString where (MyString x) == (MyString y) = length x == length y
ms = MyString
Bag.toString $ Bag.fromString $ [ms "x", ms "y"]
should return the original list, [ms "x", ms "y"].
MultiSet.toString $ MultiSet.fromString $ [ms "x", ms "y"]
will return [ms "x", ms "x"]. That is why I choose to keep the "MultiSet" name. If Eq is defined as structural equality, MultiSet and Bag are equivalent. Cheers, JP. __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com

JP Bernardy
Would this make sense? Would it be useful?
I'm not sure that I understand either :)
Yeah, my definitions weren't exactly clear. I'll try to elucidate:
Let me give an example...
type MyString = MyString String
instance Eq MyString where (MyString x) == (MyString y) = length x == length y
I'm sure you'll agree that this is not something anybody sane would do. (Actually, I did something remotely similar, and I've regretted it ever since; it gave me some nasty, hard-to-track down bugs. I always derive Eq since then.) Which is why I think Bag perhaps should, instead of relying on Eq (or actually Ord, I suppose), be provided with a separate parameter specifying how to compare items for equality for bagging purposes.
Bag.toString $ Bag.fromString $ [ms "x", ms "y"]
So instead of doing this, I would like to be able to do mkBag myComp [ms "x", ms "y"] where myComp x y = compare (length x) (length y) which would then bag all equal-length strings (possibly storing them in multisets using regular Eq).
That is why I choose to keep the "MultiSet" name.
Just to make that clear, I've no issue with the naming.
If Eq is defined as structural equality, MultiSet and Bag are equivalent.
And since Eq "should" define structural equality, I propose a different interface for Bags (if we need them at all, that is) Was that a bit clearer? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde wrote:
I think you're saying we might need a bag to hold items where we want to group things according to (equivalence on) some arbitrary key?
I think, using an equivalence for the Eq class is not a good idea in general and particularly not for the Set, (keys of a) Map and Bag data types. An equivalence introduces a quotient, where a representative of an equivalence class is not fixed. So using an equivalence bears some risk as the following property does not hold: forall f . a == b => f(a) == f(b) (e.g. for f = show) Furthermore, having only an equivalence implies that there is a stronger equality below that cannot be also an instance of Eq. (A solution might be to add a further "Equiv" and "OrdForEquiv" class or to always pass in such functions as additional argument as you suggested.) Daan's Set implementation explicitely explains what happens, if your Eq (and matching Ord) instance is only an equivalence, by saying it is "left-biased". This feature is not a particular Set property (unions are symmetric!), but only of a specific Set implementation, that you may exploit (or not). In fact, I also use equalities that ignore some debugging information like positions, but as long these positions are only shown, this seems to be ok. Another example, where an equivalence-order argument seems to make sense, is for a map implementation based on "set of pairs", where the second component of each pair is ignored for the order. But for such an implementation I find the name "set" wrongly chosen. Christian

Christian Maeder
Ketil Malde wrote:
I think you're saying we might need a bag to hold items where we want to group things according to (equivalence on) some arbitrary key?
I think, using an equivalence for the Eq class is not a good idea
(I assume you mean "using an arbitrary equivalence relation", and not that Eq's (==) shouldn't be an equivalence relation)
in general and particularly not for the Set, (keys of a) Map and Bag data types.
An equivalence introduces a quotient, where a representative of an equivalence class is not fixed.
Furthermore, having only an equivalence implies that there is a stronger equality below that cannot be also an instance of Eq.
I've already said that I think Eq's (==) should be "structural equality", which I thought meant that there shouldn't be any "deeper" equality. My question is that since Set uses this equality, and MultiSet uses it with a count, I don't see how this leaves any space for Bag? Unless, that is, it uses an arbitrary equivalence relation, and bags an object in its quotient group (which possibly is a Set with "real" equality). Maybe I'm mistaken, maybe there's some other way Bag makes sense -- I've never really felt I needed Bag (nor MultiSet, although I've used explicit FMs with counts).
In fact, I also use equalities that ignore some debugging information like positions, but as long these positions are only shown, this seems to be ok.
I added Int labels to somewhat complex structures in order to speed up Eq (quite) a bit. Which means I accept the burden of keeping them unique.
Another example, where an equivalence-order argument seems to make sense, is for a map implementation based on "set of pairs", where the second component of each pair is ignored for the order. But for such an implementation I find the name "set" wrongly chosen.
So - how about "Bag" instead? :-) -kzm -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde wrote:
My question is that since Set uses this equality, and MultiSet uses it with a count, I don't see how this leaves any space for Bag? Unless, that is, it uses an arbitrary equivalence relation, and bags an object in its quotient group (which possibly is a Set with "real" equality).
"Bag" and "MultiSet" are synonyms. Bags are something between Lists and Sets. Lists become sets by changing "++" to "union" being commutative, associative and idempotent, Lists become only Bags, when idempotency is omitted. Thus the order of elements does not matter as for Sets, but for Bags the multiplicity is important. So either an implemtation as a "Map elem Int" (only positive natural numbers) or as a sorted list (possibly with duplicates!) are possible. Both implementations (the first one Daan called "MultiSet", the second one "Bag") are equivalent, if the underlying equality for the elements is strong. If the elements are only compared by an equivalence, then obviusly the second implementation could also hold several different but equivalent elements (what the Map cannot). However, putting different but equivalent elements into a set would change the Bag property as also strongly equal elements should contribute to a higher multiplicity.
Maybe I'm mistaken, maybe there's some other way Bag makes sense -- I've never really felt I needed Bag (nor MultiSet, although I've used explicit FMs with counts).
If you have a map with counts you may consider to use bags instead (and trust in a stable and efficient implementation)! HTH Christian

Christian Maeder
"Bag" and "MultiSet" are synonyms.
Okay. I was trying to look for a reasonable terminology/distinction where they weren't. If they are defined to be the same, that makes it a bit pointless.
However, putting different but equivalent elements into a set would change the Bag property as also strongly equal elements should contribute to a higher multiplicity.
Yes, of course. I meant a MultiSet.
If you have a map with counts you may consider to use bags instead (and trust in a stable and efficient implementation)!
Right. At the time, it seemed simpler to use a FM (which I know has a stable and efficient implementation, and which happened to be available). Anyway, thanks for the clarifications! -kzm -- If I haven't seen further, it is by standing in the footprints of giants

On Wed, 17 Mar 2004, Christian Maeder wrote:
Bags are something between Lists and Sets. Lists become sets by changing "++" to "union" being commutative, associative and idempotent, Lists become only Bags, when idempotency is omitted. Thus the order of elements does not matter as for Sets, but for Bags the multiplicity is important.
Ah! An Abstract Algebra...
So either an implemtation as a "Map elem Int"
or as a sorted list are possible.
Better: ordered balanced trees (easy if you one has abstract balanced trees to reuse). Robert

Am Mittwoch, 17. März 2004 13:30 schrieb Ketil Malde:
[...]
Furthermore, having only an equivalence implies that there is a stronger equality below that cannot be also an instance of Eq.
I've already said that I think Eq's (==) should be "structural equality", which I thought meant that there shouldn't be any "deeper" equality.
I think, the name "structural equality" is misleading because it seems to imply that equivalent values have to have the same (internal) structure. If values a and b have different internal representations but are indistinguishable for the library user, than a == b should hold. But I totally agree with you that == should mean equality and not anything different. And this is also what The Haskell Report says in § 6.3.1: "The Eq class provides equality (==) and inequality (/=) methods." So a lot of the MultiSet/Bag and the bias discussion can be avoided. DData can and should assume that the Eq methods mean exactly what they are supposed to mean: equality and inequality.
[...]
-kzm
Wolfgang

hi , this is an interesting discussion and i agree that in general instances of Eq should be "equality", but what do people mean by "real equality"? probably the most reasonable interpretation is some sort of observational equivalance, i.e. if two things are equal we should always be able to replace the one with the other. for a datatype that is not abstract this basically forces the equality to be structural equality, which has prolems: * if a datatype contains functions, structural equivalance is tricky * every now and than i think the most common test i want to do is an equivalance, and not equality (e.g. consider the implementation of join-lists) so for datatypes that are not abstract it seems more reasonable to expect that Eq will provide an equivalance relation rather than a real equality. for abstract datatypes one can do better as the programmer could control what observations are available to programmers. unfortunately, in Haskell it is a pain to work with those, as we cannot use pattern matching anymore and programs become clunky. so until we can think of better support for working with abstract datatypes, it seems reasonbale to me to adopt daan's original design and have different datatstructures for Bag and MultiSet (if these are commonly assumed to be the same, myabe they should have some other names). -iavor Wolfgang Jeltsch wrote:
Am Mittwoch, 17. März 2004 13:30 schrieb Ketil Malde:
[...]
Furthermore, having only an equivalence implies that there is a stronger equality below that cannot be also an instance of Eq.
I've already said that I think Eq's (==) should be "structural equality", which I thought meant that there shouldn't be any "deeper" equality.
I think, the name "structural equality" is misleading because it seems to imply that equivalent values have to have the same (internal) structure. If values a and b have different internal representations but are indistinguishable for the library user, than a == b should hold.
But I totally agree with you that == should mean equality and not anything different. And this is also what The Haskell Report says in § 6.3.1: "The Eq class provides equality (==) and inequality (/=) methods." So a lot of the MultiSet/Bag and the bias discussion can be avoided. DData can and should assume that the Eq methods mean exactly what they are supposed to mean: equality and inequality.
[...]
-kzm
Wolfgang
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Wed, 17 Mar 2004, Ketil Malde wrote:
I've already said that I think Eq's (==) should be "structural equality", which I thought meant that there shouldn't be any "deeper" equality.
Taking this serious we would also have to use "deriving Eq" and never any weaker definition of Equality. As I explain in http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/principles.html the problem can really only be understood by considering abstract data types: Equality must be so strong that all exported functions can't distinguish equal elements. What you do internally doesn't matter. As the above document explains (in the last section, by the way), this view of Abstract Data Types will also make any "biasing" in (e.g.) Set operations superfluous. In DData you say "as a concrete implementation Data.Set may specify biasing", but in the Abstract Collection Standard, we don't have that excuse any more. When looking at the standard you will see that Abstraction actually looses you much stuff, since Abstraction is always about _leaving things out_. The benefit is that the things (properties) that are left in are much more general. Of course an implementation of the Abstract Collections can strengthen the specifications (e.g. to include biasing), but giving that anyone relying on this, can't use other implementations any more, it suddenly becomes much less attractive. Surely the Gentle Introduction into Haskell with Collections must explain more of Abstract Data Types and the Laws of Programming. Robert

JP Bernardy wrote:
Now, where "isEmpty" is replaced by "null", I'ld rather see "subset" instead of "isSubsetOf", particularly also because "isProperSubsetOf" is so long. Christian
participants (6)
-
Christian Maeder
-
Iavor S. Diatchki
-
JP Bernardy
-
Ketil Malde
-
Robert Will
-
Wolfgang Jeltsch