Left-bias and non-structural equality.

On 1/2/06, Adrian Hey
On Sunday 01 Jan 2006 3:28 pm, Jean-Philippe Bernardy wrote:
Yes, union suffers from the same problem; both in Set and Map as well. The testing code can now detect problems of left-biasing in presence of non-structural equality. (That's what I wanted to implement before applying the fix).
Actually I was wondering whether or not the term "left-biasing" was particularly useful. It could be misleading for some functions.
For example.. -- | /O(n*log n)/. Create a set from a list of elements. fromList :: Ord a => [a] -> Set a fromList xs = foldlStrict ins empty xs where ins t x = insert x t
My interpretation of left biasing would be that if the input list contains multiple occurences of "equal" values then the first occurence is the one which is inserted in the Set and the rest are discarded. But AFAICS the this is not the case with the current Data.Set implementation (or with the clone I just produced). Both are right biased (at least according to my intuitive understanding of the term).
There is a "definition" of left-bias here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Set.html -- Note that the implementation is /left-biased/ -- the elements of a -- first argument are always perferred to the second, for example in -- 'union' or 'insert'. Of course, left-biasing can only be observed -- when equality is an equivalence relation instead of structural -- equality. ...which says nothing about biasing inside a list.
BTW, I think I would prefer the default behaviour of insert to be to retain existing elements because this allows the optimisation of not updating the data structure representing the set at all (which should help keep heap burn rate down). But I guess this is in effect assuming equality implies structural equality so others may not like it. Sigh...
The biggest problem with this is that Map.insert changes the value corresponding to the inserted key inside the map, and since it is observable even with structural equality, you can be certain that quite some people rely on that behaviour. Making Set do the opposite is rather counter-intuitive. In summary, left-biasing is important even in with well-behaved equality, in the case of Maps.
<whine> I really wish this whole issue of how to properly deal with "things that are equal but may still be different somehow" would just go away :-) It's hard to be precise about what choice you've made, and whatever choice you do make usually seems arbitrary. Even when it doesn't make any semantic difference, the choice you make may well have an observable effect on space and time behavior. </whine>
I really wish it too. BTW, the fact that Map and Sets had many problems with non-structural equality and biasing is quite revealing that this is not such a big problem in practice. (For the record I had to fix union and intersection in Set, and union*, intersection*, differenceWith* and insertWith* in Map) Non structural equality seemed not to have many proponents the last time I discussed it. (See http://www.haskell.org//pipermail/libraries/2004-March/001833.html and following messages) Yet, since haskell doesn't rule it out, I guess we have to provide a consitent behaviour, if only to minimize the user's suprise. On the other hand, I wish to deprecate all functions that implicitly promote non-structural Equality, in order not to lead the unsuspecting user to a dangerous path. Interestingly enough, this brings us back to the origin of this thread, namely insertWithKey. Changing it to follow the left-bias rule, it now doesn't depend on the keys present in the map; and hence becomes deprecated, for the reason David mentioned. Cheers, JP.

On Monday 02 Jan 2006 7:29 pm, Jean-Philippe Bernardy wrote:
Non structural equality seemed not to have many proponents the last time I discussed it. (See http://www.haskell.org//pipermail/libraries/2004-March/001833.html and following messages) Yet, since haskell doesn't rule it out, I guess we have to provide a consitent behaviour, if only to minimize the user's suprise.
Well I think on the whole I agree with the views expressed by Cale (and Robert Will on the above mentioned thread). Insisting on structural equality seems too restrictive. If we insist on this then presumably Sets can't (sensibly) be instances of Eq or Ord, so we couldn't have Sets of Sets. (Hmm.. Sets of Sets.. seems like tries would be the way to go here. I guess we do probably do need ListSet and ListMap..) Anyway, the problem is we have one hole and two "equal" things, only one of which can fit in the hole. Is there any reason why we can't simply adopt the position that which one is chosen is unspecified and the choice is entirely at the implementors discretion? If this results in ambiguous programs then this is because the corresponding instance of Ord (and presumably Eq too) is just broken, and this is what needs to be fixed (or removed). If users can't define sane instances of Eq and Ord for their types then they'll just have to use the lower level APIs (eg. Data.Tree.AVL) and pass whatever is the appropriate combining comparison as an explicit argument. Regards -- Adrian Hey

Hello, On Monday 02 Jan 2006 11:43 pm, Adrian Hey wrote:
Anyway, the problem is we have one hole and two "equal" things, only one of which can fit in the hole. Is there any reason why we can't simply adopt the position that which one is chosen is unspecified and the choice is entirely at the implementors discretion?
Well we've had 24 hours of silence on this issue, so I assume this indicates that there is no reason we can't adopt the above position :-) So I vote we drop the left/right biasing distinction. This way lies madness (as they say:-) Instead we demand that instances of Eq and Ord are semantically sane, as should be stated in the Haskell language definition, (but isn't AFAIK for some reason). So for all instances of Ord, the semantics regarding which of two (or more) "equal" values is used should always be "Who knows? Who cares?". If there's some good reason why we should care then the corresponding type should not be an instance of Ord. So what should be done in cases like this? It's tempting to think we should add HOF versions like this.. insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a But as others have pointed out, this is probably the wrong thing to do as Sets should provide some kind of semantic guarantees (not sure what :-) which may be broken if we allow arbitrary "combining comparisons" to be used. I think the right thing to do is to expose a non-overloaded API for the underlying data structure implementation, but don't call it a "Set" or "Map" or whatever. Call it exactly what it is "AVL tree" or "Adams tree" or "trie" (or whatever). If we don't expose this API, this leaves users no alternative but to define *broken* instances of Ord (and maybe Eq too) if they want to make use of the data structure, and then leaves them scrutinising source code trying to figure out if biasing is the way they need it (and if not, what the heck they can do about it?). This is Bad Thing I think. Regards -- Adrian Hey

Adrian Hey wrote:
So I vote we drop the left/right biasing distinction. This way lies madness (as they say:-)
I would not like to drop biasing distinction, as I don't think this costs too much. With biasing, maps could be implemented via "biased sets" (Set (MapEntry a b)): data MapEntry a b = a := b instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
If there's some good reason why we should care then the corresponding type should not be an instance of Ord. So what should be done in cases like this? It's tempting to think we should add HOF versions like this..
insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a
It may also be possible to pass an order to "empty" and use it further on. The major problem is if varying orders are used ie. for "union". By good, bad or earlier design Data.Set/Map relies on Ord instances and I'm quite content with it. Cheers Christian

On 1/4/06, Christian Maeder
Adrian Hey wrote:
So I vote we drop the left/right biasing distinction. This way lies madness (as they say:-)
I would not like to drop biasing distinction, as I don't think this costs too much. With biasing, maps could be implemented via "biased sets" (Set (MapEntry a b)):
data MapEntry a b = a := b
instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
I would discourage such a use. This is exaclty why we provide Maps, after all.
If there's some good reason why we should care then the corresponding type should not be an instance of Ord. So what should be done in cases like this? It's tempting to think we should add HOF versions like this..
insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a
It may also be possible to pass an order to "empty" and use it further on. The major problem is if varying orders are used ie. for "union".
By good, bad or earlier design Data.Set/Map relies on Ord instances and I'm quite content with it.
A "good" solution would require dependent types or its type-classes emulation, but I think this is overkill. Cheers, JP.

Jean-Philippe Bernardy wrote:
data MapEntry a b = a := b
instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
I would discourage such a use. This is exaclty why we provide Maps, after all.
Agreed, but suppose you have Symbols with their positions. Then you may like that a set of symbols is printed with the positions last (or first) inserted, although positions are ignored in camparisons. C.

Christian Maeder wrote:
Agreed, but suppose you have Symbols with their positions. Then you may like that a set of symbols is printed with the positions last (or first) inserted, although positions are ignored in camparisons.
C.
It seems that there are two uses of Set e: 1) the Ord-comparison of e's is a total ordering / Eq e implements semantic equality; 2) Eq e is some equivalence relation, such as in Christian's example above. I agree with Jean-Philippe that 2) must not be advocated: users should always use a Map instead (e.g. Map Symbol Position, with 'real equality' on Symbol). Now, it may save effort to *implement* Map by doing exactly 2), with e = MapElement a b. The work needed to reimplement Map would have to be compared to the work needed to ensure that Set supports use 2). However, this is purely a consideration of implementation, and does not effect the user experience of Set and Map - provided that she only uses Set as per 1). Greetings, Arie

Arie Peterson wrote:
It seems that there are two uses of Set e:
1) the Ord-comparison of e's is a total ordering / Eq e implements semantic equality;
2) Eq e is some equivalence relation, such as in Christian's example above.
This is more often the case than one might like. Even for the type Set itself Eq/Ord can be observed differently by the debugging function showTree.
I agree with Jean-Philippe that 2) must not be advocated:
ok
users should always use a Map instead (e.g. Map Symbol Position, with 'real equality' on Symbol).
if I have: data Symbol = Symbol String Position instance Eq/Ord Symbol -- compare String only I'd rather like "Map Symbol Position" than "Map String Position" if I want to be more explicit about positions.
Now, it may save effort to *implement* Map by doing exactly 2), with e = MapElement a b. The work needed to reimplement Map would have to be compared to the work needed to ensure that Set supports use 2).
However, this is purely a consideration of implementation, and does not effect the user experience of Set and Map - provided that she only uses Set as per 1).
Controlled bias is simply an additional benefit of the Data.Set library implementation that you must not and are not encouraged to use (but that may be helpful if you know what you are doing), C.

Christian Maeder wrote:
2) Eq e is some equivalence relation, such as in Christian's example above.
This is more often the case than one might like. Even for the type Set itself Eq/Ord can be observed differently by the debugging function showTree.
Yes. Let me refine where I picture the border of Good Use of Eq. Eq should implement *conceptual* (semantic) equality. Haskell objects are often not in one-to-one correspondence with the conceptual object they represent: there may be many trees that represent the same set. When you're defining a set, you consider those trees equal and that should be reflected in your Eq instance.
users should always use a Map instead (e.g. Map Symbol Position, with 'real equality' on Symbol).
if I have:
data Symbol = Symbol String Position instance Eq/Ord Symbol -- compare String only
I'd rather like "Map Symbol Position" than "Map String Position" if I want to be more explicit about positions.
In this case I would go for something like ] data Symbol = Symbol String ] instance Eq/Ord Symbol so you *can* say "Map Symbol Position". IMO this better reflects what a symbol 'is'. One might want to add ] type Occurrence = (Symbol,Position) or even ] type Occurrence = MapElement Symbol Position
Controlled bias is simply an additional benefit of the Data.Set library implementation that you must not and are not encouraged to use (but that may be helpful if you know what you are doing),
Agreed. Greetings, Arie

On Wednesday 04 Jan 2006 5:37 pm, Christian Maeder wrote:
Arie Peterson wrote:
It seems that there are two uses of Set e:
1) the Ord-comparison of e's is a total ordering / Eq e implements semantic equality;
2) Eq e is some equivalence relation, such as in Christian's example above.
This is more often the case than one might like. Even for the type Set itself Eq/Ord can be observed differently by the debugging function showTree.
Well here's my take on this. If we have a module that exports a type (that is an instance of Eq/Ord) and associated functions then either.. The type is a concrete data type, in which case we pretty much have to go for structural equality. or.. The type is an abstract type, in which case we are free to use a more relaxed but semantically useful definition of "equality". But we can't do this AND have the API contain functions which allow users to discriminate between "equal" values. This is just broken (IMO). Regards -- Adrian Hey

Jean-Philippe Bernardy wrote:
On 1/4/06, Christian Maeder
wrote: I would not like to drop biasing distinction, as I don't think this costs too much. With biasing, maps could be implemented via "biased sets" (Set (MapEntry a b)):
data MapEntry a b = a := b
instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
I would discourage such a use. This is exaclty why we provide Maps, after all.
I still agree, but it might be nice to have for testing, comparison, and property specification purposes. Christian

On 1/5/06, Christian Maeder
Jean-Philippe Bernardy wrote:
On 1/4/06, Christian Maeder
wrote: I would not like to drop biasing distinction, as I don't think this costs too much. With biasing, maps could be implemented via "biased sets" (Set (MapEntry a b)):
data MapEntry a b = a := b
instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
I would discourage such a use. This is exaclty why we provide Maps, after all.
I still agree, but it might be nice to have for testing, comparison, and property specification purposes.
Christian
Indeed, that's what I did to write the left-bias testing code. As it is, we can choose to enable or disable ill-defined Eq it when testing any given specific collection data-type. Cheers, JP.

On Thursday 05 Jan 2006 9:55 am, Jean-Philippe Bernardy wrote:
On 1/5/06, Christian Maeder
wrote: I would discourage such a use. This is exaclty why we provide Maps, after all.
I still agree, but it might be nice to have for testing, comparison, and property specification purposes.
Christian
Indeed, that's what I did to write the left-bias testing code. As it is, we can choose to enable or disable ill-defined Eq it when testing any given specific collection data-type.
So do you propose to specify biasing or not? Regards -- Adrian Hey

On Wednesday 04 Jan 2006 3:25 pm, Christian Maeder wrote:
I would not like to drop biasing distinction, as I don't think this costs too much. With biasing, maps could be implemented via "biased sets" (Set (MapEntry a b)):
data MapEntry a b = a := b
instance Eq/Ord a => Eq/Ord (MapEntry a b) where compare (a1 := _) (a2 := _) = compare a1 a2
Yikes! You've just done exactly the Bad Thing I was talking about :-). This is broken IMO. E.G. val :: MapEntry a b -> b We can now have (val x) and (val y) yielding different results even though x and y may be "equal", according to the above Ord instance. But I guess you know all this already, so I'm somewhat surprised by this suggestion. I think we should be clear about why we define classes (and instances thereof) at all. Is it because they provide some kind of semantic guarantee, or because they save lazy programmers the effort of passing arbitrary functions about as explicit arguments? I hope it's the former :-) Regards -- Adrian Hey

Hi,
On 1/4/06, Adrian Hey
Hello,
On Monday 02 Jan 2006 11:43 pm, Adrian Hey wrote:
Anyway, the problem is we have one hole and two "equal" things, only one of which can fit in the hole. Is there any reason why we can't simply adopt the position that which one is chosen is unspecified and the choice is entirely at the implementors discretion?
Well we've had 24 hours of silence on this issue, so I assume this indicates that there is no reason we can't adopt the above position :-) So I vote we drop the left/right biasing distinction. This way lies madness (as they say:-)
Ok for sets; but biasing is still a sound notion for maps, as we discussed previously.
Instead we demand that instances of Eq and Ord are semantically sane, as should be stated in the Haskell language definition, (but isn't AFAIK for some reason).
So for all instances of Ord, the semantics regarding which of two (or more) "equal" values is used should always be "Who knows? Who cares?".
Agreed. Still, I have fixed the Current Data.Map and Data.Set so they match what the documentation says.
If there's some good reason why we should care then the corresponding type should not be an instance of Ord. So what should be done in cases like this? It's tempting to think we should add HOF versions like this..
insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a
But as others have pointed out, this is probably the wrong thing to do as Sets should provide some kind of semantic guarantees (not sure what :-) which may be broken if we allow arbitrary "combining comparisons" to be used.
I think the right thing to do is to expose a non-overloaded API for the underlying data structure implementation, but don't call it a "Set" or "Map" or whatever. Call it exactly what it is "AVL tree" or "Adams tree" or "trie" (or whatever).
If we don't expose this API, this leaves users no alternative but to define *broken* instances of Ord (and maybe Eq too) if they want to make use of the data structure, and then leaves them scrutinising source code trying to figure out if biasing is the way they need it (and if not, what the heck they can do about it?). This is Bad Thing I think.
We can have high-level types/API for maps and sets that rely on a sound Ord instance and in return guarantees validity of the structure; and a low level api for AVL trees that allows everything. Additionally, we should provide "toAVLTree / unsafeFromAVLTree" to convert between low and high level types. This allows the best of both worlds; safety by default, and high-performance/flexibility if the user wishes. Cheers, JP.

Hello Jean-Philippe, Wednesday, January 04, 2006, 6:36:14 PM, you wrote: JPB> We can have high-level types/API for maps and sets that rely on a JPB> sound Ord instance and in return guarantees validity of the structure; JPB> and a low level api for AVL trees that allows everything. imho, it the best way. for rather complex datastructures it is best to implement in low-level module just the all operations this datastructure provides and add high-level modules, which models some abstract data types (Map, Set, Foldable and so on) over this implementation. moreover, imvho it is better to leave low-level implementation open (i.e. export all functions and internal datastructure) so that anyone can add new low-level functions to this datatype (say, serialization), or use this low-level api to implement new high-level ADTs -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Wednesday 04 Jan 2006 3:36 pm, Jean-Philippe Bernardy wrote:
On 1/4/06, Adrian Hey
wrote: Well we've had 24 hours of silence on this issue, so I assume this indicates that there is no reason we can't adopt the above position :-) So I vote we drop the left/right biasing distinction. This way lies madness (as they say:-)
Ok for sets; but biasing is still a sound notion for maps, as we discussed previously.
Well I wouldn't use the term biasing at all for association values in Maps. The choice is clearly semantically significant and I guess both options should provided by the new Map API.
So for all instances of Ord, the semantics regarding which of two (or more) "equal" values is used should always be "Who knows? Who cares?".
Agreed. Still, I have fixed the Current Data.Map and Data.Set so they match what the documentation says.
Yes, though I wonder how many people actually rely on this being correct? Not many I suspect, since the implementations themselves were incorrect in several cases :-) I still think specifying this at all was a mistake and we should deprecate all these functions. Unfortunately they use a lot of good names, so maybe the entire API should be deprecated and put in separate module (Data.OldSet or something).
We can have high-level types/API for maps and sets that rely on a sound Ord instance and in return guarantees validity of the structure; and a low level api for AVL trees that allows everything.
Additionally, we should provide "toAVLTree / unsafeFromAVLTree" to convert between low and high level types. This allows the best of both worlds; safety by default, and high-performance/flexibility if the user wishes.
Yes, in the short term where we're only providing one polymorphic Set type. But I think the longer term goal is to provide some kind of type constructor class that allowed multiple possible implementations of polymorphic sets (and multiple specialised monomorphic implementations too). So we shouldn't assume the underlying data structure is always going to be AVL trees. Regards -- Adrian Hey

Adrian Hey wrote:
On Wednesday 04 Jan 2006 3:36 pm, Jean-Philippe Bernardy wrote:
Agreed. Still, I have fixed the Current Data.Map and Data.Set so they match what the documentation says.
Yes, though I wonder how many people actually rely on this being correct? Not many I suspect, since the implementations themselves were incorrect in several cases :-)
Fortunately not many people relied on biasing, but a few noticed a "wrong" bias. So fixing this is an improvement! I think, performance is hardly affected.
I still think specifying this at all was a mistake and we should deprecate all these functions. Unfortunately they use a lot of good names, so maybe the entire API should be deprecated and put in separate module (Data.OldSet or something).
I can't follow you here. Cheers Christian

On Wednesday 04 Jan 2006 10:21 pm, Christian Maeder wrote:
Adrian Hey wrote:
On Wednesday 04 Jan 2006 3:36 pm, Jean-Philippe Bernardy wrote:
Agreed. Still, I have fixed the Current Data.Map and Data.Set so they match what the documentation says.
Yes, though I wonder how many people actually rely on this being correct? Not many I suspect, since the implementations themselves were incorrect in several cases :-)
Fortunately not many people relied on biasing, but a few noticed a "wrong" bias. So fixing this is an improvement! I think, performance is hardly affected.
If they noticed it that can only be because they defined broken instances of Ord :-). Though in all fairness the current overloaded (only) API leaves little alternative (if you want access to the tree implementation). This is what we want to discourage IMO (but of course we must provide some alternative for cases like this).
I still think specifying this at all was a mistake and we should deprecate all these functions. Unfortunately they use a lot of good names, so maybe the entire API should be deprecated and put in separate module (Data.OldSet or something).
I can't follow you here.
I mean the new Set API should not make any promises regarding biasing, but I guess the old biased API still needs to be made available for those that rely on current biasing (albeit at the cost of changing imports). Or they could just make local copies of Data.Set or the AVL based clone I recently posted. Regards -- Adrian Hey

Adrian Hey
insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a
But as others have pointed out, this is probably the wrong thing to do as Sets should provide some kind of semantic guarantees
My gut reaction to this is that it's ugly - whether a Set use structural equality or just some possibly ill-defined equivalence, that should be a property of the Set *type*, not the insertion operator. And what if you change the equivalence while inserting? I guess it is impractical to use a phantom type to provide the ordering function? :-) How about something like: data SetEql a b = -- abstract, set of as, requires Ord b with empty :: Ord b => (a->b) -> SetEql a b (and similar for other set construction functions)? I'm still not convinced this solves any problem not solved better by a Map, though. And it still doesn't embed the comparison function in the type, so I guess you will have to define left or right bias for union, etc. (Or perhaps run both (a->b) functions on all keys to check for equality, with the alternative being a run time error?) -k -- If I haven't seen further, it is by standing in the footprints of giants

On Thursday 05 Jan 2006 8:11 pm, Ketil Malde wrote:
Adrian Hey
writes: insertUsing :: (a -> a -> COrdering a) -> a -> Set a -> Set a
But as others have pointed out, this is probably the wrong thing to do as Sets should provide some kind of semantic guarantees
My gut reaction to this is that it's ugly
Well I thought it was clear from the post you're replying to (and even from the text you've quoted) that this was just a straw man proposal, not a real proposal. As for doing this with AVL trees rather than "Sets", I have no intention of changing this because having "combining comparisons" as explicit arguments is really convenient in real programs. It wraps up user control of both strictness and biasing into a single function argument, eliminates the need for separate "With" versions of functions and allows easy mixed type operations (such as intersection). It's true that this lacks safety, because all these functions assume that tree elements are sorted according to the same criterion (informally speaking), and this is not enforced by the type system. Where there's a real problem is that assuming the underlying data structure is an AVL tree and providing O(1) conversions between Sets and AVL trees is not going to be an option for a general Set type constructor class. What's needed is some kind of abstracted type safe wrapper to deal with non-instances of Ord. Preferably one that provides the same control as Data.Tree.AVL does. So maybe something like what you suggested would work. I dunno, but if anybody wants to provide a suitably abstracted wrapper around all or part of the current Data.Tree.AVL API, that would be very useful. Regards -- Adrian Hey
participants (6)
-
Adrian Hey
-
Arie Peterson
-
Bulat Ziganshin
-
Christian Maeder
-
Jean-Philippe Bernardy
-
Ketil Malde