Re: [Haskell-cafe] Does GHC compare pointers when eval'ing (==)

To summarise, the points raised sofar against this optimisation are,
roughly, as follows.
* More expensive to compare two values which aren't equal.
I think the potential gain of the optimisation might outweigh the cost in
the general case.
(Also, the optimisation would not have to be triggered for simple values
like Char, Int and Double.)
* Dodgy Eq instances may break the optimisation
But may also break other stuff and should be avoided anyway.
* _|_ will not necessarily yield _|_ on comparison
This seems the biggest problem sofar to me, that (5,_|_) == (5,_|_) would
be _|_ if both pairs where at different memory locations and True
otherwise. But perhaps not a big deal in practice.
* Disrupts other compiler optimisation opportunities
My knowledge about compilers is lacking so I cannot comment this.
Interesting that this was implemented by hbc. Would be nice to know if it
led to problems.
2014-08-20 14:32 GMT+02:00 Jan-Willem Maessen
On Wed, Aug 20, 2014 at 2:52 AM, Johan Holmquist
wrote: Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization?
The big problem in practice are the Float / Double instances in the presence of NaN, as noted by other folks.
Back in the day, hbc had a flag that would do a pointer equality check in the derived Eq instances it generated before falling back to a traversal. Interestingly, it does actually change the way you think about programming when you're working with multiple values that are very likely to share sub-structure. Suddenly it seems OK to do equality checks as you traverse (because most of them will succeed), which would be O(n^2) if you weren't doing the pointer test.
-Jan-Willem Maessen
(Likely a question for another list but I make a stab here first. )
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Aug 20, 2014 at 03:16:03PM +0200, Johan Holmquist wrote:
This seems the biggest problem sofar to me, that (5,_|_) == (5,_|_) would be _|_ if both pairs where at different memory locations and True otherwise. But perhaps not a big deal in practice.
This sounds like a big deal. f () = (5, undefined) -- crashes -- (common subexpression elimination optimization my lead to True) let x = f () y = f () in x == y -- True let x = f () y = x in x == y This isn't the sort of behaviour I'd like from my programming language. Tom

On Wed, Aug 20, 2014 at 03:16:03PM +0200, Johan Holmquist wrote:
To summarise, the points raised sofar against this optimisation are, roughly, as follows.
I think a different aspect of this is: what are you trying to optimise? What code do you have where equality tests are a significant part of the workload? regards, iustin

I wanted this too when I first got into Haskell, but ultimately it comes
down to not being semantically correct here.
NaN /= NaN, so referential equality of anything that might ever compare
inside two NaN's doesn't imply value equality.
e.g. Anything with a polymorphic field can't use this.
You can argue that that was a bad call, but it was the expected call under
IEEE semantics.
If you wanted to bubble up some kind of extra 'reflexive :: a -> Bool'
through the Eq instance, then the tricky part is that determining if you
can use this can be as bad as doing the full tree comparison for
pathological cases you can construct and can introduce bottoms where there
previously were none if you do it naively.
You also start getting weird cases where things turn into Heisenbugs. this
== that would terminate before you evaluate this or that, but if you
evaluate one or the other then sometimes it works.
-Edward
On Wed, Aug 20, 2014 at 11:14 AM, Iustin Pop
On Wed, Aug 20, 2014 at 03:16:03PM +0200, Johan Holmquist wrote:
To summarise, the points raised sofar against this optimisation are, roughly, as follows.
I think a different aspect of this is: what are you trying to optimise? What code do you have where equality tests are a significant part of the workload?
regards, iustin _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I think a different aspect of this is: what are you trying to optimise?
Well, what led me to ask about this was not a particular problem but more the thought that such an optimisation might perhaps be possible and maybe already implemented. And if not, why. If such an optimisation was not to ruin things it could be nice to have. One simple example is the list. The time it takes to establish that two lists are equal is linear to the number of elements. To establish if they are not equal requires only to find one differing element. When dealing with shared sub structures (sub lists in this case) this could matter a lot. A nice effect is that we could then even establish equality for infinite lists when they are the same.
I wanted this too when I first got into Haskell, but ultimately it comes down to not being semantically correct here.
Except for the floating points it would appear that most problems are
caused by _|_ some way or another.
2014-08-20 17:48 GMT+02:00 Edward Kmett
I wanted this too when I first got into Haskell, but ultimately it comes down to not being semantically correct here.
NaN /= NaN, so referential equality of anything that might ever compare inside two NaN's doesn't imply value equality.
e.g. Anything with a polymorphic field can't use this.
You can argue that that was a bad call, but it was the expected call under IEEE semantics.
If you wanted to bubble up some kind of extra 'reflexive :: a -> Bool' through the Eq instance, then the tricky part is that determining if you can use this can be as bad as doing the full tree comparison for pathological cases you can construct and can introduce bottoms where there previously were none if you do it naively.
You also start getting weird cases where things turn into Heisenbugs. this == that would terminate before you evaluate this or that, but if you evaluate one or the other then sometimes it works.
-Edward
On Wed, Aug 20, 2014 at 11:14 AM, Iustin Pop
wrote: On Wed, Aug 20, 2014 at 03:16:03PM +0200, Johan Holmquist wrote:
To summarise, the points raised sofar against this optimisation are, roughly, as follows.
I think a different aspect of this is: what are you trying to optimise? What code do you have where equality tests are a significant part of the workload?
regards, iustin _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 21/08/2014, at 3:48 AM, Edward Kmett wrote:
I wanted this too when I first got into Haskell, but ultimately it comes down to not being semantically correct here.
NaN /= NaN, so referential equality of anything that might ever compare inside two NaN's doesn't imply value equality.
e.g. Anything with a polymorphic field can't use this.
You can argue that that was a bad call, but it was the expected call under IEEE semantics.
If I recall correctly, Prolog handles this by saying that there is 'unification' and 'arithmetic comparison' and NaN = NaN is true but NaN =:= NaN is false. One way to handle this in a revised prelude would be to have class MaybeEq a where equalIfPossible :: a -> a -> Maybe Bool class (MaybeEq a) => MaybeOrd a where compareIfPossible :: a -> a -> Maybe Order and then a bunch of IEEE-like operations, e.g., x =? y if x = y or x,y are not comparable x =! y if x = y instance Eq t => MaybeEq t where equalIfPossible x y = Just (x == y) instance Ord t => MaybeOrd t where compareIfPossible x y = Just (compare x y) and to move Double and Float to MaybeEq and MaybeOrd instead of Eq and Ord, and support deriving (MaybeEq,MaybeOrd) in the obvious way. Then there could be a debate about whether to allow instance Eq Double where x == y = if isNaN x then isNaN y else case equalIfPossible x y of Just True -> True _ -> False or not.

What you are discussing is "observable sharing". It breaks things
quite dramatically. Among other things, rather important optimizations
such as inlining are no longer valid if you allow it and reasoning
about efficiency goes right out the window.
It is possible to do safely without breaking things when done in the IO Monad,
import System.Mem.StableName
sameThing :: a -> a -> IO a
sameThing x y = liftM2 (==) (makeStableName x) (makeStableName y)
But whether it does what you expect after optimizations is
questionable and would likely behave differently under different
compilers.
John
On Wed, Aug 20, 2014 at 5:55 PM, Richard A. O'Keefe
On 21/08/2014, at 3:48 AM, Edward Kmett wrote:
I wanted this too when I first got into Haskell, but ultimately it comes down to not being semantically correct here.
NaN /= NaN, so referential equality of anything that might ever compare inside two NaN's doesn't imply value equality.
e.g. Anything with a polymorphic field can't use this.
You can argue that that was a bad call, but it was the expected call under IEEE semantics.
If I recall correctly, Prolog handles this by saying that there is 'unification' and 'arithmetic comparison' and NaN = NaN is true but NaN =:= NaN is false.
One way to handle this in a revised prelude would be to have class MaybeEq a where equalIfPossible :: a -> a -> Maybe Bool class (MaybeEq a) => MaybeOrd a where compareIfPossible :: a -> a -> Maybe Order and then a bunch of IEEE-like operations, e.g., x =? y if x = y or x,y are not comparable x =! y if x = y
instance Eq t => MaybeEq t where equalIfPossible x y = Just (x == y) instance Ord t => MaybeOrd t where compareIfPossible x y = Just (compare x y)
and to move Double and Float to MaybeEq and MaybeOrd instead of Eq and Ord, and support deriving (MaybeEq,MaybeOrd) in the obvious way.
Then there could be a debate about whether to allow instance Eq Double where x == y = if isNaN x then isNaN y else case equalIfPossible x y of Just True -> True _ -> False or not.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- John Meacham - http://notanumber.net/

On Wed, Aug 20, 2014 at 06:19:26PM -0700, John Meacham wrote:
It is possible to do safely without breaking things when done in the IO Monad,
import System.Mem.StableName sameThing :: a -> a -> IO a sameThing x y = liftM2 (==) (makeStableName x) (makeStableName y)
I don't think this has the behaviour that Johan (the OP) wants. import System.Mem.StableName import Control.Monad sameThing :: a -> a -> IO Bool sameThing x y = liftM2 (==) (makeStableName x) (makeStableName y) main = do let x = 1 y = x print =<< sameThing x y print =<< sameThing x x GHCi> main False True I suspect Johan wants "True; True". Tom
participants (6)
-
Edward Kmett
-
Iustin Pop
-
Johan Holmquist
-
John Meacham
-
Richard A. O'Keefe
-
Tom Ellis