Does GHC compare pointers when eval'ing (==)

Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization? (Likely a question for another list but I make a stab here first. )

Hi Johan,
Haskell does not support referential equality, as that would break
referential transparency.
http://stackoverflow.com/questions/1717553/pointer-equality-in-haskell
Cheers
On Aug 20, 2014 7:52 AM, "Johan Holmquist"
Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization?
(Likely a question for another list but I make a stab here first. )
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

As I understood, the question was if GHC would first compare pointers,
and only call the Eq instance if the pointers are not equal. I guess
this would be safe, but I don't think GHC does such a thing.
Erik
On Wed, Aug 20, 2014 at 9:09 AM, Alois Cochard
Hi Johan,
Haskell does not support referential equality, as that would break referential transparency.
http://stackoverflow.com/questions/1717553/pointer-equality-in-haskell
Cheers
On Aug 20, 2014 7:52 AM, "Johan Holmquist"
wrote: Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization?
(Likely a question for another list but I make a stab here first. )
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.

On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code: 1. An Eq instance that doesn't obey the reflective property (not recommended): data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False 2. Eq instances intended to avoid timing attacks, by always comparing the entire data structure. newtype SlowEq a = SlowEq [a] instance Eq a => Eq (SlowEq a) where SlowEq x == SlowEq y = slowAnd $ length x == length y : zipWith (==) x y slowAnd = loop True where loop !x [] = x loop !x (!y:ys) = loop (x && y) ys (Note: not actually tested.) It's difficult for me to imagine a case though where a timing attack could result on two data structures sharing the same pointer; usually we'd be talking about comparing two ByteStrings or Texts that come from very different sources. Michael

As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal.
Yes, spot on!
1. An Eq instance that doesn't obey the reflective property (not recommended):
And this is what I mean with "defined in some funny way". One could consider this optimization(?) only to kick in when GHC generates the instance definition by means of the deriving mechanism. That should be safe(?).
I think the reason it isn't done is that it's not always an optimization
Could you explain when it's not?
Clarification:
Consider the following:
let xs = [1..1000] in xs == xs
xs would surely be equal but generally finding out if two lists are equal
amounts to comparing each element. If the compiler was first to check if
the objects referenced where the same, it should safely be able to yield
True without further evaluation. (In this particular example the compiler
may make other optimizations but it gives the general idea.)
Ofcourse `let x = undefined in x == x` might be considered bad to optimize
to True but maybe this could be checked somehow...
2014-08-20 10:35 GMT+02:00 Michael Snoyman
On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
wrote: On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
2. Eq instances intended to avoid timing attacks, by always comparing the entire data structure.
newtype SlowEq a = SlowEq [a] instance Eq a => Eq (SlowEq a) where SlowEq x == SlowEq y = slowAnd $ length x == length y : zipWith (==) x y
slowAnd = loop True where loop !x [] = x loop !x (!y:ys) = loop (x && y) ys
(Note: not actually tested.) It's difficult for me to imagine a case though where a timing attack could result on two data structures sharing the same pointer; usually we'd be talking about comparing two ByteStrings or Texts that come from very different sources.
Michael
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 20 August 2014 19:05, Johan Holmquist
As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal.
Yes, spot on!
1. An Eq instance that doesn't obey the reflective property (not recommended):
And this is what I mean with "defined in some funny way". One could consider this optimization(?) only to kick in when GHC generates the instance definition by means of the deriving mechanism. That should be safe(?).
Unless deep down a value is used that _does_ have a dodgy custom derived Eq instance.
I think the reason it isn't done is that it's not always an optimization
Could you explain when it's not?
Clarification:
Consider the following:
let xs = [1..1000] in xs == xs
xs would surely be equal but generally finding out if two lists are equal amounts to comparing each element. If the compiler was first to check if the objects referenced where the same, it should safely be able to yield True without further evaluation. (In this particular example the compiler may make other optimizations but it gives the general idea.)
Ofcourse `let x = undefined in x == x` might be considered bad to optimize to True but maybe this could be checked somehow...
2014-08-20 10:35 GMT+02:00 Michael Snoyman
: On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
wrote: On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
2. Eq instances intended to avoid timing attacks, by always comparing the entire data structure.
newtype SlowEq a = SlowEq [a] instance Eq a => Eq (SlowEq a) where SlowEq x == SlowEq y = slowAnd $ length x == length y : zipWith (==) x y
slowAnd = loop True where loop !x [] = x loop !x (!y:ys) = loop (x && y) ys
(Note: not actually tested.) It's difficult for me to imagine a case though where a timing attack could result on two data structures sharing the same pointer; usually we'd be talking about comparing two ByteStrings or Texts that come from very different sources.
Michael
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On Wed, Aug 20, 2014 at 11:05 AM, Johan Holmquist
I think the reason it isn't done is that it's not always an optimization
Could you explain when it's not?
A simple example is when the two values aren't equal, as you'll be doing one more branch than you otherwise would. The extra branch might also hurt the branch predictor even in the equal case, if the branch is hard to predict.

On Wed, Aug 20, 2014 at 11:20 AM, Johan Tibell
On Wed, Aug 20, 2014 at 11:05 AM, Johan Holmquist
wrote: I think the reason it isn't done is that it's not always an optimization
Could you explain when it's not?
A simple example is when the two values aren't equal, as you'll be doing one more branch than you otherwise would.
True
The extra branch might also hurt the branch predictor even in the equal case, if the branch is hard to predict.
This cost I think we can completely eliminate. If the pointer comparison is done only for an Eq instance that is statically known to execute N instructions, then as as long as the cost of all subsequent branch mispredictions are less than N, then we win, always. Alexander
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Aug 20, 2014 at 2:17 PM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
On Wed, Aug 20, 2014 at 11:20 AM, Johan Tibell
wrote: The extra branch might also hurt the branch predictor even in the equal case, if the branch is hard to predict.
This cost I think we can completely eliminate. If the pointer comparison is done only for an Eq instance that is statically known to execute N instructions, then as as long as the cost of all subsequent branch mispredictions are less than N, then we win, always.
The cost for branch prediction is typically known (per CPU model), but it's not clear to me that we can know how much cost we're adding vs removing. For example, consider Eq on pairs (a, b), in a use case where * 'a' rarely varies (and hence Eq on 'a' is easy to predict) and * 'b' is essentially random (and thus hard to predict). In that case the code generated for ptr@(a, b) == ptr2(a2, b2) if a == a2 -- we will predict correctly most of the time. then if b == b2 then ... else ... else ... -- this case is cheap is easy to predict (because 'a' is easy to predict). However the code for the pointer-based Eq is not if ptr == ptr2 -- we will predict this incorrectly most of the time, due to the whole pair rarely being equal. then if a == a2 then if b == b2 then ... else ... else ... else ... However, more importantly adding extra branches can often get in the way of other low-level optimizations.

There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
2. Eq instances intended to avoid timing attacks, by always comparing the entire data structure.
newtype SlowEq a = SlowEq [a] instance Eq a => Eq (SlowEq a) where SlowEq x == SlowEq y = slowAnd $ length x == length y : zipWith (==) x y
slowAnd = loop True where loop !x [] = x loop !x (!y:ys) = loop (x && y) ys
Third case that would change the semantics: 3. Non-terminating evaluation of a value: let x = [1..] in x == x

On Wed, Aug 20, 2014 at 10:35 AM, Michael Snoyman
On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
wrote: On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
I think this is just a buggy instance, and if you do this, nothing is guaranteed. Many functions with an Eq constraint will also not work. Interestingly, reflexivity is not a law listed in the haddock documentation. Erik

On Wed, Aug 20, 2014 at 12:15 PM, Erik Hesselink
On Wed, Aug 20, 2014 at 10:35 AM, Michael Snoyman
wrote: On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
wrote: On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an
optimization.
We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
I think this is just a buggy instance, and if you do this, nothing is guaranteed. Many functions with an Eq constraint will also not work. Interestingly, reflexivity is not a law listed in the haddock documentation.
Erik
I agree that this is terrible practice, and I hope no one is actually doing this. It's unfortunate that reflexivity isn't listed in the Haddocks. Perhaps we should start a separate proposal for making reflexivity an officially required law of the Eq class. Michael

On Wed, Aug 20, 2014 at 9:20 PM, Michael Snoyman
On Wed, Aug 20, 2014 at 12:15 PM, Erik Hesselink
wrote: On Wed, Aug 20, 2014 at 10:35 AM, Michael Snoyman
wrote: On Wed, Aug 20, 2014 at 11:28 AM, Johan Tibell
wrote: On Wed, Aug 20, 2014 at 10:23 AM, Erik Hesselink
wrote: As I understood, the question was if GHC would first compare pointers, and only call the Eq instance if the pointers are not equal. I guess this would be safe, but I don't think GHC does such a thing.
I think the reason it isn't done is that it's not always an optimization. We do it manually in e.g. bytestring.
There are two cases I can think of where it would also change the semantics of the code:
1. An Eq instance that doesn't obey the reflective property (not recommended):
data BadEq = BadEq instance Eq BadEq where BadEq == BadEq = False
I think this is just a buggy instance, and if you do this, nothing is guaranteed. Many functions with an Eq constraint will also not work. Interestingly, reflexivity is not a law listed in the haddock documentation.
Erik
I agree that this is terrible practice, and I hope no one is actually doing this. It's unfortunate that reflexivity isn't listed in the Haddocks. Perhaps we should start a separate proposal for making reflexivity an officially required law of the Eq class.
Michael
The problem with requiring reflexivity (or *any* law related to Eq/Ord, for that matter) is that Double and Float break it. Observe: ghci> let nan = 0.0 / 0.0 in nan == nan False Rust[1] side-steps this issue by having separate Eq (lawful) and PartialEq (unlawful) classes, but it's a pretty intrusive change. Chris [1] http://doc.rust-lang.org/core/cmp/trait.PartialEq.html

Doing so would break the ability to optimize the language.
(5,_|_) == (5,_|_) should equal _|_, not true. even if they happened
to be represented by the same pointer. defining pointer equality would
enforce a specific evaluation model, which haskell specificially does
not do. doing so would kill most other optimizations and lead to a
very sloooow language as higher order transformations such as
deforestation are key to efficient haskell implementations no longer
be valid. It's unintuitive, but sometimes adding things that look like
local optimizations like this, actually hurt performance, a lot
because the higher order optimizations now have to avoid any situation
where they accidentally introduce a situation where this peephole
optimization exists.
I will admit, until I implemented a haskell compiler I was skeptical
too. but it is really the case. one step forward is sometimes ten
steps back.
John
On Tue, Aug 19, 2014 at 11:52 PM, Johan Holmquist
Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization?
(Likely a question for another list but I make a stab here first. )
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- John Meacham - http://notanumber.net/
participants (10)
-
Alexander Kjeldaas
-
Alois Cochard
-
Chris Wong
-
Erik Hesselink
-
Ivan Lazar Miljenovic
-
Johan Holmquist
-
Johan Tibell
-
John Meacham
-
Michael Snoyman
-
Rudy Matela