
Hi, Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
main = print $ f == f where f = [1..10^9]
Thanks!! nikhil

On Tue, Jul 19, 2011 at 23:51, Nikhil A. Patil
Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
Let's pt it this way: there's a hidden primitive called "reallyUnsafePointerEquality". It's named that for a reason. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

reallyUnsafePointerEq#, and it really is as unsafe as it sounds :)
20.07.2011, в 7:51, "Nikhil A. Patil"
Hi,
Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
main = print $ f == f where f = [1..10^9]
Thanks!!
nikhil
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

2011/7/20 Eugene Kirpichov
reallyUnsafePointerEq#, and it really is as unsafe as it sounds :)
Why is it so unsafe? i can't find any documentation on it. I think always compare pointer first is a good optimization.
20.07.2011, в 7:51, "Nikhil A. Patil"
написал(а): Hi,
Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
main = print $ f == f where f = [1..10^9]
Thanks!!
nikhil
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Tue, Jul 19, 2011 at 11:14 PM, yi huang
2011/7/20 Eugene Kirpichov
reallyUnsafePointerEq#, and it really is as unsafe as it sounds :)
Why is it so unsafe? i can't find any documentation on it. I think always compare pointer first is a good optimization.
False positives and false negatives are both possible, depending on GC timing. Don't use it, unless you know why it can result in both false positives and false negatives, and you know why neither of those are bad for your use case. I'm not aware of any use case that's resilient to both failure modes, offhand. Carl

Carl Howells wrote:
On Tue, Jul 19, 2011 at 11:14 PM, yi huang
wrote: 2011/7/20 Eugene Kirpichov
reallyUnsafePointerEq#, and it really is as unsafe as it sounds :)
Why is it so unsafe? i can't find any documentation on it. I think always compare pointer first is a good optimization.
False positives and false negatives are both possible, depending on GC timing.
At the moment, as implemented in ghc, false positives are not possible, because GC only happens on allocation [*], and there is no allocation happening in that primitive operation. I don't think this is going to change without a total rewrite of ghc, since allowing GC (i.e., moving pointers) at arbitrary times would be a fundamental change to the STG execution model. Pretty much everything else imaginable can happen; in particular, if two variables a and b compared equal at one point, they may later become different pointers again. In the parallel RTS, if you're unlucky, this may even be a permanent effect. Best regards, Bertram [*] we'll have thread-local GC for the first generation soon, but a lot of effort went into ensuring consistentcy of pointers seen by other threads.

Hello all, I'm a newbie at Haskell and I was not aware of this problem. So, equality comparison can run into an infinite-loop? My current knowledge of the language tells me that everything is Haskell is a thunk until it's value is really needed. Is it possible to implement (==) that first check these thunks before evaluating it? (Considering both arguments has pure types). E.g., Equivalent thunks, evaluates to True, does not need to evaluate its arguments: [1..] == [1..] Another case: fib = 1:1:zipWith (+) fib (tail fib) fibA = 1:tail fib fib == fibA -- True Evaluating: 1:1:zipWith (+) fib (tail fib) == 1:tail fib -- first item match, check further 1:zipWith (+) fib (tail fib) == tail fib -- thunks do not match, evaluate arguments 1:zipWith (+) fib (tail fib) == 1:zipWith (+) fib (tail fib) -- thunks matches, comparison stops and the value is True As I said before, I'm a newbie at Haskell. Sorry if my question or examples makes no sense. Thanks, Thiago.

Quoting Thiago Negri
Hello all, I'm a newbie at Haskell and I was not aware of this problem. So, equality comparison can run into an infinite-loop?
Yes, comparing infinite lists is a non-terminating computation.
My current knowledge of the language tells me that everything is Haskell is a thunk until it's value is really needed. Is it possible to implement (==) that first check these thunks before evaluating it? (Considering both arguments has pure types).
You're correct in your perception of thunks, however that doesn't help. That would pretty much only be possible in the very simplest of cases. Consider the following:
let a = [1..] b = [1..] in a == b
It's a lot harder to compare independently generated thunks. And this complexity is pretty much unbounded:
foo = [10-9..] let b = [1..] in b == foo
etc.. Instead, if you know your input is unbounded, you need to decide how much of it it is reasonable to process, regardless of the language.
take 10000000 [1..] == take 10000000 [1..]
-KQ ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/

On Wed, Jul 20, 2011 at 10:48 AM, Thiago Negri
Hello all, I'm a newbie at Haskell and I was not aware of this problem. So, equality comparison can run into an infinite-loop?
My current knowledge of the language tells me that everything is Haskell is a thunk until it's value is really needed. Is it possible to implement (==) that first check these thunks before evaluating it? (Considering both arguments has pure types).
One thing to remember is that (==) is an ordinary Haskell function - it isn't a special built-in operator. Every type that implements implements it as an ordinary Haskell function. There is support to have the compiler right the function for you in some cases, but I don't believe it uses any facilities unavailable to an ordinary app/library author. The notion of "thunks" is a deep implementation detail - there's no mandate that the concept be used in a Haskell implementation, so the language doesn't expose the concept in a concrete way. I think that some folks have released libraries that call into internal GHC hooks to see if a run-time value points to an unevaluated thunk.
E.g.,
Equivalent thunks, evaluates to True, does not need to evaluate its arguments: [1..] == [1..]
How would we measure "equivalence" on thunks? We would need to open up the thunk, inspect the pieces, and then somehow call the appropriate (==) method on the different pieces - and then some of the pieces themselves might be thunks on the LHS but not the RHS, and the thunks on the LHS and the RHS might be equivalent but of completely different nature: think of:
f x == g y
Where f & g are completely different functions. Both might evaluate to [1..] through independent means, the only way to know would be to evaluate them, which might not terminate.
Another case:
fib = 1:1:zipWith (+) fib (tail fib) fibA = 1:tail fib fib == fibA -- True
Evaluating:
1:1:zipWith (+) fib (tail fib) == 1:tail fib -- first item match, check further 1:zipWith (+) fib (tail fib) == tail fib -- thunks do not match, evaluate arguments 1:zipWith (+) fib (tail fib) == 1:zipWith (+) fib (tail fib) -- thunks matches, comparison stops and the value is True
As I said before, I'm a newbie at Haskell. Sorry if my question or examples makes no sense.
One other thing to note is that GHC strips away almost all types during the process of compilation, so a lot of advanced reflection needs to be done with structures that explicitly keep the type around during run-time,
Thanks, Thiago.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, 20 Jul 2011 12:48:48 -0300
Thiago Negri
Is it possible to implement (==) that first check these thunks before evaluating it? (Considering both arguments has pure types).
E.g.,
Equivalent thunks, evaluates to True, does not need to evaluate its arguments: [1..] == [1..]
Thunks are just expressions and equality of expressions is undecidable in any Turing-complete language (like any general-purpose programming language). Note that syntactical equality is not sufficient because (==) should be referentially transparent. Pedro

On 07/21/2011 10:30 AM, Pedro Vasconcelos wrote:
On Wed, 20 Jul 2011 12:48:48 -0300 Thiago Negri
wrote: Is it possible to implement (==) that first check these thunks before evaluating it? (Considering both arguments has pure types).
E.g.,
Equivalent thunks, evaluates to True, does not need to evaluate its arguments: [1..] == [1..]
Thunks are just expressions and equality of expressions is undecidable in any Turing-complete language (like any general-purpose programming language). Note that syntactical equality is not sufficient because (==) should be referentially transparent.
I think the following code pretty much models what Thiago meant for a small subset of haskell that constructs possibly infinite lists. Thunks are made explicit as syntax trees. 'Cycle' is the syntactic symbol for a function whose definition is given by the respective case in the definition of 'evalOne'. (I chose cycle here instead of the evalFrom example above to because it doesn't need an Enum constraint). The interesting part is the definition of 'smartEq'. import Data.List (unfoldr) import Data.Function (on) -- let's say we have syntactic primitives like this data ListExp a = Nil | Cons a (ListExp a) | Cycle (ListExp a) deriving (Eq, Ord, Read, Show) -- derives syntactic equality conss :: [a] -> ListExp a -> ListExp a conss xs exp = foldr Cons exp xs fromList :: [a] -> ListExp a fromList xs = conss xs Nil -- eval the next element, return an expression defining the tail -- (if non-empty) evalOne :: ListExp a -> Maybe (a, ListExp a) evalOne Nil = Nothing evalOne (Cons h t) = Just (h, t) evalOne e@(Cycle exp) = case eval exp of [] -> Nothing (x:xs) -> Just (x, conss xs e) eval :: ListExp a -> [a] eval = unfoldr evalOne -- semantic equality evalEq :: (Eq a) => ListExp a -> ListExp a -> Bool evalEq = (==) `on` eval -- semantic equality, but check syntactic equality first. -- In every next recursion step, assume the arguments of the current recursion -- step to be equal. We can do that safely because two lists are equal iff -- they cannot be proven different. smartEq :: (Eq a) => ListExp a -> ListExp a -> Bool smartEq a b = smartEq' a b [] smartEq' :: (Eq a) => ListExp a -> ListExp a -> [(ListExp a, ListExp a)] -> Bool smartEq' a b eqPairs = if a == b || (a, b) `elem` eqPairs then True else case (evalOne a, evalOne b) of (Just _, Nothing) -> False (Nothing, Just _) -> False (Nothing, Nothing) -> True (Just (h1, t1), Just (h2, t2)) -> h1 == h2 && smartEq' t1 t2 ((a, b):eqPairs) Examples: *Main> smartEq (Cycle $ fromList [1]) (Cycle $ fromList [1,1]) True *Main> smartEq (Cons 1 $ Cycle $ fromList [1]) (Cycle $ fromList [1]) True *Main> smartEq (Cons 2 $ Cycle $ fromList [1]) (Cycle $ fromList [1]) False Any examples for hangups of 'smartEq' are greatly appreciated, I couldn't produce any so far. -- Steffen

On 07/21/2011 02:15 PM, Alexey Khudyakov wrote:
Any examples for hangups of 'smartEq' are greatly appreciated, I couldn't produce any so far.
Following sequences will hang smartEq. They are both infinite and aperiodic. smartEq (fromList primes) (fromList primes) smartEq (fromList pidigits) (fromList pidigits)
Err, yeah, of course. I would expect expressions of type ListExp to be finite as they represent written text. fromList therefore expects to receive only finite lists. Defining 'primes' using my method seems to be a bit of a challenge due to its recursive definition.

On Tue, 2011-07-19 at 23:33 -0700, Carl Howells wrote:
False positives and false negatives are both possible, depending on GC timing. Don't use it, unless you know why it can result in both false positives and false negatives, and you know why neither of those are bad for your use case.
Can you clarify what you mean by false positives? Do you just mean it may return true but then later behave as if there's no sharing? Or do you mean it may return true and then later the two expressions may be observably different? If the latter, then it seems this would be a pretty serious garbage collector bug, and that it would be impossible that such a bug wouldn't also break other code that doesn't use pointer equality at all. After all, we've got a running user thread, which if it were to force those thunks now they would necessarily be observably equal, but if it doesn't and waits until later they may be different? In any case, the name is still silly. unsafeCoerce and unsafePerformIO can both lead to RTS crashes... but we seem to be saying they aren't as unsafe as this one? Right. -- Chris Smith

On Wed, Jul 20, 2011 at 13:22, Chris Smith
On Tue, 2011-07-19 at 23:33 -0700, Carl Howells wrote:
False positives and false negatives are both possible, depending on GC timing. Don't use it, unless you know why it can result in both false positives and false negatives, and you know why neither of those are bad for your use case.
Can you clarify what you mean by false positives? Do you just mean it may return true but then later behave as if there's no sharing? Or do you mean it may return true and then later the two expressions may be observably different? If the latter, then it seems this would be a
I think it's more correct to say that the compiler is free to do things that would lead to false positives if it knows that it's safe to do so (and purity means it can find more of those cases, and more of them *will* be safe) — but there is no way for it to crowbar pointer equality tests in that case. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Wed, 2011-07-20 at 13:32 -0400, Brandon Allbery wrote:
I think it's more correct to say that the compiler is free to do things that would lead to false positives if it knows that it's safe to do so (and purity means it can find more of those cases, and more of them *will* be safe) — but there is no way for it to crowbar pointer equality tests in that case.
I have looked up crowbar in a number of dictionaries of slang and informal usage... and still have no idea what you just said. Can you reword it? The point, I think, is that if pointer equality testing really does what it says, then there shouldn't *be* any correct implementation in which false positives are possible. It seems the claim is that the garbage collector might be moving things around, have just by chance happened to place the second value in the spot formerly occupied by the first, and have not updated the first pointer yet. But if that's the case, and it's executing arbitrary user code that may refer to that memory, then the garbage collector contains race conditions! Then this "false positives" issue is no different from any of the many other problems such a bug might trigger. -- Chris Smith

On Wed, Jul 20, 2011 at 10:40 AM, Chris Smith
I have looked up crowbar in a number of dictionaries of slang and informal usage... and still have no idea what you just said. Can you reword it?
Crowbars offer 'leverage'.
The point, I think, is that if pointer equality testing really does what it says, then there shouldn't *be* any correct implementation in which false positives are possible. It seems the claim is that the garbage collector might be moving things around, have just by chance happened to place the second value in the spot formerly occupied by the first, and have not updated the first pointer yet. But if that's the case, and it's executing arbitrary user code that may refer to that memory, then the garbage collector contains race conditions!
You assume that the GC uses the same primitive as the developer, and is inherently subject to its own race conditions. But Bertram has said that false positives are not possible. I can only assume that the pointer comparison is atomic with respect to the GC.

David Barbour wrote:
On Wed, Jul 20, 2011 at 10:40 AM, Chris Smith
wrote: The point, I think, is that if pointer equality testing really does what it says, then there shouldn't *be* any correct implementation in which false positives are possible. It seems the claim is that the garbage collector might be moving things around, have just by chance happened to place the second value in the spot formerly occupied by the first, and have not updated the first pointer yet. But if that's the case, and it's executing arbitrary user code that may refer to that memory, then the garbage collector contains race conditions!
You assume that the GC uses the same primitive as the developer, and is inherently subject to its own race conditions.
But Bertram has said that false positives are not possible. I can only assume that the pointer comparison is atomic with respect to the GC.
That's right. A lot of things that the CMM code (and eventually the machine code) generated by ghc does is atomic with respect to GCs - from a single worker thread's point of view, GCs only happen when it tries to allocate some memory. (Then it does a heap check, and if that fails, saves some state and hands control over to the garbage collector. If the state contains pointers, the GC will know that and adjust them. Finally the state is restored and execution resumes.) Between these points, the code is free to access pointers on the stack and heap and dereference them, without having to worry about GC changing the memory under its nose. The reallyUnsafePointerEquality# primitive is implemented at this low level, and there are no intervening heap checks, and thus no GCs that could interfere with the comparison. Best regards, Bertram

have not updated the first pointer yet. But if that's the case, and it's executing arbitrary user code that may refer to that memory, then the garbage collector contains race conditions!
Not necessarily, if the garbage collection and the move happened between taking the pointers of the two sides of f1 == f2, it would update all the references to f1, and the pointer value you just got would be wrong. Sure it would be unlucky, but there's nothing worse than a bug that happens once in a billion times. Niklas

On Wed, Jul 20, 2011 at 13:40, Chris Smith
On Wed, 2011-07-20 at 13:32 -0400, Brandon Allbery wrote:
of them *will* be safe) — but there is no way for it to crowbar pointer equality tests in that case.
I have looked up crowbar in a number of dictionaries of slang and informal usage... and still have no idea what you just said. Can you reword it?
Sorry, EE usage: a crowbar circuit forces a fuse to blow when something goes out of tolerance. More generally, it means forcing a failure.
The point, I think, is that if pointer equality testing really does what it says, then there shouldn't *be* any correct implementation in which
Maybe it will help if I put it this way: there's no guarantee that your pointer equality test is testing anything that has any actual relevance to how it's evaluating the expression. In extreme cases, there might not even be anything to test. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
main = print $ f == f where f = [1..10^9]
Nikhil, As others pointed out, what you're asking for is not possible in Haskell, and for good reasons. However, this is an important problem, and it comes up quite often in practice when implementing DSLs: Detecting sharing. So, it's no surprise that people developed different ways of dealing with it in various forms. In my mind, Andy Gill came up with the nicest solution in his "Type-Safe Observable Sharing in Haskell" paper, published in the 2009 Haskell Symposium. Andy's paper traces the technique back to a 1999 paper by Simon Peyton Jones, Simon Marlow, and Conal Elliott. Probably few others came up with the same idea as well over the years. Andy's paper is a joy to read and I'd highly recommend it, you can get a copy at his web site: http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReificat.... Andy's fundamental observation is that while you cannot check for "pointer-equality" in the pure world for obvious reasons, it's perfectly fine to do so when you're in the IO monad. He further observes that for almost all most practical use cases, this is really not an issue: You're probably in some sort of a monad wrapped over IO anyhow: This has certainly been my experience, for instance. While the paper has all the details, the "trick" is to use GHC's StableName abstraction. If you define: import System.Mem.StableName areEqual :: Eq a => a -> a -> IO Bool areEqual x y = do sx <- hashStableName `fmap` (x `seq` makeStableName x) sy <- hashStableName `fmap` (y `seq` makeStableName y) return $ (sx == sy) || x == y then areEqual will run quite fast if it indeed receives the same object twice, no matter how large it is. In fact, it might even be cyclic! (See Andy's paper for details.) However, if the stable-name equality fails, then you are *not* guaranteed that the objects are different, hence you further need to run the usual "==" on them, which can be quite costly. (Of course "==" can go in loops if the objects happen to have cycles in them.) You can also change the last line of areEqual to read: return $ sx == sy In this case, if it returns True then you're guaranteed that the objects are equal. If the result is False, then you just don't know. However it's guaranteed that the function will run fast in either case. Client code can decide on how to proceed based on that information. I hope this helps. Reading Andy's paper, and the papers he's cited can further elucidate the technique. In fact, Andy uses this idea to turn cyclic structures to graphs with explicit back-edges that can be processed much more easily in the pure world, something that comes up quite often in practice as well. -Levent.

On Jul 19, 2011, at 11:34 PM, Levent Erkok wrote:
import System.Mem.StableName
areEqual :: Eq a => a -> a -> IO Bool areEqual x y = do sx <- hashStableName `fmap` (x `seq` makeStableName x) sy <- hashStableName `fmap` (y `seq` makeStableName y) return $ (sx == sy) || x == y
One correction to the above code: Since we're actually comparing hashes, there's a non-zero chance that we might get a hash-collision; thus incorrectly identifying two different objects to be the same even though they have different stable names. To accommodate for that, you can use the hashes to index into a look-up table and then do a linear-scan to make sure it's an object that you've seen before. So the above code is *not* going to work for your purposes in general, but it can be extended to handle such equalities if you can afford to carry around the hash-table with you and be disciplined in how you perform your equality tests. Again, see Andy's paper (section 11) for further details on how he this problem can be handled in general. -Levent.

I would have thought that the compiler, as a matter of optimisation, could insert a check to see if (==) is comparing an object with itself. The only way I can see this breaking is with perverse instances of Eq that would return False for "f == f". Paul. On 07/20/2011 04:51 AM, Nikhil A. Patil wrote:
Hi,
Is there any way of getting the following code to immediately return True without performing the element-by-element comparison? Essentially this boils down to checking whether pointers are equal before comparing the contents.
main = print $ f == f where f = [1..10^9] Thanks!!
nikhil
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 21/07/2011, at 9:08 AM, Paul Johnson wrote:
I would have thought that the compiler, as a matter of optimisation, could insert a check to see if (==) is comparing an object with itself. The only way I can see this breaking is with perverse instances of Eq that would return False for "f == f".
== is a function with user-defined instances. It would be, to put it minimally, bad manners, but there's nothing to actually *stop* a programmer writing data Boojum = Plant | Snark instance Eq Boojum where Plant == Plant = True _ == _ = False f x = x == x main = print $ f Snark Presumably inside the body of f, x and x would be identical pointers, but the only right answer is False, not True. If you think this is a bit far fetched, consider the IEEE definition of equality for floating-point numbers: let x = 0.0/0.0 in x == x The answer is False, so the optimisation breaks down even with a system-defined type.

On Wed, Jul 20, 2011 at 23:53, Richard O'Keefe
On 21/07/2011, at 9:08 AM, Paul Johnson wrote:
I would have thought that the compiler, as a matter of optimisation, could insert a check to see if (==) is comparing an object with itself. The only way I can see this breaking is with perverse instances of Eq that would return False for "f == f".
Presumably inside the body of f, x and x would be identical pointers, but the only right answer is False, not True.
If you think this is a bit far fetched, consider the IEEE definition of equality for floating-point numbers:
let x = 0.0/0.0 in x == x
The answer is False, so the optimisation breaks down even with a system-defined type.
Also, NaNs are never equal to each other. Also consider SQL's NULL (relevant if you use Takusen, I suspect). -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

For the following expression, I would consider a True result a false positive: let x = x :: Int in x == x Dean
participants (19)
-
Alexey Khudyakov
-
Antoine Latter
-
Bertram Felgenhauer
-
Brandon Allbery
-
Carl Howells
-
Chris Smith
-
David Barbour
-
Dean Herington
-
Eugene Kirpichov
-
Levent Erkok
-
Nikhil A. Patil
-
Niklas Larsson
-
Paul Johnson
-
Pedro Vasconcelos
-
quick@sparq.org
-
Richard O'Keefe
-
Steffen Schuldenzucker
-
Thiago Negri
-
yi huang