Ord methods too strict?

I am surprised to find that `False <= undefined = undefined`. What justifies (<=) being strict in both arguments?

Vilem

One of the relationships implied by the Ord typeclass is:
a <= b = True iff compare a b = EQ \/ compare a b = LT
If we write an alternative definition of (<=) that is only strict in its
first argument:
False <= _ = True
True <= x = x
Then it's impossible to write `compare` in a way that's consistent with
that relation.
More concretely, it's likely that Ord Bool is defined via `compare`, which
is necessarily strict in both arguments.
On Wed, 2 Jan. 2019, 7:47 pm V.Liepelt,

More concretely, it's likely that Ord Bool is defined via `compare`, which is necessarily strict in both arguments.
Yes, this did come to mind. In general a non-strict `compare` would only make sense for `()`.
One of the relationships implied by the Ord typeclass is:
a <= b = True iff compare a b = EQ \/ compare a b = LT
So wouldn’t it make sense to define `compare` in terms of the “weaker” relations? It seems very unhaskelly to do the unnecessary work of evaluating the second argument to a relation when we already know what the result should be.
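For what it's worth, even starting from a `(<=)` that is lazy in its second argument, a `compare` built on top of it must ask about both directions, and therefore forces both arguments. A hypothetical sketch (`leB` and `compareVia` are illustrative names):

```haskell
-- A (<=) on Bool that is lazy in its second argument:
leB :: Bool -> Bool -> Bool
leB False _ = True
leB True  q = q

-- compare built from it must check both directions,
-- so it ends up strict in both arguments anyway:
compareVia :: Bool -> Bool -> Ordering
compareVia a b
  | leB a b && leB b a = EQ
  | leB a b            = LT
  | otherwise          = GT
```

For example, `compareVia False undefined` diverges on the `leB b a` check, even though `leB False undefined` alone is `True`.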
Vilem
On 2 Jan 2019, at 10:29, Isaac Elliott

On Wed, Jan 02, 2019 at 09:47:15AM +0000, V.Liepelt wrote:
I am surprised to find that `False <= undefined = undefined`.
What justifies (<=) to be strict in both arguments?
The implementation of the Ord instance for Bool is derived, as you can see here: https://www.stackage.org/haddock/lts-12.1/ghc-prim-0.5.2.0/src/GHC-Classes.h...

As for the justification, perhaps it's too much of a special case for only one value of an enumeration to compare to undefined without crashing, and perhaps it inhibits optimisation opportunities.

Tom

The implementation of the Ord instance for Bool is derived
So my argument would be: doesn't this mean that we need to do cleverer deriving, or at least have a hand-written instance?
As for the justification, perhaps it's too much of a special case for only one value of an enumeration to compare to undefined without crashing
This is not just about crashing. (I’m using `undefined` as a way of making strictness explicit.) `False >= veryExpensiveComputation` should return `True` immediately without any unnecessary computation. Also, it doesn’t seem like a special case: this makes sense for any partially ordered type with a top and/or bottom element.
perhaps it inhibits optimisation opportunities.
That doesn’t seem very likely to me; I would rather think the contrary (see above): doing unnecessary work can hardly make a program run faster.

V
On 2 Jan 2019, at 11:37, Tom Ellis

How would you define Ord on Bools? Like so?
False <= _ = True
_ <= y = y
Note that even with this definition, "undefined <= True = undefined" so
strictness is now not symmetric.
Also, as Bool is implemented as just an int of some sort (presumably 0 and
1), the strict definition in effect allows the following implementation:
x <= y = fromEnum x <= fromEnum y
Note that "fromEnum" here is zero cost, and there is no branch. Checking the
left-hand argument for False first would require a branch that could
hurt performance. It probably isn't worth slowing down this function just
so it's lazy in its right argument (while, as a gotcha, still being strict in
its left argument).
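As a sketch, the branch-free strict implementation described above (`leStrict` is an illustrative name):

```haskell
-- Both Bools are forced down to their machine-int representations,
-- then compared directly: no branching on constructors.
leStrict :: Bool -> Bool -> Bool
leStrict x y = fromEnum x <= fromEnum y
```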
On Wed, Jan 2, 2019 at 10:50 PM V.Liepelt

False <= _ = True
_ <= y = y
Yeah, that’s what I was thinking. In fact this is how I hit this in the first place: I was using `(<=)` as a quick and dirty logical implication operator and was surprised to hit `undefined` even though the precondition was False!
Note that even with this definition, "undefined <= True = undefined" so strictness is now not symmetric.
Isn’t “asymmetry” the whole point of laziness? Consider `(||)`.
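For illustration, the implication reading of `(<=)` on Bool, written to short-circuit like `(||)` (`implies` is a hypothetical name):

```haskell
-- Lazy logical implication: if the premise is False,
-- the conclusion is never evaluated.
implies :: Bool -> Bool -> Bool
implies False _ = True
implies True  q = q
```

With this definition, `implies False (error "never evaluated")` returns `True`, exactly the behaviour one would expect from using `(<=)` as implication.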
Note "fromEnum" here is zero cost, and there is no branch. Checking the left hand branch for false first would require a branch that would possibly hit performance. This probably isn't worth slowing down this function just so it's lazy in it's right argument (but as a gotcha now, still strict in it's left argument).
Yes, comparison on constants may happen to be faster on a concrete architecture. Thanks, I hadn’t considered this. However, this assumes that both arguments are already evaluated, which of course will require branching unless you happen to know the second argument at compile time; in that case we might want a rewrite rule for the more efficient implementation. But it feels wrong to let architectural considerations leak into the standard library when this isn’t really necessary.
V
On 2 Jan 2019, at 12:09, Clinton Mead

Hi Vilem,
This is not just about crashing. (I’m using `undefined` as a way of making strictness explicit.) `False >= veryExpensiveComputation` should return `True` immediately without any unnecessary computation.

There is also an opposite situation, where evaluating the second argument allows us to free a lot of memory, as that argument would otherwise remain a thunk with references to many things.

Also it doesn’t seem like a special case: this makes sense for any partially ordered type with a top and/or bottom element.

That may be acceptable with Bool because its values are small. But to compare two pairs `(a,b) <= (x,y)` we would have to first traverse `(a,b)` to decide whether it's the smallest element, before evaluating the second argument to WHNF and comparing component-wise. Besides the extra complexity (both in code and in run time), this would not be less strict than the current default, since the first argument could get completely evaluated before even looking at the second one: `(False, _|_) <= (True, True) = _|_` (currently, this is `True`).

We also would not know whether the type (a, b) has a smallest element without stronger constraints than (Ord a, Ord b), so this optimisation is really only practically feasible for types with a nullary first constructor.

Li-yao

Hi Li-yao,
Thanks for your interesting thoughts.
There is also an opposite situation, where evaluating the second argument allows us to free a lot of memory, as that argument would otherwise remain a thunk with references to many things.
Good point, but this is one of the tradeoffs of laziness. In the case of GHC, will the runtime not eventually free the memory of the second argument and everything it points to? Surely it must do this one way or another, since otherwise laziness would mean that a running process keeps leaking space.
But to compare two pairs `(a,b) <= (x,y)` we would have to first traverse `(a,b)` to decide whether it's the smallest element, before evaluating the second argument to WHNF and comparing component-wise. Besides the extra complexity (both in code and in run time), this would not be less strict than the current default, since the first argument could get completely evaluated before even looking at the second one: `(False, _|_) <= (True, True) = _|_` (currently, this is `True`).
So with the laziest possible implementation (below; do check for accuracy, I might have missed something) these are the cases where the result is defined for Bool and undefined for Lazy Bool:
(F,u) <= (T,F)
(F,u) <= (T,T)
(F,u) <= (T,u)
vs. (undefined for Bool and defined for Lazy Bool)
(F,F) <= (F,u)
(F,F) <= (u,F)
(F,F) <= (u,T)
(F,F) <= (u,u)
(T,F) <= (T,u)
So it does seem that we can make it marginally more lazy overall. I’m not saying we should, I was just genuinely surprised that Ord is strict.
Definitions:
F <= _ = T
T <= q = q
F < q = q
T < _ = F
(p, q) == (r, s) = p == r /\ q == s
(F, F) <= _ = T
(p, q) <= (r, s) = p < r \/ p == r /\ q <= s
(T, T) < _ = F
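Spelled out as runnable Haskell (plain functions rather than class instances, to avoid clashing with the Prelude; `leB`, `ltB` and `lePair` are illustrative names):

```haskell
-- Lazy (<=) and (<) on Bool:
leB :: Bool -> Bool -> Bool
leB False _ = True
leB True  q = q

ltB :: Bool -> Bool -> Bool
ltB False q = q
ltB True  _ = False

-- Lazy comparison on pairs: the bottom element (False, False)
-- is recognised without looking at the right-hand side at all.
lePair :: (Bool, Bool) -> (Bool, Bool) -> Bool
lePair (False, False) _  = True
lePair (p, q) (r, s)     = ltB p r || (p == r && leB q s)
```

This reproduces the table above: for instance `lePair (False, False) (undefined, undefined)` is `True`, and `lePair (True, False) (True, undefined)` is `True`, both of which diverge with the derived instance.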
We also would not know whether the type (a, b) has a smallest element without stronger constraints than (Ord a, Ord b), so this optimization is really only practically feasible for types with a nullary first constructor.
Hm, perhaps we are thinking about this in different ways. From the definition above, we don’t need any stronger constraints than Ord.
Best,
Vilem
On 2 Jan 2019, at 15:06, Li-yao Xia

On Fri, 4 Jan 2019 at 16:46, V.Liepelt wrote:
[...] Good point, but this is one of the tradeoffs with laziness. In the case of GHC, will the runtime not eventually free the memory of the second argument and everything it points to? It surely must do this one way or another since otherwise laziness would mean that while a process is running it will keep leaking space.
You *will* leak space if you keep an unevaluated argument around for a long time, there is nothing any implementation can really do about it: It can't know for sure if you will eventually throw away that argument or (partially) evaluate it. Sometimes the optimizer (e.g. via strictness analysis) can help, but not in the general case. So making an argument lazier is not a no-brainer, quite the opposite...

On Jan 4, 2019, at 11:47 AM, Sven Panne wrote:

You *will* leak space if you keep an unevaluated argument around for a long time, there is nothing any implementation can really do about it: It can't know for sure if you will eventually throw away that argument or (partially) evaluate it. Sometimes the optimizer (e.g. via strictness analysis) can help, but not in the general case. So making an argument lazier is not a no-brainer, quite the opposite...
Only if it is repeatedly used in ever-deeper unevaluated expressions, such as repeatedly incrementing an IORef counter without ever forcing the value. In the case of a lazy function ignoring its argument, no space is leaked unless that argument is retained elsewhere and the function was the sole means of forcing the value. Unreferenced unevaluated thunks get GC'd; space leaks require ever-deeper chains of unevaluated thunks.

If laziness always leaked, Haskell would not work terribly well. I have code that runs for ~9 hours in constant memory, allocating and freeing around 13TB of memory over its lifetime. Various functions it calls do some lazy evaluation or other, and while in some cases forcing the values that are sure to get used might reduce GC activity, there's no space leak.

-- Viktor.
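A sketch of the IORef counter mentioned above: the lazy `modifyIORef` stacks one `(+ 1)` thunk per step, while `modifyIORef'` forces the counter as it goes (the function names `leakyCount` and `strictCount` are illustrative):

```haskell
import Data.IORef

-- modifyIORef is lazy: each step wraps the counter in another
-- (+ 1) thunk, and nothing is evaluated until the final read.
leakyCount :: Int -> IO Int
leakyCount n = do
  ref <- newIORef 0
  mapM_ (\_ -> modifyIORef ref (+ 1)) [1 .. n]  -- builds a chain of n thunks
  readIORef ref                                 -- the whole chain is forced here

-- modifyIORef' evaluates the new value at every step,
-- so the counter stays a plain Int throughout.
strictCount :: Int -> IO Int
strictCount n = do
  ref <- newIORef 0
  mapM_ (\_ -> modifyIORef' ref (+ 1)) [1 .. n]
  readIORef ref
```

Both produce the same answer; the difference is only in how much unevaluated structure is alive in between.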

On Fri, 4 Jan 2019 at 18:24, Viktor Dukhovni <ietf-dane@dukhovni.org> wrote:
[...] In the case of a lazy function ignoring its argument, no space is leaked unless that argument is retained elsewhere, and the function was the sole means of forcing the value.
This is the scenario I had in mind, but obviously I didn't explain that terribly well... :-} Personally, I think this is not so uncommon, at least in the initial stages of writing a library or program.
[...] Space leaks require ever deeper chains of unevaluated thunks. [...]
Perhaps we have different definitions of "space leak" (is there an "official" one?), but keeping a single thunk alive for a long time, which in turn keeps some huge data structure alive, is a kind of leak too, no deep chains involved. Maybe there is a better term describing this situation.

On Wed, Jan 02, 2019 at 11:49:40AM +0000, V.Liepelt wrote:
As for the justification, perhaps it's too much of a special case for only one value of an enumeration to compare to undefined without crashing
Also it doesn’t seem like a special case: this makes sense for any partially ordered Type with a top and/or bottom element.
I mean that for any data type `data E = C1 | C2 | ...` you are proposing that the (<=) of the derived Ord instance would have a special case for C1.
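Such a derived instance might look like this for a three-constructor enumeration (a hypothetical sketch, not what GHC generates; `leE` is an illustrative name):

```haskell
data E = C1 | C2 | C3 deriving (Enum, Show)

-- A lazier (<=): one special case for the first constructor,
-- then the usual strict, branch-free comparison.
leE :: E -> E -> Bool
leE C1 _ = True                      -- C1 is the minimum: ignore the right side
leE x  y = fromEnum x <= fromEnum y  -- strict in both arguments
```

With this definition `leE C1 (error "never forced")` is `True`, but every other case still forces both arguments.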

On Wed, 2 Jan 2019 at 15:42, Tom Ellis <tom-lists-haskell-cafe-2017@jaguarpaw.co.uk> wrote:
I mean that for any data type
data E = C1 | C2 | ...
you are proposing that the (<=) of the derived Ord instance would have a special case for C1.
... and another special case for (>=) for the last enumeration value. And a very special version for one-element enumerations. And what about (<) and (>)? And derived (==) for one-element enumerations? And, and, and... :-P

Putting on the language lawyer hat: the Haskell Report explicitly states that Bool's Ord instance is derived (section 6.1.1), and section 11.1 explicitly states that all derived operations for Eq and Ord are strict in both arguments. Consequently, implementing Ord for Bool in a lazier way would violate the specification.

I would say that the way the report specifies this is a good thing: coming up with special cases is a bad design principle, and consistency almost always trumps anything else (the human brain is notoriously small). Furthermore, laziness is not always a good thing, since it can lead to space leaks, so e.g. making Bool's Ord operations lazier would definitely make various people very unhappy sooner or later. :-)

Hi Sven,
[…] section 11.1 [of the Haskell 2010 Report] explicitly states that all derived operations for Eq and Ord are strict in both arguments.
Good idea to look at the Report. Unfortunately it doesn’t give any reasoning as to why this choice was made.
Best,
Vilem
On 2 Jan 2019, at 19:06, Sven Panne

On Fri, 4 Jan 2019 at 16:55, V.Liepelt wrote:
Hi Sven,
[…] section 11.1 [of the Haskell 2010 Report] explicitly states that all derived operations for Eq and Ord are strict in both arguments.
Good idea to look at the Report. Unfortunately it doesn’t give any reasoning as to *why* this choice was made.
As I said: Consistency. (Well, I didn't write the report, but this seems to be the most obvious reason.)
participants (7)

- Clinton Mead
- Isaac Elliott
- Li-yao Xia
- Sven Panne
- Tom Ellis
- V.Liepelt
- Viktor Dukhovni