[trac@galois.com: Re: [GHC] #1218: Add sortNub and sortNubBy to Data.List]

I propose we *do not* change the api to add the special case:
sortNub = sort . nub
= map head . group . sort
and instead add a rewrite rule to Data.List to provide this
optimisation.
{-# RULES
"sort/nub" sort . nub = map head . group . sort
#-}
Along with a QuickCheck property to test it:
prop_sortnub xs = sort (nub xs) == map head (group (sort xs))
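
A minimal, self-contained form of that check (a sketch; monomorphising to [Int] lets QuickCheck pick instances, and the module wrapper is illustrative, not part of the proposal):

import Data.List (group, nub, sort)
import Test.QuickCheck (quickCheck)

prop_sortnub :: [Int] -> Bool
prop_sortnub xs = sort (nub xs) == map head (group (sort xs))

main :: IO ()
main = quickCheck prop_sortnub
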
-- Don
----- Forwarded message from GHC

On Mon, 2007-03-19 at 11:02 +1100, Donald Bruce Stewart wrote:
I propose we *do not* change the api to add the special case:
sortNub = sort . nub = map head . group . sort
and instead add a rewrite rule to Data.List to provide this optimisation.
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
Though of course we'd want to use a more efficient implementation than that. One can write a sort that directly eliminates duplicates, and I think you'd want a rule for nub . sort as well as sort . nub.

And while we're at it we can add a rule to rewrite all the places people have used map head . group . sort (or the groupBy f . sortBy f equivalents) to this equals-eliminating sort implementation.

Duncan
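
One possible shape for that more efficient implementation (a sketch only: the names nubSort/nubSortBy and the bottom-up merge structure are illustrative, not a worked-out patch). Duplicates are discarded inside the merge step instead of in a separate nub pass:

nubSortBy :: (a -> a -> Ordering) -> [a] -> [a]
nubSortBy cmp = mergeAll . map (:[])
  where
    mergeAll []   = []
    mergeAll [xs] = xs
    mergeAll xss  = mergeAll (mergePairs xss)

    mergePairs (xs:ys:rest) = merge xs ys : mergePairs rest
    mergePairs xss          = xss

    merge xs [] = xs
    merge [] ys = ys
    merge (x:xs) (y:ys) = case cmp x y of
      LT -> x : merge xs (y:ys)
      EQ -> x : merge xs ys          -- the duplicate is dropped here
      GT -> y : merge (x:xs) ys

nubSort :: Ord a => [a] -> [a]
nubSort = nubSortBy compare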

On Mon, Mar 19, 2007 at 11:22:47AM +1100, Duncan Coutts wrote:
On Mon, 2007-03-19 at 11:02 +1100, Donald Bruce Stewart wrote:
I propose we *do not* change the api to add the special case:
sortNub = sort . nub = map head . group . sort
and instead add a rewrite rule to Data.List to provide this optimisation.
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
Though of course we'd want to use a more efficient implementation than that. One can write a sort that directly eliminates duplicates, and I think you'd want a rule for nub . sort as well as sort . nub.
And while we're at it we can add a rule to rewrite all the places people have used map head . group . sort (or the groupBy f . sortBy f equivalents) to this equals-eliminating sort implementation.
note, this is most likely not what you want at all:
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
desugars to
{-# RULES "sort/nub" (.) sort nub = map head . group . sort #-}
meaning you are actually adding a RULE to (.), not sort or nub. This has a couple of unfortunate consequences:

* people will have to use 'sort . nub' rather than something like sort $ nub xs
* (.) will no longer be inlined until much later in the optimisation passes, which perhaps could have disastrous consequences

That said, I do not believe this is an appropriate use of rules to begin with. It changes the asymptotic complexity, so should be available as an explicit option, and is just a generally useful routine. Not that rules can't also exist, but attached to the right thing:

(sort (nub xs)) = nubSort xs
(nub (sort xs)) = nubSort xs

Now, I would also like to see (and it might turn out to be more useful) fastNub, which is a nub that uses Ord and sets to be just as fast, but works on infinite lists. Accidentally evaluating a whole list in memory is the quickest way to a space leak, and 'sort' is the most common culprit. This is why 'nub' can often be _faster_ than sortNub when you only partially evaluate a list; with ordNub, you get the benefits of both!

John

-- John Meacham - ⑆repetae.net⑆john⑈
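
A sketch of the ordNub/fastNub John describes, using Data.Set (the name and details are illustrative): it is O(n log n) like a sort-based nub, but order-preserving and lazy, so it works on infinite lists.

import qualified Data.Set as Set

ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _    []     = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

For example, take 3 (ordNub (cycle [1,2,3])) yields [1,2,3], where any sort-first variant would hang.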

On Wed, 2007-03-21 at 14:06 -0700, John Meacham wrote:
On Mon, Mar 19, 2007 at 11:22:47AM +1100, Duncan Coutts wrote:
note, this is most likely not what you want at all:
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
desugars to
{-# RULES "sort/nub" (.) sort nub = map head . group . sort #-}
meaning you are actually adding a RULE to (.), not sort or nub. This has a couple of unfortunate consequences.
Yes, I know. I just wrote it that way for clarity.
that said, I do not believe this is an appropriate use of rules to begin with. It changes the asymptotic complexity, so should be available as an explicit option, and is just a generally useful routine. Not that rules can't also exist, but attached to the right thing.
(sort (nub xs)) = nubSort xs
(nub (sort xs)) = nubSort xs
Though as several people pointed out these rules are sadly wrong in the presence of dodgy Eq/Ord instances. :-(
now, I would also like to see (and it might turn out to be more useful),
fastNub, which is a nub that uses Ord and sets to be just as fast, but works on infinite lists. Accidentally evaluating a whole list in memory is the quickest way to a space leak, and 'sort' is the most common culprit. This is why 'nub' can often be _faster_ than sortNub when you only partially evaluate a list; with ordNub, you get the benefits of both!
Yes. I guess I'm convinced. :-) Duncan

On Mar 18, 2007, at 8:02 PM, Donald Bruce Stewart wrote:
I propose we *do not* change the api to add the special case:
sortNub = sort . nub = map head . group . sort
and instead add a rewrite rule to Data.List to provide this optimisation.
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
So that the asymptotic complexity of my programs depends upon the firing of RULES? No thank you.

I admit I reflexively type "map head . group . sort" whenever this comes up, but it gets excessively ugly---especially since groupBy and sortBy take differently-typed arguments. What I want is a simple duplicate-discarding sort, so it's obvious what's going on. I don't much care whether its internals are "map head . group . sort" or "toList . toSet" or something else; I'd be happy if an informed decision were made on my behalf once.

-Jan-Willem Maessen
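
One candidate for that once-made informed decision, in the "toList . toSet" style (a sketch): it consults only Ord, never Eq.

import qualified Data.Set as Set

sortNub :: Ord a => [a] -> [a]
sortNub = Set.toAscList . Set.fromList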

Hi Donald,
I propose we *do not* change the api to add the special case:
sortNub = sort . nub = map head . group . sort
and instead add a rewrite rule to Data.List to provide this optimisation.
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
Yes, but this is only available for GHC, not for Haskell... A lot of the changes that have been made recently (i.e. the ByteString stuff) have been performance optimisations for GHC, but not for other Haskell compilers. Talking to Malcolm recently, he said that String outperforms ByteString with nhc.

I think it's important when you are using an algorithm to say "I want to use an O(n log n) algorithm instead of an O(n^2) one" and have the compiler obey you unconditionally. If the compiler wants to replace the O(n^2) one as well, that is just groovy, but your big-O shouldn't depend on compiler-specific rules.

Thanks
Neil

Neil Mitchell wrote: [snip]
I think it's important when you are using an algorithm to say "I want to use an O(n log n) algorithm instead of an O(n^2) one" and have the compiler obey you unconditionally. If the compiler wants to replace the O(n^2) one as well, that is just groovy, but your big-O shouldn't depend on compiler-specific rules.
I'd like to second Neil's opinion. In fact I think that switching compilers shouldn't change the asymptotic complexity of programs. This is already difficult enough in Haskell because it depends on what common subexpressions a compiler decides to share.

In addition to this, I view Data.List as a rather low-level library; to me, writing 'nub xs' specifies an algorithm rather than a result. Lists are a fundamental data structure for Haskell programs, and fine-grained control over how they're processed is often essential. In particular there may be cases where you actually want the naive algorithm for nub, as we've seen in the discussion so far.

Another point is that Haskell is also meant to be a teaching language, and list processing is likely a starting point for new coders. As such, adding more magic to Data.List seems wrong to me, because it makes understanding the library harder.

Personally, I'd like to see a sortNub function and a nubOrd function (which doesn't change the order). Maybe rules for changing algorithms could go in a different module, say Data.List.Magic - for people who want them.

Thanks for reading,
Bertram

As I pointed out in the GHC trac report, this rule is wrong. E.g.,

data T = T Int Int deriving (Ord, Show)
instance Eq T where T x _ == T y _ = x == y

ts = [T 1 1, T 1 undefined]

On Mar 19, 2007, at 00:02 , Donald Bruce Stewart wrote:
I propose we *do not* change the api to add the special case:
sortNub = sort . nub = map head . group . sort
and instead add a rewrite rule to Data.List to provide this optimisation.
{-# RULES "sort/nub" sort . nub = map head . group . sort #-}
Along with a QuickCheck property to test it:
prop_sortnub xs = sort (nub xs) == map head (group (sort xs))
-- Don
----- Forwarded message from GHC -----

Date: Sun, 18 Mar 2007 23:51:50 -0000
From: GHC
To: glasgow-haskell-bugs@haskell.org
Subject: Re: [GHC] #1218: Add sortNub and sortNubBy to Data.List

#1218: Add sortNub and sortNubBy to Data.List
Reporter: neil            |  Owner:
Type: proposal            |  Status: new
Priority: normal          |  Milestone: Not GHC
Component: libraries/base |  Version:
Severity: normal          |  Resolution:
Keywords:                 |  Difficulty: Unknown
Testcase:                 |  Architecture: Unknown
Os: Unknown               |

Comment (by dons):
How about rather than changing the api, purely for efficiency reasons, we just add a rewrite rule to Data.List for this optimisation?
The rule would be:
{{{ {-# RULES "sort/nub" sort . nub = map head . group . sort #-} }}}
sort . nub will be rewritten to the optimised case, and we won't need to further complicate the API.
Quite often now it is possible for the library author to expose only a small, generic API, while providing internal rules for specific optimised cases. This keeps the interface small, while still providing performance.
Data.ByteString does this in a number of places, for example:
{{{ {-# RULES "FPS specialise filter (== x)" forall x. filter (== x) = filterByte x #-} }}}
I'd really not like to see more functions exposed in the list interface purely as optimisations, when they could be better provided via RULES.
-- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/1218
----- End forwarded message -----

On Mon, 2007-03-19 at 19:43 +0000, Lennart Augustsson wrote:
As I pointed out in the GHC trac report, this rule is wrong. E.g.,
data T = T Int Int deriving (Ord, Show)
instance Eq T where T x _ == T y _ = x == y
ts = [T 1 1, T 1 undefined]
So the problem in that example is that if we nub first then we get [T 1 1], and sorting that is presumably fine; however if we sort first we compare both fields.

My grumble here is that your Eq and Ord instances are not in agreement with each other. If you can claim that two values are equal by only comparing the first components, then surely you must do the same for Ord too. Isn't some law being violated here: T 1 0 < T 1 1 and yet T 1 0 == T 1 1? But shouldn't a < b imply that a /= b?

H98 requires that Eq and Ord be equality classes and total orders, but does it say nothing about any relationship between the two, that they be in any way consistent?

So one could say that we can't really rely on any laws that should go along with type class instances, but that's clearly not right, since then we could not sort! If we cannot rely on Ord then the current Data.List.sort implementation is wrong, because it gives different answers from the H98 spec sort when the Ord instance is not a total order (this is easiest to see with sortBy).

Duncan
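
For reference, a self-contained version of the example (a sketch; the annotated results follow from the instances above):

import Data.List (group, nub, sort)

data T = T Int Int deriving (Ord, Show)
instance Eq T where T x _ == T y _ = x == y

ts :: [T]
ts = [T 1 1, T 1 undefined]

main :: IO ()
main = do
  print (sort (nub ts))              -- [T 1 1]: nub only uses (==), which
                                     -- never forces the second field
  print (map head (group (sort ts))) -- diverges: the derived compare forces
                                     -- the undefined second field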

On Tue, 20 Mar 2007, Duncan Coutts wrote:
H98 requires that Eq and Ord be equality classes and total orders
Does it? I was under the impression that a compiler can never assume that any laws hold of any user-defined instances (to do optimisations, for instance).

The report states the following:

"The Eq class provides equality (==) and inequality (/=) methods."
"The Ord class is used for totally ordered datatypes."

I interpret this as being mere usage guidelines, and nothing more.

What would it mean to state that Ord is a total order, by the way? Should (<=) always return True or False, never ⊥? In that case there is only one implementation of (<=): const (const True), since by monotonicity the result on ⊥ arguments fixes the result everywhere, and reflexivity rules out constant False.
So one could say that we can't really rely on any laws that should go along with type class instances, but that's clearly not right since then we cannot sort!
Libraries can of course depend on various laws (hopefully this dependence is explicit in the documentation). I guess you can argue whether rewrite rules are part of the compiler or the library. To me a rewrite rule is fishy unless the two sides are observationally equivalent, though.
If we cannot rely on Ord then the current Data.List.sort implementation is wrong because it gives different answers from the H98 spec sort when the Ord instance is not a total order (this is easiest to see with sortBy).
Then I would say it is wrong (according to H98), yes. -- /NAD

Hi
Does it? I was under the impression that a compiler can never assume that any laws hold of any user-defined instances (to do optimisations, for instance).
That was my impression.
So one could say that we can't really rely on any laws that should go along with type class instances, but that's clearly not right since then we cannot sort!
Libraries can of course depend on various laws (hopefully this dependence is explicit in the documentation). I guess you can argue whether rewrite rules are part of the compiler or the library. To me a rewrite rule is fishy unless the two sides are observationally equivalent, though.
Taking an example: if your Ord instance is not transitive then Data.Map may loop infinitely. Tracking this down was not particularly fun...

I'd guess that if there is an Ord and an Eq instance, they should be in agreement, but until the compiler can prove it, I'd be wary of exploiting it.

Thanks
Neil
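
Neil's expected agreement, written out as QuickCheck properties (a sketch, instantiated at Int; the names are illustrative):

import Test.QuickCheck

prop_eqMatchesCompare :: Int -> Int -> Bool
prop_eqMatchesCompare x y = (x == y) == (compare x y == EQ)

prop_ltImpliesNeq :: Int -> Int -> Property
prop_ltImpliesNeq x y = x < y ==> x /= y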

On Tue, 2007-03-20 at 14:11 +0100, Nils Anders Danielsson wrote:
On Tue, 20 Mar 2007, Duncan Coutts wrote:
H98 requires that Eq and Ord be equality classes and total orders
Does it? I was under the impression that a compiler can never assume that any laws hold of any user-defined instances (to do optimisations, for instance).
The report states the following:
"The Eq class provides equality (==) and inequality (/=) methods." "The Ord class is used for totally ordered datatypes."
I interpret this as being mere usage guidelines, and nothing more.
Aye.
If we cannot rely on Ord then the current Data.List.sort implementation is wrong because it gives different answers from the H98 spec sort when the Ord instance is not a total order (this is easiest to see with sortBy).
Then I would say it is wrong (according to H98), yes.
Actually it's fine. The library report states that:

When the "By" function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence; when the "By" function replaces an Ord context by a binary predicate, the predicate is assumed to define a total ordering.

So we can assume they're OK for the specification of those List module functions, but as you say, beyond that we can't make any hard assumptions. In other words we can assume Ord is a total order (for non-_|_ values) for the purposes of defining sort, but not for rules like:

sort . nub = sortDiscardingDuplicates

So Lennart is allowed to keep his bogus Eq instances, and we have to promise not to break his programs, although the implementation of the sort function doesn't have to sort his values correctly (since presumably it's allowed to use == as well as <=, >= etc).

So can anyone break this hypothesis?

nub . sort = map head . group . sort

i.e. sort first, then discard duplicates?

Duncan
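
Duncan's hypothesis, phrased as a testable property (a sketch; at a type like Int, where Eq and Ord agree, it holds, and the question is whether any pair of instances can break it):

import Data.List (group, nub, sort)
import Test.QuickCheck (quickCheck)

-- sort first, then discard duplicates, two ways
prop_sortFirst :: [Int] -> Bool
prop_sortFirst xs = nub (sort xs) == map head (group (sort xs))

main :: IO ()
main = quickCheck prop_sortFirst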

On Mar 20, 2007, at 6:53 PM, Duncan Coutts wrote:
On Tue, 2007-03-20 at 14:11 +0100, Nils Anders Danielsson wrote:
On Tue, 20 Mar 2007, Duncan Coutts wrote:
H98 requires that Eq and Ord be equality classes and total orders
Does it? I was under the impression that a compiler can never assume that any laws hold of any user-defined instances (to do optimisations, for instance).
The report states the following:
"The Eq class provides equality (==) and inequality (/=) methods." "The Ord class is used for totally ordered datatypes."
I interpret this as being mere usage guidelines, and nothing more.
Aye.
Hmm, I've always taken these to be binding contracts. For example, do I get to assume either == or <= is transitive?

But I admit this is at the level of writing libraries---so that, as another poster noted, I'm allowed to use a sort function which yields a different result (possibly _|_) if I don't obey the contract. But I'd often like my compiler to assume that <= is transitive...

-Jan-Willem Maessen

Duncan Coutts wrote:
So can anyone break this hypothesis?
nub . sort = map head . group . sort
Just make Eq and Ord instances that are completely unrelated, say

data Foo = Foo Int Int deriving Show

instance Eq Foo where
    Foo a _ == Foo b _ = a == b

instance Ord Foo where
    Foo _ a `compare` Foo _ b = a `compare` b

example = [Foo 0 1, Foo 0 2, Foo 1 1]

With these instances nub . sort does something well-defined; sort only uses `compare` and nub only uses (==). Note that Data.List.sort actually provides a stable sort. I don't know if that's specified anywhere, but it's true for both Data.List and for the Haskell report implementation of sort. The rule
nub . sort = map head . group . sort
on the other hand relies on a correspondence between Eq and Ord and breaks:

> nub . sort $ example
[Foo 0 1,Foo 1 1]
> map head . group . sort $ example
[Foo 0 1,Foo 1 1,Foo 0 2]

More precisely, the rule works iff a <= b && b <= c && a == c implies a == b. I'm not sure whether mismatching Eq and Ord [*] instances should be a programmer error, but if so this needs to be stated somewhere. Right now, Data.List has no such restriction.

Bertram

[*] Ideally, the correspondence should be that (a == b) == (a <= b && b <= a)
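
Bertram's side condition, written out so it can be checked directly (a sketch; Foo is repeated from the message above so the snippet stands alone):

data Foo = Foo Int Int deriving Show

instance Eq Foo where Foo a _ == Foo b _ = a == b
instance Ord Foo where compare (Foo _ a) (Foo _ b) = compare a b

-- The rule is valid iff this holds for all a, b, c.
sideCondition :: Ord a => a -> a -> a -> Bool
sideCondition a b c = not (a <= b && b <= c && a == c) || a == b

-- Falsified by these instances:
-- sideCondition (Foo 0 1) (Foo 1 1) (Foo 0 1) == False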

On Wed, 2007-03-21 at 11:48 +0100, Bertram Felgenhauer wrote:
Duncan Coutts wrote:
So can anyone break this hypothesis?
nub . sort = map head . group . sort
Just make Eq and Ord instances that are completely unrelated, say
I had to compile the code and go through it to convince myself, but yes, as you say, I'd fallen into the trap of 'believing' <= and == etc. I was assuming that sort would put 'equal' elements next to each other in the result, but of course with compare x y == EQ not being the same as x == y, that's not the case. Sigh. Thanks for the example.

I want type classes with real property conditions. I don't mind if they're checked or not, but I want to be able to break people's code if they don't obey the properties and optimise the code for the rest of us.

class Ord a => OrdReallyTruely a where ...

Then I want to use GHC rules that match on the subset of Ord types that are instances of OrdReallyTruely and be able to do the appropriate transformations.

forall (xs :: OrdReallyTruely a => [a]).
  nub (sort xs) = map head (group (sort xs))

Duncan
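
What such a class might look like as plain code (a sketch; note that GHC RULES cannot actually match on instance membership like the hypothetical forall above, so the gating here is by type signature only):

import Data.List (group, sort)

-- An empty subclass as the programmer's pledge that Eq and Ord agree
-- and that compare is a genuine total order.
class Ord a => OrdReallyTruely a

instance OrdReallyTruely Int
instance OrdReallyTruely Char

-- Usable only at pledged types:
nubSort :: OrdReallyTruely a => [a] -> [a]
nubSort = map head . group . sort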

On Tue, 20 Mar 2007, Duncan Coutts wrote:
Actually it's fine. The library report states that:
When the "By" function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence; when the "By" function replaces an Ord context by a binary predicate, the predicate is assumed to define a total ordering.
So we can assume they're ok for the specification of those List module functions but as you say, beyond that we can't make any hard assumptions.
Well, the report says (about the Prelude, but I think the same applies to the libraries):

"It constitutes a specification for the Prelude. Many of the definitions are written with clarity rather than efficiency in mind, and it is not required that the specification be implemented as shown here."

I interpret this as "you can change (optimise) the definitions, but the user shouldn't be able to observe the difference (except by measuring resource usage)". What else could "specification" mean?
In other words we can assume Ord is a total order (for non-_|_ values) for the purposes of defining sort [...]
I'd say you still need to define a function observationally equivalent to the one given in the report. -- /NAD

On Wed, 2007-03-21 at 16:18 +0100, Nils Anders Danielsson wrote:
On Tue, 20 Mar 2007, Duncan Coutts wrote:
Well, the report says (about the Prelude, but I think the same applies to the libraries):
"It constitutes a specification for the Prelude. Many of the definitions are written with clarity rather than efficiency in mind, and it is not required that the specification be implemented as shown here."
I interpret this as "you can change (optimise) the definitions, but the user shouldn't be able to observe the difference (except by measuring resource usage)". What else could "specification" mean?
In particular, does that mean that an implementation must have *exactly* the same strictness as given in the report? It never says explicitly. Probably it should mean that, but in that case there are a handful of cases where the report needs to be fixed. There are a couple of cases where the report is stricter than is necessary, and conversely a couple of cases where it is lazier than was probably really intended. I will set out the details on this to the libraries and haskell' lists soonish.
I'd say you still need to define a function observationally equivalent to the one given in the report.
Then we would really have to use insertion sort, not quicksort or mergesort or heapsort. I don't think that's what the spec intends. It does after all say for the various *By functions that we can assume the passed function is an equivalence or a total order:

When the "By" function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence; when the "By" function replaces an Ord context by a binary predicate, the predicate is assumed to define a total ordering.

It also defines sort = sortBy compare and nub = nubBy (==), so we can also assume that Eq is an equivalence and the Ord instance is a total order for the purposes of defining sort, nub etc. But only for those library functions, not in general.

Duncan
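
For reference, the Library Report's specification of sort is indeed an insertion sort (reproduced here from memory, so check the report for the authoritative text):

sort :: Ord a => [a] -> [a]
sort = sortBy compare

sortBy :: (a -> a -> Ordering) -> [a] -> [a]
sortBy cmp = foldr (insertBy cmp) []

insertBy :: (a -> a -> Ordering) -> a -> [a] -> [a]
insertBy _   x []         = [x]
insertBy cmp x ys@(y:ys') = case cmp x y of
    GT -> y : insertBy cmp x ys'
    _  -> x : ys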

Of course it means that a function must have exactly the same strictness. If it doesn't have the same strictness it's not the same function.

I agree the report should be fixed where the functions in the report do not have the appropriate strictness. But this is a change to the report.

-- Lennart

On Mar 21, 2007, at 22:17 , Duncan Coutts wrote:
On Wed, 2007-03-21 at 16:18 +0100, Nils Anders Danielsson wrote:
On Tue, 20 Mar 2007, Duncan Coutts wrote:
Well, the report says (about the Prelude, but I think the same applies to the libraries):
"It constitutes a specification for the Prelude. Many of the definitions are written with clarity rather than efficiency in mind, and it is not required that the specification be implemented as shown here."
I interpret this as "you can change (optimise) the definitions, but the user shouldn't be able to observe the difference (except by measuring resource usage)". What else could "specification" mean?
In particular, does that mean that an implementation must have *exactly* the same strictness as given in the report? It never says explicitly. Probably it should mean that, but in that case there are a handful of cases where the report needs to be fixed. There are a couple of cases where the report is stricter than is necessary, and conversely a couple of cases where it is lazier than was probably really intended.
I will set out the detail on this to the libraries and haskell' list soonish.
I'd say you still need to define a function observationally equivalent to the one given in the report.
Then we would really have to use insertion sort, not quicksort, or mergesort or heapsort. I don't think that's what the spec intends.
It does after all say for the various *By functions that we can assume the passed function is an equivalence or a total order:
When the "By" function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence; when the "By" function replaces an Ord context by a binary predicate, the predicate is assumed to define a total ordering.
It also defines sort = sortBy compare and nub = nubBy (==) so we can also assume that Eq is an equivalence and Ord instance is a total order for the purposes of defining sort, nub etc. But only for those library functions, not in general.
Duncan

On Wed, 2007-03-21 at 23:41 +0000, Lennart Augustsson wrote:
Of course it means that a function must have exactly the same strictness. If it doesn't have the same strictness it's not the same function.
Of course.
I agree the report should be fixed where the functions in the report do not have the appropriate strictness. But this is a change to the report.
It's interesting to note that the implementations in base for some functions in Data.List have been inconsistent with the report for years and nobody has noticed (or at least if they noticed they didn't complain very loudly).

For example, the standard implementations of partition and splitAt have been inconsistent with the H98 spec forever. I say forever since the implementations were consistent with the Haskell 1.2, 1.3 and 1.4 specifications, but the Haskell 98 report changed the specification and the implementations were never updated.

Similarly, GHC has had an incorrect implementation of genericTake for some years and no-one has noticed. The current implementation is inconsistent with Haskell 98, and looking back, GHC 3.02 was inconsistent with Haskell 1.4, which as far as I can tell was the first version of the spec to define genericTake.

Duncan
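
To illustrate the kind of difference at stake (a sketch of the general issue, not the actual historical discrepancy): the H98 report specifies partition as two independent filters, and a single-pass foldr version only matches the spec's laziness if its accumulator pattern is lazy.

-- H98 specification:
partitionSpec :: (a -> Bool) -> [a] -> ([a], [a])
partitionSpec p xs = (filter p xs, filter (not . p) xs)

-- Single-pass version; the ~ is what preserves the spec's laziness:
partitionImpl :: (a -> Bool) -> [a] -> ([a], [a])
partitionImpl p = foldr select ([], [])
  where
    select x ~(ts, fs)
      | p x       = (x : ts, fs)
      | otherwise = (ts, x : fs)

-- With the ~, take 3 (fst (partitionImpl even [0..])) gives [0,2,4];
-- with a strict pattern it would loop forever.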

Well, I think people use the functions that come with an implementation as the de facto spec. So we should update the spec to reflect reality.

(BTW, I wouldn't be surprised if I'm to blame for some of the discrepancies, even though I tried to be careful when I implemented more efficient versions of the Prelude functions for hbc.)

-- Lennart

On Mar 22, 2007, at 00:58 , Duncan Coutts wrote:
On Wed, 2007-03-21 at 23:41 +0000, Lennart Augustsson wrote:
Of course it means that a function must have exactly the same strictness. If it doesn't have the same strictness it's not the same function.
Of course.
I agree the report should be fixed where the functions in the report do not have the appropriate strictness. But this is a change to the report.
It's interesting to note that the implementations in base for some functions in Data.List have been inconsistent with the report for years and nobody has noticed (or at least if they noticed they didn't complain very loudly).
For example the standard implementations of partition and splitAt have been inconsistent with the H98 spec forever. I say forever since the implementations were consistent with the Haskell 1.2, 1.3 and 1.4 specifications but the Haskell98 report changed the specification and the implementations were never updated.
Similarly, GHC has had an incorrect implementation of genericTake for some years and no-one has noticed. The current implementation is inconsistent with Haskell98 and looking back, GHC 3.02 was inconsistent with Haskell 1.4 which as far as I can tell was the first version of the spec to define genericTake.
Duncan

On Thu, 22 Mar 2007, Duncan Coutts wrote:
It's interesting to note that the implementations in base for some functions in Data.List have been inconsistent with the report for years and nobody has noticed (or at least if they noticed they didn't complain very loudly).
I have certainly found some discrepancies in the past. If you never reason about partial or infinite values you're not likely to stumble over these inconsistencies, though. If you're a GHC user worried about partial/infinite values you should be aware that the GHC team now and then chooses a pragmatic solution instead of an exactly correct one. -- /NAD

On Thu, 2007-03-22 at 12:10 +0100, Nils Anders Danielsson wrote:
On Thu, 22 Mar 2007, Duncan Coutts wrote:
It's interesting to note that the implementations in base for some functions in Data.List have been inconsistent with the report for years and nobody has noticed (or at least if they noticed they didn't complain very loudly).
I have certainly found some discrepancies in the past. If you never reason about partial or infinite values you're not likely to stumble over these inconsistencies, though.
Certainly. I've come across some very subtle examples that I'd not have spotted normally.

I came across your ChasingBottoms lib yesterday. It's very nice. I'd started to implement something similar to test the strictness properties of our list reimplementation, but what you've got is a good deal more complete.

I started by looking at Olaf Chitil's StrictCheck module, which uses your lib; however, the SYB technique it uses to generate partial values of various types doesn't help when it comes to function types. So instead I modified SmallCheck to generate values with bottoms in them. That copes with function types very nicely. It can generate all the partial monotone functions up to a given depth (I think :-) ).

My one gripe with ChasingBottoms is that I can't customise the ApproxShow instances. I want to show functions in the SmallCheck style but deal with _|_'s properly. SmallCheck defines a Show instance for functions that shows the mappings up to a certain depth. It's very cute. I can't define my own ApproxShow instance for functions since it overlaps with the Data a => ApproxShow a instance.

Duncan

On Fri, 23 Mar 2007, Duncan Coutts wrote:
My one gripe with ChasingBottoms is that I can't customise the ApproxShow instances. I want to show functions in the SmallCheck style but deal with _|_'s properly. SmallCheck defines a Show instance for functions that shows the mappings up to a certain depth. It's very cute. I can't define my own ApproxShow instance for functions since it overlaps with the Data a => ApproxShow a instance.
I don't work on ChasingBottoms any more, so feel free to update the library. -- /NAD

I have certainly used Eq and Ord instances similar to the ones in my example. I know under what circumstances I can call what library functions, but the compiler doesn't. It should not make any unprovable assumptions about my code; I'd be very perturbed if it did.

-- Lennart

On Mar 19, 2007, at 23:05 , Duncan Coutts wrote:
On Mon, 2007-03-19 at 19:43 +0000, Lennart Augustsson wrote:
As I pointed out in the GHC trac report, this rule is wrong. E.g.,
data T = T Int Int deriving (Ord, Show)
instance Eq T where T x _ == T y _ = x == y
ts = [T 1 1, T 1 undefined]
So the problem in that example is that if we nub first then we get [T 1 1] and sorting that is presumably fine, however if we sort we compare both fields.
My grumble here is that your Eq and Ord instances are not in agreement with each other. If you can claim that they're equal by only comparing the first components then surely you must for Ord too.
Isn't some law being violated here: T 1 0 < T 1 1 and yet T 1 0 == T 1 1
but shouldn't a < b imply that a /= b?
H98 requires that Eq and Ord be equality classes and total orders, does it say nothing about any relationship between the two, that they be in any way consistent?
So one could say that we can't really rely on any laws that should go along with type class instances, but that's clearly not right since then we cannot sort!
If we cannot rely on Ord then the current Data.List.sort implementation is wrong because it gives different answers from the H98 spec sort when the Ord instance is not a total order (this is easiest to see with sortBy).
Duncan
participants (8)
- Bertram Felgenhauer
- dons@cse.unsw.edu.au
- Duncan Coutts
- Jan-Willem Maessen
- John Meacham
- Lennart Augustsson
- Neil Mitchell
- Nils Anders Danielsson