Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

older
Re: Please apply the comparison...

Cale Gibbard

8 Sep 2011 8 Sep '11

12:07 a.m.

I just tried this in ghci-7.0.3: ghci> nubBy (>=) [1,2,3,4] [1] Think about what this is doing: it is excluding 2 from the list because 2 >= 1, rather than including it because 1 >= 2 fails. I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list. If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4] I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless. I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again. Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm. - Cale

Show replies by date

Ramin Honary

8 Sep 8 Sep

8:03 a.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Greetings, It might be nice to have "nubBy" work in a way that is more intuitive to computer scientists who expect list evaluation to work in a specific order. Unfortunately, Haskell is quite explicit about not specifying the order of evaluation, which can make Haskell more intuitive for mathematicians, and less intuitive for most other people. I don't work on GHC or on the Haskell language committee, but my understanding is that making the "nubBy" function undefined for operations that do not test for equality is a simplifying assumption that allows more freedom for evaluation and optimization. Here is an overly-simple example, but I hope it makes sense: a = nubBy (==) ([10-5] ++ takeWhile (<5) [0..20]) b = nubBy (==) (nubBy (==) [5] ++ takeWhile (<5) (nubBy (==) [0..20])) According to Haskell, both 'a' and 'b' are mathematically equivalent, because "nubBy" is a distributive and associative function. This implies that if the compiler can somehow produce more efficient code by first converting 'a' into 'b' and then applying optimization, it should have the freedom to do so, and laziness guarantees that freedom. This is a poor example because obviously 'b' couldn't possibly be easier to optimize than 'a'. But really, who can fathom the logic of those crazy programmers who implemented the compiler with their ridiculous (but somehow always optimal) optimization strategies? If you require interpretation to go by list order, then you also must eliminate the distributive and associative properties of the "nubBy" function. By declaring "nubBy" only work on equality operations, you guarantee that it is associative and distributive across lists, and this allows a host of optimization strategies to be used which would otherwise be impossible if list-order application were required. If list order is important for you, it is easy enough to define your own "nubBy" function that is not distributive or associative, and can be therefore optimized differently than when you use "Data.List.nubBy". This blog post: < http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html> is about C, but the principles are the same: it has a fantastic explanation about how "undefined behavior" can be really helpful with simplifying compiler implementation and optimization. I hope that makes sense, and if I said anything inaccurate, I am at the mercy of the Haskell-prime mailing list. On Thu, Sep 8, 2011 at 9:07 AM, Cale Gibbard wrote:

...

I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2 >= 1, rather than including it because 1 >= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

_______________________________________________ Haskell-prime mailing list Haskell-prime@haskell.org http://www.haskell.org/mailman/listinfo/haskell-prime

Cale Gibbard

20 Sep 20 Sep

7:19 a.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Except that I didn't really say anything about evaluation order, I said something about the semantics of the higher order function, and what I wanted those semantics to be. Now, of course, the Report definition leaves it completely ambiguous what set of tests nubBy will perform when applied to some equivalence relation, and there's a fair amount of wiggle room in how it gets implemented because of that. Maybe some implementations are a little more efficient than others, or perform fewer tests. But that's beside the point, because I want to apply nubBy to non-equivalence relations and have it do something sensible, because the Report sample implementation of nubBy is actually *useful* when applied to more general relations. Maybe not as frequently as something like foldr, but I have a case where I want to apply it to something which isn't an equivalence relation come up maybe every couple of months, and it's really frustrating that it's undefined. It's similar to only having a 'fold' function which is formally undefined for non-associative operators, and not being able to rely on the association that it gives you. Sure, there might be cases where that actually helps a lot with performance (it's certainly crucial for parallelism), but then there will also be cases where that is totally inappropriate and ensures that you can't really use the function for your task. The one which is in GHC right now happens to be exactly what I want every time that the need arises, except that in the current version, it flips the order of arguments to the predicate, which seems to indicate that it should not hurt very badly to insert a 'flip' into its definition. An implementation of nubBy which works for an arbitrary equivalence relation and doesn't require any additional assumptions is going to be quadratic in any case. You're not going to do better than that, because in the worst case (a list with all elements non-equivalent) you have to examine every unordered pair of elements of the list. So you're at most talking about a constant factor or special case performance gain, at the cost of not specifying semantics when it would be useful to be able to rely on them. On 8 September 2011 04:03, Ramin Honary wrote:

...

Greetings, It might be nice to have "nubBy" work in a way that is more intuitive to computer scientists who expect list evaluation to work in a specific order. Unfortunately, Haskell is quite explicit about not specifying the order of evaluation, which can make Haskell more intuitive for mathematicians, and less intuitive for most other people. I don't work on GHC or on the Haskell language committee, but my understanding is that making the "nubBy" function undefined for operations that do not test for equality is a simplifying assumption that allows more freedom for evaluation and optimization. Here is an overly-simple example, but I hope it makes sense: a = nubBy (==) ([10-5] ++ takeWhile (<5) [0..20]) b = nubBy (==) (nubBy (==) [5] ++ takeWhile (<5) (nubBy (==) [0..20])) According to Haskell, both 'a' and 'b' are mathematically equivalent, because "nubBy" is a distributive and associative function. This implies that if the compiler can somehow produce more efficient code by first converting 'a' into 'b' and then applying optimization, it should have the freedom to do so, and laziness guarantees that freedom. This is a poor example because obviously 'b' couldn't possibly be easier to optimize than 'a'. But really, who can fathom the logic of those crazy programmers who implemented the compiler with their ridiculous (but somehow always optimal) optimization strategies? If you require interpretation to go by list order, then you also must eliminate the distributive and associative properties of the "nubBy" function. By declaring "nubBy" only work on equality operations, you guarantee that it is associative and distributive across lists, and this allows a host of optimization strategies to be used which would otherwise be impossible if list-order application were required. If list order is important for you, it is easy enough to define your own "nubBy" function that is not distributive or associative, and can be therefore optimized differently than when you use "Data.List.nubBy". This blog post: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is about C, but the principles are the same: it has a fantastic explanation about how "undefined behavior" can be really helpful with simplifying compiler implementation and optimization. I hope that makes sense, and if I said anything inaccurate, I am at the mercy of the Haskell-prime mailing list.

On Thu, Sep 8, 2011 at 9:07 AM, Cale Gibbard wrote:

...
I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2 >= 1, rather than including it because 1 >= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

_______________________________________________ Haskell-prime mailing list Haskell-prime@haskell.org http://www.haskell.org/mailman/listinfo/haskell-prime

Christian Maeder

10:59 a.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Looking at the code of nubBy http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.0.0/src/Data-... nubBy :: (a -> a -> Bool) -> [a] -> [a] #ifdef USE_REPORT_PRELUDE nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs) #else nubBy eq l = nubBy' l [] where nubBy' [] _ = [] nubBy' (y:ys) xs | elem_by eq y xs = nubBy' ys xs | otherwise = y : nubBy' ys (y:xs) -- Not exported: -- Note that we keep the call to `eq` with arguments in the -- same order as in the reference implementation -- 'xs' is the list of things we've seen so far, -- 'y' is the potential new element elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool elem_by _ _ [] = False elem_by eq y (x:xs) = y `eq` x || elem_by eq y xs #endif I see that the USE_REPORT_PRELUDE version corresponds to your proposal, but the actual implementation (based on elem_by) behaves differently despite the "same order" comment! Therefore I support your proposal to change "y `eq` x" in elem_by (and possibly improve the documentation). Cheers Christian Am 08.09.2011 02:07, schrieb Cale Gibbard:

...

I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2>= 1, rather than including it because 1>= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

Christian Maeder

11:46 a.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Looking at the old tickets http://hackage.haskell.org/trac/ghc/ticket/2528 http://hackage.haskell.org/trac/ghc/ticket/3280 the USE_REPORT_PRELUDE version of nub is also different from the implementation, but both variants fulfill "nub = nubBy (==)" (the prelude version by definition). So when we change the nubBy implmentation we also need to change the nub implementation (which is more difficult, because it uses "elem"). Cheers Christian Am 20.09.2011 12:59, schrieb Christian Maeder:

...

Looking at the code of nubBy http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.0.0/src/Data-...

nubBy :: (a -> a -> Bool) -> [a] -> [a] #ifdef USE_REPORT_PRELUDE nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs) #else nubBy eq l = nubBy' l [] where nubBy' [] _ = [] nubBy' (y:ys) xs | elem_by eq y xs = nubBy' ys xs | otherwise = y : nubBy' ys (y:xs)

-- Not exported: -- Note that we keep the call to `eq` with arguments in the -- same order as in the reference implementation -- 'xs' is the list of things we've seen so far, -- 'y' is the potential new element elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool elem_by _ _ [] = False elem_by eq y (x:xs) = y `eq` x || elem_by eq y xs #endif

I see that the USE_REPORT_PRELUDE version corresponds to your proposal, but the actual implementation (based on elem_by) behaves differently despite the "same order" comment!

Therefore I support your proposal to change "y `eq` x" in elem_by (and possibly improve the documentation).

Cheers Christian

Am 08.09.2011 02:07, schrieb Cale Gibbard:

...
I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2>= 1, rather than including it because 1>= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

Christian Maeder

12:19 p.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Apologies for responding to myself, but the difference between the REPORT_PRELUDE and the ghc implementation also applies to elem and notElem. #ifdef USE_REPORT_PRELUDE elem x = any (== x) notElem x = all (/= x) #else elem _ [] = False elem x (y:ys) = x==y || elem x ys notElem _ [] = True notElem x (y:ys)= x /= y && notElem x ys #endif So the proposal should be to swap the arguments in "x==y" and "x /= y" (above) which would also fix the nub implementation! C. Am 20.09.2011 13:46, schrieb Christian Maeder:

...

Looking at the old tickets http://hackage.haskell.org/trac/ghc/ticket/2528 http://hackage.haskell.org/trac/ghc/ticket/3280

the USE_REPORT_PRELUDE version of nub is also different from the implementation, but both variants fulfill "nub = nubBy (==)" (the prelude version by definition).

So when we change the nubBy implmentation we also need to change the nub implementation (which is more difficult, because it uses "elem").

Cheers Christian

Am 20.09.2011 12:59, schrieb Christian Maeder:

...
Looking at the code of nubBy http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.0.0/src/Data-...

nubBy :: (a -> a -> Bool) -> [a] -> [a] #ifdef USE_REPORT_PRELUDE nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs) #else nubBy eq l = nubBy' l [] where nubBy' [] _ = [] nubBy' (y:ys) xs | elem_by eq y xs = nubBy' ys xs | otherwise = y : nubBy' ys (y:xs)

-- Not exported: -- Note that we keep the call to `eq` with arguments in the -- same order as in the reference implementation -- 'xs' is the list of things we've seen so far, -- 'y' is the potential new element elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool elem_by _ _ [] = False elem_by eq y (x:xs) = y `eq` x || elem_by eq y xs #endif

I see that the USE_REPORT_PRELUDE version corresponds to your proposal, but the actual implementation (based on elem_by) behaves differently despite the "same order" comment!

Therefore I support your proposal to change "y `eq` x" in elem_by (and possibly improve the documentation).

Cheers Christian

Am 08.09.2011 02:07, schrieb Cale Gibbard:

...
I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2>= 1, rather than including it because 1>= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

Malcolm Wallace

11:32 a.m.

If this is a _proposal_ to change ghc's non-Report-compatible Data.List implementation to match the behaviour of the Report implementation, then count me as a +1.

...

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

This seems like an entirely reasonable principle.

...

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I notice that the Haskell'98 Report gives a sample implementation, but the Haskell'2010 Report does not. I wonder if this is a regression, since in days of yore, the Report implementation was treated as a gold standard reference for questions about function semantics, strictness and so forth. Regards, Malcolm

Daniel Fischer

1:26 p.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

On Thursday 08 September 2011, 02:07:39, Cale Gibbard wrote:

...

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations,

I find that not so strange, another obvious reimplementation is the one you get if you compile Data.List with -DUSE_REPORT_PRELUDE, nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs) If the relation defined by the predicate argument is not transitive, you get a different result than with your definition above. Of course, the report could make a choice and say that behaviour is authoritative, but while the expected behaviour is clear for an equivalence relation (if you don't distinguish different bottoms), it's not so obvious otherwise.

...

but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

Yes, although I don't think nubBy is a perfect name for that function - and, it's also only useful for transitive relations, as far as I can see.

...

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just

Nobody made a formal proposal, so it withered and died.

...

came up again for me, and I decided that I was sufficiently irritated by it to post again.

Sufficiently irritated to make a formal proposal? If nobody does that, it probably won't change (and I'm not interested enough in the matter to make one, but would support).

...

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

Supposing that the last assessment is not wildly incorrect, +1.

Christian Maeder

28 Sep 28 Sep

10:02 a.m.

New subject: Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

In case you further want to discuss this, I've re-opened http://hackage.haskell.org/trac/ghc/ticket/2528#comment:10 So, I'm against your proposal, Cale, but suggest that you revert the order in your example (if you want to exploit this behavior). Cheers Christian Am 08.09.2011 02:07, schrieb Cale Gibbard:

...

I just tried this in ghci-7.0.3:

ghci> nubBy (>=) [1,2,3,4] [1]

Think about what this is doing: it is excluding 2 from the list because 2>= 1, rather than including it because 1>= 2 fails.

I think an important convention when it comes to higher order functions on lists is that to the extent which is possible, the function parameters take elements from the list (or things computed from those) in the order in which they occur in the original list.

If we reimplement it in the obvious way: ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs) ghci> nubBy (>=) [1,2,3,4] [1,2,3,4]

I'm aware that the Report (strangely!) explicitly leaves the behaviour of nubBy unspecified for functions which are not equivalence relations, but the behaviour given by the Report implementation (the opposite of the current behaviour in GHC) is useful and desirable nonetheless.

I'm sure I've written about this before. I'm not entirely sure what happened to the previous thread of discussion about this, but it just came up again for me, and I decided that I was sufficiently irritated by it to post again.

Another thing perhaps worth pointing out is that the parameters to mapAccumR have always been backwards (compare it with foldr). Few enough people use this function that I'm fairly sure we could just change it without harm.

- Cale

5037

Age (days ago)

5057

Last active (days ago)

List overview

Download

8 comments

5 participants

participants (5)

Cale Gibbard
Christian Maeder
Daniel Fischer
Malcolm Wallace
Ramin Honary