Greetings,
It might be nice to have "nubBy" behave in a way that is more intuitive to computer scientists, who expect list elements to be consumed in a specific order. Unfortunately, Haskell is quite explicit about not specifying the order of evaluation, which can make the language more intuitive for mathematicians and less intuitive for most other people.
I don't work on GHC or sit on the Haskell language committee, but my understanding is that leaving "nubBy" unspecified for predicates that are not equivalence relations is a simplifying assumption that allows more freedom for evaluation and optimization. Here is an overly simple example, but I hope it makes the point:
a = nubBy (==) ([10-5] ++ takeWhile (<5) [0..20])
b = nubBy (==) (nubBy (==) [5] ++ takeWhile (<5) (nubBy (==) [0..20]))
According to Haskell's semantics, 'a' and 'b' are mathematically equivalent, because with an equivalence predicate "nubBy" is associative and distributes over list concatenation. This implies that if the compiler could somehow produce more efficient code by first converting 'a' into 'b' and then optimizing, it should have the freedom to do so, and referential transparency guarantees that freedom. This is a poor example, because 'b' obviously couldn't be easier to optimize than 'a'. But really, who can fathom the logic of those crazy programmers who implemented the compiler with their ridiculous (but somehow always optimal) optimization strategies?
If you require the predicate to be applied in list order, then you also give up the distributive and associative properties of "nubBy". By declaring that "nubBy" need only work with equivalence relations, you guarantee that it is associative and distributes across list concatenation, and this allows a host of optimization strategies which would otherwise be impossible if list-order application were required.
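To spell out the law I mean, here is a small QuickCheck property (the predicate eqMod3 and the property name are just mine for illustration; the Report does not phrase anything this way):

    import Data.List (nubBy)
    import Test.QuickCheck (quickCheck)

    -- eqMod3 is a genuine equivalence relation (reflexive, symmetric, transitive).
    eqMod3 :: Int -> Int -> Bool
    eqMod3 x y = x `mod` 3 == y `mod` 3

    -- With an equivalence relation, nubBy "distributes" over (++):
    -- removing duplicates inside each half first never changes the result.
    prop_nubByDistributes :: [Int] -> [Int] -> Bool
    prop_nubByDistributes xs ys =
      nubBy eqMod3 (xs ++ ys)
        == nubBy eqMod3 (nubBy eqMod3 xs ++ nubBy eqMod3 ys)

    main :: IO ()
    main = quickCheck prop_nubByDistributes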
If list order is important to you, it is easy enough to define your own "nubBy" function that is not distributive or associative and can therefore be optimized differently than "Data.List.nubBy".
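A minimal sketch of such a definition (myNubBy is a made-up name; this is essentially the Report-style definition, with the predicate always applied to the earlier element first, scanning left to right):

    -- Keep the head, then drop every later element y for which `f x y` holds,
    -- so f is always applied as (earlier element) `f` (later element).
    myNubBy :: (a -> a -> Bool) -> [a] -> [a]
    myNubBy _ []     = []
    myNubBy f (x:xs) = x : myNubBy f (filter (not . f x) xs)

With this definition, myNubBy (>=) [1,2,3,4] gives [1,2,3,4].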
I hope that makes sense, and if I said anything inaccurate, I am at the mercy of the Haskell-prime mailing list.
On Thu, Sep 8, 2011 at 9:07 AM, Cale Gibbard <cgibbard@gmail.com> wrote:
I just tried this in ghci-7.0.3:
ghci> nubBy (>=) [1,2,3,4]
[1]
Think about what this is doing: it is excluding 2 from the list
because 2 >= 1, rather than including it because 1 >= 2 fails.
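Here is a definition that reproduces the behaviour I'm seeing (not GHC's actual source, and flippedNubBy is a made-up name; it just illustrates the predicate being applied with the later element on the left):

    -- Reproduces the observed behaviour: each new element y is tested as
    -- y `f` seenElement, i.e. with the *later* element on the left.
    flippedNubBy :: (a -> a -> Bool) -> [a] -> [a]
    flippedNubBy f = go []
      where
        go _    []     = []
        go seen (y:ys)
          | any (\x -> y `f` x) seen = go seen ys
          | otherwise                = y : go (y : seen) ys

    -- flippedNubBy (>=) [1,2,3,4]  ==>  [1]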
I think an important convention for higher-order functions on lists is that, to the extent possible, the function parameters take elements from the list (or things computed from them) in the order in which they occur in the original list.
If we reimplement it in the obvious way:
ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs)
ghci> nubBy (>=) [1,2,3,4]
[1,2,3,4]
I'm aware that the Report (strangely!) explicitly leaves the behaviour
of nubBy unspecified for functions which are not equivalence
relations, but the behaviour given by the Report implementation (the
opposite of the current behaviour in GHC) is useful and desirable
nonetheless.
I'm sure I've written about this before. I'm not entirely sure what
happened to the previous thread of discussion about this, but it just
came up again for me, and I decided that I was sufficiently irritated
by it to post again.
Another thing perhaps worth pointing out is that the parameters to
mapAccumR have always been backwards (compare it with foldr). Few
enough people use this function that I'm fairly sure we could just
change it without harm.
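For comparison (my reading of the point; sumR and runningTotalsR are made-up names, just to show the argument order): foldr's combining function takes the list element first and the accumulator second, while mapAccumR's accumulating function takes the accumulator first and the element second.

    import Data.List (mapAccumR)

    -- foldr :: (a -> b -> b) -> b -> [a] -> b
    -- the combining function receives the list element first:
    sumR :: [Int] -> Int
    sumR = foldr (\x acc -> x + acc) 0

    -- mapAccumR :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
    -- the accumulating function receives the accumulator first:
    runningTotalsR :: [Int] -> (Int, [Int])
    runningTotalsR = mapAccumR (\acc x -> (acc + x, acc + x)) 0

    -- runningTotalsR [1,2,3]  ==>  (6,[6,5,3])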
- Cale
_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime