nubBy, groupBy specification

Hello, I've recently been irritated with (what I consider to be) broken versions of nubBy in GHC 6.10.4 and 6.12.1. For a while, groupBy was broken in the same way, but it seems to be fixed at least in 6.10.4. The problem is roughly caused by nubBy being specified only for symmetric relations, where this is actually a rather silly constraint, given that it captures a very common pattern of sieving (that at least I'd personally not want to have to write recursively every time). My most common usage of nubBy is actually probably nubBy (>=), (which has to be changed to nubBy (<=) for recent GHCs), and variations on it. This is extremely useful as a kind of lazier approximation of maximum or minimum. I occasionally use nubBy with equivalence relations (after these, I'd say (==) `on` foo is probably my next most common use case), but it's surprising how often that's actually not quite the right thing. Of course, there's also the inefficient but pretty example of constructing the list of primes using the divisibility relation and nubBy, which getting this wrong also breaks. What do people think of the following spec? nubBy f xs produces a subsequence ys of xs which satisfies: 1) f x y is False whenever x occurs before y in ys. 2) ys is maximal with respect to containment subject to condition 1. 3) The sequence of indices of selected elements is lexicographically minimal subject to conditions 1 and 2. (That is, the selections are made greedily.) Similarly, I think groupBy f xs should be specified as: groupBy f xs produces a list of lists yss of contiguous elements of xs such that: 1) concat yss = xs 2) Each element of yss is a nonempty list (y:ys) such that all (f y) ys. 3) The sequence of lengths of elements of ys is lexicographically maximal subject to conditions 1 and 2. (Again, this just means it greedily prefers to put elements in earlier lists rather than later ones.) Note that the code given in the Haskell 98 spec does the right thing with respect to these specifications, but Haskell 98 says that they're not specified for non-equivalence relations, while I think that they actually should be. A general policy that I think we should try to adopt is that the expression graph produced by a higher order function on lists, in terms of the supplied functions, should as far as reasonably possible maintain the left-to-right order of occurrences of elements as they occurred in the list. This makes the precise behaviour easy to remember, and tends to keep expression graphs easy to draw. foldr, foldl, scanr, scanl, mapAccumL, nubBy and groupBy (as specified above), all follow this rule (of course, many simpler HOFs do as well). mapAccumR as specified strangely does not, which makes it another thing I'd like to change. (See: http://cale.yi.org/index.php/Fold_Diagrams) cheers! - Cale
participants (1)
-
Cale Gibbard