Generalize groupBy in a useful way?

I experienced some interesting behavior with groupBy tonight, and thought I'd run it by folks as a proposal for the Haskell libraries. Consider the predicate incp e1 e2 = e1 + 1 == e2 Naively, one might expect groupBy incp [2,3,4,5,6] == [[2,3,4,5,6]] but one actually gets [[2,3],[4,5],[6]] from the standard library implementation. The Haskell98 Report makes it clear that "When the 'By' function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence." Thus, the observed behavior is technically correct, or at least technically not incorrect. The incp predicate does not follow the equivalence laws (it is neither symmetric nor reflexive), so groupBy is probably permitted to return most any result of the correct type. What is actually happening is that the groupBy implementation is using span with a predicate (incp a) where a is the first element in the list. I think it would be better to expand the definition of of groupBy such that the equality test is applied only to adjacent elements (with the arguments in proper order). This would not change the behavior for true equality predicates, but would permit cases like incp to have sensible behavior, which can occasionally be useful. (It was to me in this case: I was identifying straights in poker hands.) Here's a suggested (untested) implementation. (See also http://hpaste.org/10134?#a0 ) It relies on a new "chain" function that is the binary equivalent of span. This function seems to me to be useful in its own right. chain :: (a -> a -> Bool) -> [a] -> ([a],[a]) chain _ [] = ([], []) chain _ [e] = ([e], []) chain p (e1 : es@(e2 : _)) | p e1 e2 = let (s1, s2) = chain p es in (e1 : s1, s2) | otherwise = ([e1], es) groupBy' _ [] = [] groupBy' p l = s1 : groupBy' p s2 where (s1, s2) = chain p l The implementation of groupBy' is actually a bit cleaned up compared to that of groupBy. Regardless of what is done here, it would be nice to expand the Haddock for groupBy in the standard library to make it clear how the predicate is used. Comments and suggestions are welcome. Am I way off base here? Bart Massey bart@cs.pdx.edu

On Fri, Sep 05, 2008 at 07:06:54AM +0000, Bart Massey wrote:
I think it would be better to expand the definition of groupBy such that the equality test is applied only to adjacent elements (with the arguments in proper order).
I agree: http://www.haskell.org/pipermail/haskell-cafe/2006-October/019148.html http://www.haskell.org/pipermail/libraries/2007-August/008031.html We should do it this time. (Probably too late for 6.10, though.)

On Fri, 2008-09-05 at 08:33 +0100, Ross Paterson wrote:
On Fri, Sep 05, 2008 at 07:06:54AM +0000, Bart Massey wrote:
I think it would be better to expand the definition of groupBy such that the equality test is applied only to adjacent elements (with the arguments in proper order).
I agree:
http://www.haskell.org/pipermail/haskell-cafe/2006-October/019148.html http://www.haskell.org/pipermail/libraries/2007-August/008031.html
We should do it this time. (Probably too late for 6.10, though.)
Yes, this behaviour surprised us when we were re-implementing the list library (as part of the stream fusion stuff). We assumed this was the behaviour and didn't catch it until we ran our quickcheck tests with randomly generated predicates. There's also a more efficient version that can be implemented if one is allowed to be slightly stricter and not return a group until the end of the group is encountered, but that's obviously not suitable for Data.List. (But it would be interesting if we could use RULES that could match on strictness properties.) Duncan

On 2008-09-05, Bart Massey
I experienced some interesting behavior with groupBy tonight, and thought I'd run it by folks as a proposal for the Haskell libraries.
Consider the predicate
incp e1 e2 = e1 + 1 == e2
Naively, one might expect
groupBy incp [2,3,4,5,6] == [[2,3,4,5,6]]
but one actually gets [[2,3],[4,5],[6]] from the standard library implementation.
The Haskell98 Report makes it clear that
"When the 'By' function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence."
Thus, the observed behavior is technically correct, or at least technically not incorrect. The incp predicate does not follow the equivalence laws (it is neither symmetric nor reflexive), so groupBy is probably permitted to return most any result of the correct type. What is actually happening is that the groupBy implementation is using span with a predicate (incp a) where a is the first element in the list.
If the comparison can be expensive on the first element, this allows the possibility of caching some of that computation by constructing a function that does the comparison. I admit this is unlikely... -- Aaron Denney -><-

Am Freitag, 5. September 2008 09:06 schrieb Bart Massey:
[…]
I think it would be better to expand the definition of of groupBy such that the equality test is applied only to adjacent elements (with the arguments in proper order). This would not change the behavior for true equality predicates, but would permit cases like incp to have sensible behavior, which can occasionally be useful. (It was to me in this case: I was identifying straights in poker hands.)
I think it would be better forcing the function to be an equivalence relation using dependent types. ;-) In the absence of dependent types, I think that noone should ever call groupBy with a relation which is not an equivalence. So for me, it really doesn’t matter what the function does if the precondition for calling it is not fulfilled.
[…]
Best wishes, Wolfgang
participants (5)
-
Aaron Denney
-
Bart Massey
-
Duncan Coutts
-
Ross Paterson
-
Wolfgang Jeltsch