
I experienced some interesting behavior with groupBy tonight, and thought I'd run it by folks as a proposal for the Haskell libraries. Consider the predicate incp e1 e2 = e1 + 1 == e2 Naively, one might expect groupBy incp [2,3,4,5,6] == [[2,3,4,5,6]] but one actually gets [[2,3],[4,5],[6]] from the standard library implementation. The Haskell98 Report makes it clear that "When the 'By' function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence." Thus, the observed behavior is technically correct, or at least technically not incorrect. The incp predicate does not follow the equivalence laws (it is neither symmetric nor reflexive), so groupBy is probably permitted to return most any result of the correct type. What is actually happening is that the groupBy implementation is using span with a predicate (incp a) where a is the first element in the list. I think it would be better to expand the definition of of groupBy such that the equality test is applied only to adjacent elements (with the arguments in proper order). This would not change the behavior for true equality predicates, but would permit cases like incp to have sensible behavior, which can occasionally be useful. (It was to me in this case: I was identifying straights in poker hands.) Here's a suggested (untested) implementation. (See also http://hpaste.org/10134?#a0 ) It relies on a new "chain" function that is the binary equivalent of span. This function seems to me to be useful in its own right. chain :: (a -> a -> Bool) -> [a] -> ([a],[a]) chain _ [] = ([], []) chain _ [e] = ([e], []) chain p (e1 : es@(e2 : _)) | p e1 e2 = let (s1, s2) = chain p es in (e1 : s1, s2) | otherwise = ([e1], es) groupBy' _ [] = [] groupBy' p l = s1 : groupBy' p s2 where (s1, s2) = chain p l The implementation of groupBy' is actually a bit cleaned up compared to that of groupBy. Regardless of what is done here, it would be nice to expand the Haddock for groupBy in the standard library to make it clear how the predicate is used. Comments and suggestions are welcome. Am I way off base here? Bart Massey bart@cs.pdx.edu