Data.List.groupBy with non-transitive equality predicate

I was used to use groupBy to split a list before every element that satisfies a predicate. E.g. splitting before every capital Prelude> groupBy (\_ c -> not $ Char.isUpper c) "Hello World" ["Hello ","World"] I also wanted to use this for splitting after specific elements. But this fails. I get Prelude> groupBy (\c _ -> c /= ',') "1, 2, 3, 4" ["1, 2, 3, 4"] but I wanted ["1,", " 2,", " 3,", " 4"] I assumed that 'groupBy' would compare adjacent elements, but it seems to compare the leading element of each block with subsequent elements. Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer. Additionally I think that comparing neighbouring elements is a useful behaviour and I suggest an according variant of 'groupBy' for Data.List. Opinions?

On 29 aug 2007, at 22.02, Henning Thielemann wrote:
Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer.
I'd say this should be an implicit requirement for Eq a, just like Monad a assumes the monad laws (or strongly suggests). But I agree that could be made clearer.
Additionally I think that comparing neighbouring elements is a useful behaviour and I suggest an according variant of 'groupBy' for Data.List.
I agree that this could be useful. / Thomas

On Wed, 29 Aug 2007, Thomas Schilling wrote:
On 29 aug 2007, at 22.02, Henning Thielemann wrote:
Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer.
I'd say this should be an implicit requirement for Eq a, just like Monad a assumes the monad laws (or strongly suggests). But I agree that could be made clearer.
'groupBy' (not 'group') has no Eq constraint.

On Wed, Aug 29, 2007 at 10:02:53PM +0200, Henning Thielemann wrote:
I assumed that 'groupBy' would compare adjacent elements, but it seems to compare the leading element of each block with subsequent elements. Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer. Additionally I think that comparing neighbouring elements is a useful behaviour and I suggest an according variant of 'groupBy' for Data.List. Opinions?
This has come up a few times now. The Haskell 98 Report says that the argument function is assumed to define an equivalence. For such arguments, the definition given in the Report (comparing with the leading element) is equivalent to a definition comparing adjacent elements, like: groupBy rel [] = [] groupBy rel (x:xs) = (x:ys) : groupBy rel zs where (ys,zs) = groupByAux x xs groupByAux x0 (x:xs) | rel x0 x = (x:ys, zs) where (ys,zs) = groupByAux x xs groupByAux y xs = ([], xs) But the latter definition is also useful for other relations, e.g. groupBy (<=) would produce the list of "runs". So I'd suggest that changing the reference definition of groupBy, and removing the restriction, would be an improvement, but still compatible with Haskell 98. The same goes for nubBy.

On Wed, 2007-08-29 at 22:02 +0200, Henning Thielemann wrote:
I was used to use groupBy to split a list before every element that satisfies a predicate. E.g. splitting before every capital
Prelude> groupBy (\_ c -> not $ Char.isUpper c) "Hello World" ["Hello ","World"]
I also wanted to use this for splitting after specific elements. But this fails. I get
Prelude> groupBy (\c _ -> c /= ',') "1, 2, 3, 4" ["1, 2, 3, 4"]
but I wanted ["1,", " 2,", " 3,", " 4"]
I assumed that 'groupBy' would compare adjacent elements, but it seems to compare the leading element of each block with subsequent elements. Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer. Additionally I think that comparing neighbouring elements is a useful behaviour and I suggest an according variant of 'groupBy' for Data.List. Opinions?
I noticed this interesting behaviour when implementing groupBy for lazy bytestrings. My QC tests showed I had the wrong answer for non-transative predicates. It was actually a lot harder to implement the H98 behaviour than the behaviour where we compare adjacent elements. So purely from an implementation point of view I'd be happy to change the behaviour to that which you suggest. Duncan

duncan.coutts:
On Wed, 2007-08-29 at 22:02 +0200, Henning Thielemann wrote:
I was used to use groupBy to split a list before every element that satisfies a predicate. E.g. splitting before every capital
Prelude> groupBy (\_ c -> not $ Char.isUpper c) "Hello World" ["Hello ","World"]
I also wanted to use this for splitting after specific elements. But this fails. I get
Prelude> groupBy (\c _ -> c /= ',') "1, 2, 3, 4" ["1, 2, 3, 4"]
but I wanted ["1,", " 2,", " 3,", " 4"]
I assumed that 'groupBy' would compare adjacent elements, but it seems to compare the leading element of each block with subsequent elements. Since the documentation doesn't tell which elements are actually compared it seems to assume that the comparison is commutative and transitive. Maybe this point should be made clearer. Additionally I think that comparing neighbouring elements is a useful behaviour and I suggest an according variant of 'groupBy' for Data.List. Opinions?
I noticed this interesting behaviour when implementing groupBy for lazy bytestrings. My QC tests showed I had the wrong answer for non-transative predicates. It was actually a lot harder to implement the H98 behaviour than the behaviour where we compare adjacent elements. So purely from an implementation point of view I'd be happy to change the behaviour to that which you suggest.
I'd second this. groupBy's exact behaviour was really fiddly to nail down in QC. H' should come with QuickCheck properties relating list functions to each other. -- Don
participants (5)
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
Henning Thielemann
-
Ross Paterson
-
Thomas Schilling