groupBy without Ord?

Hello all, when I groupBy a list in order to cluster elements which satisfy some sort of equality, I usually have to sort the list first, which requires Ord. However groupBy itself does not require Ord, but just Eq. How can I groupBy a List whose elements are only instances of Eq but not of Ord?

On 22/03/14 16:51, martin wrote:
Hello all,
when I groupBy a list in order to cluster elements which satisfy some sort of equality, I usually have to sort the list first, which requires Ord. However groupBy itself does not require Ord, but just Eq.
How can I groupBy a List whose elements are only instances of Eq but not of Ord?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
group(By) is not concerned with grouping elements from the whole list together. It is only concerned about grouping elements that fulfill the equality that are already next to each other. | > group [1, 2, 1] | [[1], [2], [1]] If you want it to group the elements from the whole list, it as as you mention, you have to sort the list first or arrange it otherwise so that equal elements are next to each other. So to answer your question, you can groupBy Eq a => [a] simply by calling groupBy. If you want groupBy to group elements from the whole list, you need to arrange for such elements to be next to each other in the list. The Ord instance is normally used for this. Alternatively you can write a function which will simply go through the list for each distinctive element and collect the groups that way. This is _not_ what groupBy does: you're looking at the wrong function. -- Mateusz K.

On Sat, Mar 22, 2014 at 05:51:55PM +0100, martin wrote:
Hello all,
when I groupBy a list in order to cluster elements which satisfy some sort of equality, I usually have to sort the list first, which requires Ord. However groupBy itself does not require Ord, but just Eq.
How can I groupBy a List whose elements are only instances of Eq but not of Ord?
Well, you can just call groupBy on it. =) But I assume you mean that you still want to group together all equal elements. You could do something like groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest and you could easily generalize it to take a binary predicate, like groupBy, as well. If you want to get a little fancier and avoid the explicit recursion, you can use Data.List.Split.chop (from the 'split' package), which provides a generic way of recursively processing a list: chop :: ([a] -> (b,[a])) -> [a] -> [b] like so: import Data.List.Split (chop) import Control.Arrow (first) groupEq = chop (\(a:rest) -> first (a:) (partition (==a) rest)) -Brent

On Sat, Mar 22, 2014 at 11:51 PM, martin
How can I groupBy a List whose elements are only instances of Eq but not of Ord?
If you take a look at the code for groupBy: groupBy :: (a -> a -> Bool) -> [a] -> [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs where (ys,zs) = span (eq x) xs and replace 'span' by 'partition' you'd get what you want. You'd need a new name for your different groupBy of course. -- Kim-Ee

Am 03/22/2014 06:40 PM, schrieb Kim-Ee Yeoh:
On Sat, Mar 22, 2014 at 11:51 PM, martin
mailto:martin.drautzburg@web.de> wrote: How can I groupBy a List whose elements are only instances of Eq but not of Ord?
If you take a look at the code for groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs
Cool! I have a database background and the SQL "group by" does of course not assume any ordering. So I often wonder, where you would use Haskell's groupBy WITHOUT sorting first, but I assume there are situations, where this is useful.

On 22/03/14 18:29, martin wrote:
Am 03/22/2014 06:40 PM, schrieb Kim-Ee Yeoh:
On Sat, Mar 22, 2014 at 11:51 PM, martin
mailto:martin.drautzburg@web.de> wrote: How can I groupBy a List whose elements are only instances of Eq but not of Ord?
If you take a look at the code for groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs
Cool!
I have a database background and the SQL "group by" does of course not assume any ordering. So I often wonder, where you would use Haskell's groupBy WITHOUT sorting first, but I assume there are situations, where this is useful.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
One example that comes to mind is filtering consecutive-same elements: | Prelude Data.List> map head $ group "hello world!!!" | "helo world!" -- Mateusz K.

On Sun, Mar 23, 2014 at 1:29 AM, martin
So I often wonder, where you would use Haskell's groupBy WITHOUT sorting first, but I assume there are situations, where this is useful.
But if you sorted first, then there must be an Ord-er on the elements, yes? In which case you could then apply (sqlGroup = group . sort) as originally desired. I'm sure groupBy is useful standalone (dim memories of some functional pearl) but your hunch is right in that I normally use it after a sorting step. -- Kim-Ee

I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list
- the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way
around it). If there's any way of sorting the elements, even if the order
is entirely arbitrary, you should consider it.
John L.
On Sat, Mar 22, 2014 at 9:51 AM, martin
Hello all,
when I groupBy a list in order to cluster elements which satisfy some sort of equality, I usually have to sort the list first, which requires Ord. However groupBy itself does not require Ord, but just Eq.
How can I groupBy a List whose elements are only instances of Eq but not of Ord?

Hi, Have you considered using a "dictionary" of key-value pairs to group the elements? HashTable requires only Eq and a hash function but no Ord. First insert all element into the hashtable and them iterate over the keys in the hashtable where you can find all elements per group. You could tests whether the (somewhat constant) overhead of the hashtable is significant in your use cases. On second thought, this might introduce an overloading dependency on the Hash class/function. Maybe you know something about the key type such that this dependency can be resolved elsewhere? kind regards, Arjen. On 03/24/2014 12:34 AM, John Lato wrote:
I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list - the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way around it). If there's any way of sorting the elements, even if the order is entirely arbitrary, you should consider it.
John L.
On Sat, Mar 22, 2014 at 9:51 AM, martin
mailto:martin.drautzburg@web.de> wrote: Hello all,
when I groupBy a list in order to cluster elements which satisfy some sort of equality, I usually have to sort the list first, which requires Ord. However groupBy itself does not require Ord, but just Eq.
How can I groupBy a List whose elements are only instances of Eq but not of Ord?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am 03/24/2014 12:34 AM, schrieb John Lato:
I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list - the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way around it).
Wouldn't a modified quicksort give n*log(n) complexity. Something like: groupEq [] = [] groupEq (x:xs) = (groupEq a) ++ [x] ++ (groupEq b) where a = filter (==x) xs b = filter (/= x) xs This is indeed much slower than the built-in sort, but it slower too whern I replace "==" with ">", resulting in a texbook quicksort.

On Sun, Mar 30, 2014 at 04:06:24PM +0200, martin wrote:
Am 03/24/2014 12:34 AM, schrieb John Lato:
I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list - the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way around it).
Wouldn't a modified quicksort give n*log(n) complexity. Something like:
Quicksort is O(n^2) in the worst case.

Martin might have meant merge sort... :) Flavio Sent from my iPhone
On Mar 30, 2014, at 10:12 AM, Tom Ellis
wrote: On Sun, Mar 30, 2014 at 04:06:24PM +0200, martin wrote: Am 03/24/2014 12:34 AM, schrieb John Lato:
I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list - the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way around it).
Wouldn't a modified quicksort give n*log(n) complexity. Something like:
Quicksort is O(n^2) in the worst case. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am 03/24/2014 12:34 AM, schrieb John Lato:> I would keep in mind that, given the constraints
- you want to group together all equal elements from the entire list - the only operation you can perform on elements is comparison for equality
your function will have quadratic complexity (at least I can't see any way around it). If there's any way of sorting the elements, even if the order is entirely arbitrary, you should consider it.
Am 03/30/2014 04:12 PM, schrieb Tom Ellis:
Quicksort is O(n^2) in the worst case.
I just took the sort function from Data.List and replaced all occurences of cmp x == GT by (==) and all others by (/=). I didn't even bother to understand the algorithm. What I got is a reordered list where all equal elements are aligned next to each other. I have no reason to assume that this aligning is more expensive than true sorting. Unlike my naive quicksort, it has no problem with millions of elements. But of course it makes little sense to implement an aligning function and then use the native groupBy, when you can easily roll your own fast groupEq as Brent Yorgey pointed out: groupEq :: Eq a => [a] -> [[a]] groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest This one is even a bit faster than my modifed sort. So all in all there seems to be no benefit of sorting the list fist and asking for (Ord=>) when Eq is sufficient.

On Sun, Mar 30, 2014 at 10:50:54PM +0200, martin wrote:
I just took the sort function from Data.List and replaced all occurences of cmp x == GT by (==) and all others by (/=). I didn't even bother to understand the algorithm. What I got is a reordered list where all equal elements are aligned
Data.List.sort is a merge sort and the merge phase of this will not carry over to a correct 'groupBy'. Please check your implementation again!
But of course it makes little sense to implement an aligning function and then use the native groupBy, when you can easily roll your own fast groupEq as Brent Yorgey pointed out:
groupEq :: Eq a => [a] -> [[a]] groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest
This is O(n^2). Tom

Am 03/30/2014 11:15 PM, schrieb Tom Ellis:
groupEq :: Eq a => [a] -> [[a]] groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest
This is O(n^2).
I understood, that you suggested to go ahead and sort the list, instead of just aligning equal elements next to each other. So far all my attempts to prove you wrong failed. But I still have trouble to believe, that sorting (Ord=>) is cheaper than aligning (Eq=>) because sorting does aligning plus some more. Does (Ord=>) make life so much easier, or why do you think this is the case?

On Mon, Mar 31, 2014 at 11:27:47AM +0200, martin wrote:
Am 03/30/2014 11:15 PM, schrieb Tom Ellis:
groupEq :: Eq a => [a] -> [[a]] groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest
This is O(n^2).
I understood, that you suggested to go ahead and sort the list, instead of just aligning equal elements next to each other. So far all my attempts to prove you wrong failed.
But I still have trouble to believe, that sorting (Ord=>) is cheaper than aligning (Eq=>) because sorting does aligning plus some more. Does (Ord=>) make life so much easier, or why do you think this is the case?
An Ord instance is a total order on your datatype and gives you extra properties to work. Seemingly these properties help a lot! However, I'm not sure I'd say sorting does "aligning plus some more". Depending on what you want to do, sorting might be seen as "aligning minus some". When sorting, the whole collection comes out in sorted order not just in grouped blocks. That is, the range of a sorting function is strictly smaller (exponentially smaller I guess) than the range of a grouping function. Thus sorting loses a lot of information. Tom

Hi, martin wrote:
But I still have trouble to believe, that sorting (Ord=>) is cheaper than aligning (Eq=>) because sorting does aligning plus some more. Does (Ord=>) make life so much easier [...]?
Yes, it does. For example, if you know that x < y and you know that y < z, you immediately also know that x < z without having to call (<) a third time. But if you know that x /= y and you know that y /= z, you don't know anything about whether x /= z or not. To learn this, you have to call (/=) a third time. Many sorting algorithms exploit this by cleverly comparing elements in such an order that no matter how they compare, we get to safe some of the calls to (<). This situation is typical in programming: A seemingly harder problem can be easier to solve than an seemingly easier variant, because partial solutions of the seemingly harder problem contain more information than partial solutions of the seemingly easier variant of the problem. In this case, the partial solutions are the results of x < y and y < z, and if they are both true, they contain enough information to also know x < z without computing anything. So if you cannot solve a problem, sometimes you have to make it harder in order to make it easier. Tillmann

On Mar 31, 2014 5:28 AM, "martin"
Am 03/30/2014 11:15 PM, schrieb Tom Ellis:
groupEq :: Eq a => [a] -> [[a]] groupEq [] = [] groupEq (a:rest) = (a:as) : groupEq bs where (as,bs) = partition (==a) rest
This is O(n^2).
I understood, that you suggested to go ahead and sort the list, instead
of just aligning equal elements next to each
other. So far all my attempts to prove you wrong failed.
None of your attempts have addressed the major criticism, I.e. your algorithm has higher complexity than Ord-based solutions. That doesn't mean your algorithm won't work well for your use case, but it does mean that it will work less well for some (many?) use cases. Have you tried "groupEq [1..1000000::Int]" ? Is the performance of that acceptable? Does that impact your use case?

Am 03/30/2014 06:00 PM, schrieb Kim-Ee Yeoh:
On Sun, Mar 30, 2014 at 9:06 PM, martin
mailto:martin.drautzburg@web.de> wrote: groupEq (x:xs) = (groupEq a) ++ [x] ++ (groupEq b) where a = filter (==x) xs b = filter (/= x) xs
Why the need to recurse on "a"? It's a list of identical elements!
Silly me. I had just mechanically modified the textbook quicksort.
participants (9)
-
Arjen
-
Brent Yorgey
-
Flavio Villanustre
-
John Lato
-
Kim-Ee Yeoh
-
martin
-
Mateusz Kowalczyk
-
Tillmann Rendel
-
Tom Ellis