RE: fix for Data.List.sortBy

Sid,
I'd be delighted to submit the patch, as long as I have permission (which I
probably don't), you feel confident about the change and maybe a couple of
other people agree.
Here is the proposed change. Tests shows significant speed improvement (30%)
when sorting lists of random numbers, and same efficiency for sorting
already sorted lists (both ascending and descending).
Thanks,
greg
_____
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of
Siddhanathan Shanmugam
Sent: Monday, March 27, 2017 6:53 PM
To: Gregory Popovitch
Subject: RE: Proposal: a new implementation for Data.List.sort and
Data.List.sortBy, which has better performance characteristics and is more
laziness-friendly.
Since I don't see any regressions, this doesn't really need CLC approval.
The changes are also small enough that a Github PR may be accepted
(otherwise, the change goes in via Phabricator).
Are you interested in implementing this patch? If yes, a standard Github PR
should be fine. Right now gSort is a three line change I think. It will be
changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on Github.
I'm hoping for some more comments from other Haskellers, before pushing for
this change in base. I feel like we may be missing a potential optimization
that someone else might spot. So probably going to wait a few days.
On Mar 27, 2017 11:43 AM, "Gregory Popovitch"
However, still my version is more laziness-friendly, i.e. it requires fewer
comparisons to get the N smallest elements of a list (see
https://github.com/greg7mdp/ https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs ghc-sort/blob/master/src/sort_with_trace.hs).
I wonder if this might not be a more useful trait than being able to sort
already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called
natural merge sort. The algorithm is linearithmic in the worst case, but
linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch

The first seq is useless: constructor application is never suspended. I
haven't had a chance to look at the rest yet.
On Mar 27, 2017 7:59 PM, "Gregory Popovitch"
Sid,
I'd be delighted to submit the patch, as long as I have permission (which I probably don't), you feel confident about the change and maybe a couple of other people agree.
Here is the proposed change. Tests shows significant speed improvement (30%) when sorting lists of random numbers, and same efficiency for sorting already sorted lists (both ascending and descending).
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 6:53 PM *To:* Gregory Popovitch *Subject:* RE: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Since I don't see any regressions, this doesn't really need CLC approval. The changes are also small enough that a Github PR may be accepted (otherwise, the change goes in via Phabricator).
Are you interested in implementing this patch? If yes, a standard Github PR should be fine. Right now gSort is a three line change I think. It will be changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on Github.
I'm hoping for some more comments from other Haskellers, before pushing for this change in base. I feel like we may be missing a potential optimization that someone else might spot. So probably going to wait a few days.
On Mar 27, 2017 11:43 AM, "Gregory Popovitch"
wrote: Hi Sid,
Thanks, glad you looked into that. My understanding of the Haskell execution model is really poor, so I can't say one way or the other, but I felt that laziness ought to be considered as well, and I'm glad it was :-)
So in conclusion it looks like we have a winner with your latest gSortBy version. How do we get this pushed to the GHC library?
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 2:12 PM *To:* Gregory Popovitch
*Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Hi Greg,
On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch
wrote: Unfortunately, this optimization makes the sort less lazy, so doing something like:
take 4 $ sort l
requires more sorting of the list l with this change. I'm not sure it is a good tradeoff.
This can be verified with: https://github.com/greg7mdp/gh c-sort/blob/master/src/sort_with_trace.hs
I think you're running without optimizations turned on. It is lazy in my case.
Also, the difference should be negligible (if any at all). Here's an example of the list being sorted:
[11,4,6,8,2,5,1,7,9,55,11,3] ... [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]] ... [[1,2,4,5,6,7,8,9,11,55],[3,11]] * 1 3 * 2 3 * 4 3 * 4 11 [1,2,3,4]
The number of operations saved is only in the last merge. It's only lazy at this step.
So we save at most one traversal of the list, which is not too expensive since our worst case bounds is O(n log n) anyway.
This should mean that the asymptotic performance should be identical, regardless of the number of comparisons saved. Of course, you do get better constants, but I would be surprised if those constants translated to significantly better performance for a reasonable size list.
I do agree that it would be nice to have a more serious validation test suite.
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 12:53 PM
*To:* Gregory Popovitch *Cc:* Haskell Libraries *Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
We can improve things a bit further by forcing evaluation (with seq) along the way appropriately.
gregSortBy cmp [] = [] gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t where v' = merge v w `seq` merge v w x' = merge x y `seq` merge x y
reduce (x:y:t) = merge x y `seq` merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) `seq` (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (as . (a:)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
*Before the change:*
benchmarking random ints/ghc time 3.687 s (3.541 s .. NaN s) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.691 s (3.669 s .. 3.705 s) std dev 21.45 ms (0.0 s .. 24.76 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg time 2.648 s (2.482 s .. 2.822 s) 0.999 R² (0.998 R² .. 1.000 R²) mean 2.704 s (2.670 s .. 2.736 s) std dev 52.68 ms (0.0 s .. 54.49 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.733 s (2.682 s .. 2.758 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.707 s (2.689 s .. 2.718 s) std dev 16.84 ms (0.0 s .. 19.20 ms) variance introduced by outliers: 19% (moderately inflated)
*After the change:*
benchmarking random ints/greg time 2.576 s (2.548 s .. 2.628 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.590 s (2.578 s .. 2.599 s) std dev 12.99 ms (0.0 s .. 14.89 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.538 s (2.412 s .. 2.627 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 2.543 s (2.517 s .. 2.560 s) std dev 26.16 ms (0.0 s .. 30.21 ms) variance introduced by outliers: 19% (moderately inflated)
On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam < siddhanathan+eml@gmail.com> wrote:
Theoretically, we could do better. We currently only exploit monotonic runs in merge sort, but we could also exploit bitonic runs:
dlist as = as [] `seq` as []
sequences [] = [[]] sequences [a] = [[a]] sequences (a:xs) = bitonic a a (a:) xs
bitonic min max as (b:bs) | b `cmp` max /= LT = bitonic min b (as . (b:)) bs | b `cmp` min /= GT = bitonic b max ((b:) . as) bs | otherwise = dlist as : sequences (b:bs) bitonic _ _ as [] = [dlist as]
The constant factors here might be too high to notice the difference though.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_ with_trace.hs).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called natural merge sort. The algorithm is linearithmic in the worst case, but linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch
wrote: Thanks again @Siddhanathan! Looks like your gSort fixes the main issue with Data.List.sort().
I have updated the test programs in https://github.com/greg7mdp/gh c-sort to include your new version.
Here are the results (your new version looks like a definite improvement vs the current GHC one):
input GHC sort My Orig proposal gSort ------------------------------------------------------------ ---------------- --- sorted ints (ascending) 151 456 148 sorted ints (descending) 152 466 155 random ints 2732 2006 2004 random strings 6564 5549 5528
So replacing the current GHC version with gSort is a no brainer, as it is better in all regards.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs ).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 1:05 PM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Interesting. You are right, performance for sorting random lists has priority over performance for sorting already-sorted lists.
Ignore the numbers for my previous version. Can you compare GHC's sort, your proposal, and gSort below?
gSort :: Ord a => [a] -> [a] gSort = gSortBy compare gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
Thanks, Sid
On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch
wrote: Thank you @Siddhanathan! I welcome any improvement you may make, as I said I am very far from a Haskell expert.
I just tested your change with my test project (https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort ) and here are my results (mean times in ms):
input GHC sort Orig proposal your change
------------------------------------------------------------ ---------------- --- sorted ints (ascending) 153 467 139 sorted ints (descending) 152 472 599 random ints 2824 2077 2126 random strings 6564 5613 5983
Your change is a definite improvement for sorted integers in ascending order, but is worse for other cases.
Is there a real need to optimize the sort for already sorted list? Of course it should not be a degenerate case and take longer than sorting random numbers, but this is not the case here. Sorting already sorted lists is, even with my version, over 4 times faster than sorting random lists. This sounds perfectly acceptable to me, and I feel that trying to optimize this specific case further, if it comes at the detriment of the general case, is not desirable.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 11:41 AM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Thank you! This identifies a space leak in base which went unnoticed for 7 years.
Your implementation can be improved further. Instead of splitting into pairs, you could instead split into lists of sorted sublists by replacing the pairs function with the following
pair = foldr f [] where f x [] = [[x]] f x (y:ys) | x `cmp` head y == LT = (x:y):ys | otherwise = [x]:y:ys
This should give you the same performance improvements for sorting random lists, but better performance while sorting ascending lists.
The version in base takes it one step further by using a DList to handle the descending case efficiently as well, except there's a space leak right now because of which it is slower.
On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
wrote: Motivation: ----------
Data.List.sort is a very important functionality in Haskell. I believe that the proposed implementation is:
- significantly faster than the current implementation on unsorted lists, typically 14% to 27% faster - more laziness-friendly, i.e.: take 3 $ sort l will require significantly less comparisons than the current implementation
Proposed Implementation -----------------------
sort :: (Ord a) => [a] -> [a] sort = sortBy compare
sortBy cmp [] = [] sortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' : reduce t where v' = merge v w x' = merge x y
reduce (x:y:t) = merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
Effect and Interactions -----------------------
I have a stack project with a criterion test for this new implementation, available at https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort
<https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort > . I ran the tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and had the following results:
- sorting of random lists of integers is 27% faster - sorting of random lists of strings is 14% faster - sorting of already sorted lists is significantly slower, but still much faster than sorting random lists - proposed version is more laziness friendly. For example this version of sortBy requires 11 comparisons to find the smallest element of a 15 element list, while the default Data.List.sortBy requires 15 comparisons. (see
https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs>
<https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs> > )
Test results ------------
Criterion output (descending/ascending results are for already sorted lists). I barely understand what Criterion does, and I am puzzled with the various "T" output - maybe there is a bug in my bench code:
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ stack exec ghc-sort benchmarking ascending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 160.6 ms (153.4 ms .. 167.8 ms) 0.997 R² (0.986 R² .. 1.000 R²) mean 161.7 ms (158.3 ms .. 165.9 ms) std dev 5.210 ms (3.193 ms .. 7.006 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking ascending ints/greg TTTTTTTTTTTTTTTTtime 473.8 ms (398.6 ms .. 554.9 ms) 0.996 R² (0.987 R² .. 1.000 R²) mean 466.2 ms (449.0 ms .. 475.0 ms) std dev 14.94 ms (0.0 s .. 15.29 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking descending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 165.1 ms (148.2 ms .. 178.2 ms) 0.991 R² (0.957 R² .. 1.000 R²) mean 158.7 ms (154.0 ms .. 164.3 ms) std dev 7.075 ms (4.152 ms .. 9.903 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking descending ints/greg TTTTTTTTTTTTTTTTtime 471.7 ms (419.8 ms .. 508.3 ms) 0.999 R² (0.995 R² .. 1.000 R²) mean 476.0 ms (467.5 ms .. 480.0 ms) std dev 7.447 ms (67.99 as .. 7.865 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/ghc TTTTTTTTTTTTTTTTtime 2.852 s (2.564 s .. 3.019 s) 0.999 R² (0.997 R² .. 1.000 R²) mean 2.812 s (2.785 s .. 2.838 s) std dev 44.06 ms (543.9 as .. 44.97 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg TTTTTTTTTTTTTTTTtime 2.032 s (1.993 s .. 2.076 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.028 s (2.019 s .. 2.033 s) std dev 7.832 ms (0.0 s .. 8.178 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/ghc TTTTTTTTTTTTTTTTtime 6.504 s (6.391 s .. 6.694 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 6.499 s (6.468 s .. 6.518 s) std dev 28.85 ms (0.0 s .. 32.62 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/greg TTTTTTTTTTTTTTTTtime 5.560 s (5.307 s .. 5.763 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 5.582 s (5.537 s .. 5.607 s) std dev 39.30 ms (0.0 s .. 43.49 ms) variance introduced by outliers: 19% (moderately inflated)
Costs and Drawbacks -------------------
The only cost I see is the reduced performance when sorting already sorted lists. However, since this remains quite efficient, indeed over 4 times faster than sorting unsorted lists, I think it is an acceptable tradeoff.
Final note ----------
My Haskell is very rusty. I worked on this a couple years ago when I was learning Haskell, and meant to propose it to the Haskell community, but never got to it at the time.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bi n/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
<http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries >
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Does this rely on Common Subexpression Elimination optimization in order to
work? Would it work more reliably if the `seq`-ed expression were
let-bound?
-- Dan Burton
On Mon, Mar 27, 2017 at 5:41 PM, David Feuer
The first seq is useless: constructor application is never suspended. I haven't had a chance to look at the rest yet.
On Mar 27, 2017 7:59 PM, "Gregory Popovitch"
wrote: Sid,
I'd be delighted to submit the patch, as long as I have permission (which I probably don't), you feel confident about the change and maybe a couple of other people agree.
Here is the proposed change. Tests shows significant speed improvement (30%) when sorting lists of random numbers, and same efficiency for sorting already sorted lists (both ascending and descending).
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 6:53 PM *To:* Gregory Popovitch *Subject:* RE: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Since I don't see any regressions, this doesn't really need CLC approval. The changes are also small enough that a Github PR may be accepted (otherwise, the change goes in via Phabricator).
Are you interested in implementing this patch? If yes, a standard Github PR should be fine. Right now gSort is a three line change I think. It will be changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on Github.
I'm hoping for some more comments from other Haskellers, before pushing for this change in base. I feel like we may be missing a potential optimization that someone else might spot. So probably going to wait a few days.
On Mar 27, 2017 11:43 AM, "Gregory Popovitch"
wrote: Hi Sid,
Thanks, glad you looked into that. My understanding of the Haskell execution model is really poor, so I can't say one way or the other, but I felt that laziness ought to be considered as well, and I'm glad it was :-)
So in conclusion it looks like we have a winner with your latest gSortBy version. How do we get this pushed to the GHC library?
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 2:12 PM *To:* Gregory Popovitch
*Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Hi Greg,
On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch
wrote: Unfortunately, this optimization makes the sort less lazy, so doing something like:
take 4 $ sort l
requires more sorting of the list l with this change. I'm not sure it is a good tradeoff.
This can be verified with: https://github.com/greg7mdp/gh c-sort/blob/master/src/sort_with_trace.hs
I think you're running without optimizations turned on. It is lazy in my case.
Also, the difference should be negligible (if any at all). Here's an example of the list being sorted:
[11,4,6,8,2,5,1,7,9,55,11,3] ... [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]] ... [[1,2,4,5,6,7,8,9,11,55],[3,11]] * 1 3 * 2 3 * 4 3 * 4 11 [1,2,3,4]
The number of operations saved is only in the last merge. It's only lazy at this step.
So we save at most one traversal of the list, which is not too expensive since our worst case bounds is O(n log n) anyway.
This should mean that the asymptotic performance should be identical, regardless of the number of comparisons saved. Of course, you do get better constants, but I would be surprised if those constants translated to significantly better performance for a reasonable size list.
I do agree that it would be nice to have a more serious validation test suite.
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 12:53 PM
*To:* Gregory Popovitch *Cc:* Haskell Libraries *Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
We can improve things a bit further by forcing evaluation (with seq) along the way appropriately.
gregSortBy cmp [] = [] gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t where v' = merge v w `seq` merge v w x' = merge x y `seq` merge x y
reduce (x:y:t) = merge x y `seq` merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) `seq` (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (as . (a:)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
*Before the change:*
benchmarking random ints/ghc time 3.687 s (3.541 s .. NaN s) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.691 s (3.669 s .. 3.705 s) std dev 21.45 ms (0.0 s .. 24.76 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg time 2.648 s (2.482 s .. 2.822 s) 0.999 R² (0.998 R² .. 1.000 R²) mean 2.704 s (2.670 s .. 2.736 s) std dev 52.68 ms (0.0 s .. 54.49 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.733 s (2.682 s .. 2.758 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.707 s (2.689 s .. 2.718 s) std dev 16.84 ms (0.0 s .. 19.20 ms) variance introduced by outliers: 19% (moderately inflated)
*After the change:*
benchmarking random ints/greg time 2.576 s (2.548 s .. 2.628 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.590 s (2.578 s .. 2.599 s) std dev 12.99 ms (0.0 s .. 14.89 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.538 s (2.412 s .. 2.627 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 2.543 s (2.517 s .. 2.560 s) std dev 26.16 ms (0.0 s .. 30.21 ms) variance introduced by outliers: 19% (moderately inflated)
On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam < siddhanathan+eml@gmail.com> wrote:
Theoretically, we could do better. We currently only exploit monotonic runs in merge sort, but we could also exploit bitonic runs:
dlist as = as [] `seq` as []
sequences [] = [[]] sequences [a] = [[a]] sequences (a:xs) = bitonic a a (a:) xs
bitonic min max as (b:bs) | b `cmp` max /= LT = bitonic min b (as . (b:)) bs | b `cmp` min /= GT = bitonic b max ((b:) . as) bs | otherwise = dlist as : sequences (b:bs) bitonic _ _ as [] = [dlist as]
The constant factors here might be too high to notice the difference though.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_ with_trace.hs).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called natural merge sort. The algorithm is linearithmic in the worst case, but linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch
wrote:
Thanks again @Siddhanathan! Looks like your gSort fixes the main issue with Data.List.sort().
I have updated the test programs in https://github.com/greg7mdp/gh c-sort to include your new version.
Here are the results (your new version looks like a definite improvement vs the current GHC one):
input GHC sort My Orig proposal gSort ------------------------------------------------------------ ---------------- --- sorted ints (ascending) 151 456 148 sorted ints (descending) 152 466 155 random ints 2732 2006 2004 random strings 6564 5549 5528
So replacing the current GHC version with gSort is a no brainer, as it is better in all regards.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi th_trace.hs).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 1:05 PM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Interesting. You are right, performance for sorting random lists has priority over performance for sorting already-sorted lists.
Ignore the numbers for my previous version. Can you compare GHC's sort, your proposal, and gSort below?
gSort :: Ord a => [a] -> [a] gSort = gSortBy compare gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
Thanks, Sid
On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch
wrote:
Thank you @Siddhanathan! I welcome any improvement you may make, as I said I am very far from a Haskell expert.
I just tested your change with my test project (https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort ) and here are my results (mean times in ms):
input GHC sort Orig proposal your change
------------------------------------------------------------ ---------------- --- sorted ints (ascending) 153 467 139 sorted ints (descending) 152 472 599 random ints 2824 2077 2126 random strings 6564 5613 5983
Your change is a definite improvement for sorted integers in ascending order, but is worse for other cases.
Is there a real need to optimize the sort for already sorted list? Of course it should not be a degenerate case and take longer than sorting random numbers, but this is not the case here. Sorting already sorted lists is, even with my version, over 4 times faster than sorting random lists. This sounds perfectly acceptable to me, and I feel that trying to optimize this specific case further, if it comes at the detriment of the general case, is not desirable.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 11:41 AM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Thank you! This identifies a space leak in base which went unnoticed for 7 years.
Your implementation can be improved further. Instead of splitting into pairs, you could instead split into lists of sorted sublists by replacing the pairs function with the following
pair = foldr f [] where f x [] = [[x]] f x (y:ys) | x `cmp` head y == LT = (x:y):ys | otherwise = [x]:y:ys
This should give you the same performance improvements for sorting random lists, but better performance while sorting ascending lists.
The version in base takes it one step further by using a DList to handle the descending case efficiently as well, except there's a space leak right now because of which it is slower.
On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
wrote: Motivation: ----------
Data.List.sort is a very important functionality in Haskell. I believe that the proposed implementation is:
- significantly faster than the current implementation on unsorted lists, typically 14% to 27% faster - more laziness-friendly, i.e.: take 3 $ sort l will require significantly less comparisons than the current implementation
Proposed Implementation -----------------------
sort :: (Ord a) => [a] -> [a] sort = sortBy compare
sortBy cmp [] = [] sortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' : reduce t where v' = merge v w x' = merge x y
reduce (x:y:t) = merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
Effect and Interactions -----------------------
I have a stack project with a criterion test for this new implementation, available at https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort
<https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort > . I ran the tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and had the following results:
- sorting of random lists of integers is 27% faster - sorting of random lists of strings is 14% faster - sorting of already sorted lists is significantly slower, but still much faster than sorting random lists - proposed version is more laziness friendly. For example this version of sortBy requires 11 comparisons to find the smallest element of a 15 element list, while the default Data.List.sortBy requires 15 comparisons. (see
https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi th_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs>
<https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs> > )
Test results ------------
Criterion output (descending/ascending results are for already sorted lists). I barely understand what Criterion does, and I am puzzled with the various "T" output - maybe there is a bug in my bench code:
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ stack exec ghc-sort benchmarking ascending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 160.6 ms (153.4 ms .. 167.8 ms) 0.997 R² (0.986 R² .. 1.000 R²) mean 161.7 ms (158.3 ms .. 165.9 ms) std dev 5.210 ms (3.193 ms .. 7.006 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking ascending ints/greg TTTTTTTTTTTTTTTTtime 473.8 ms (398.6 ms .. 554.9 ms) 0.996 R² (0.987 R² .. 1.000 R²) mean 466.2 ms (449.0 ms .. 475.0 ms) std dev 14.94 ms (0.0 s .. 15.29 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking descending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 165.1 ms (148.2 ms .. 178.2 ms) 0.991 R² (0.957 R² .. 1.000 R²) mean 158.7 ms (154.0 ms .. 164.3 ms) std dev 7.075 ms (4.152 ms .. 9.903 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking descending ints/greg TTTTTTTTTTTTTTTTtime 471.7 ms (419.8 ms .. 508.3 ms) 0.999 R² (0.995 R² .. 1.000 R²) mean 476.0 ms (467.5 ms .. 480.0 ms) std dev 7.447 ms (67.99 as .. 7.865 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/ghc TTTTTTTTTTTTTTTTtime 2.852 s (2.564 s .. 3.019 s) 0.999 R² (0.997 R² .. 1.000 R²) mean 2.812 s (2.785 s .. 2.838 s) std dev 44.06 ms (543.9 as .. 44.97 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg TTTTTTTTTTTTTTTTtime 2.032 s (1.993 s .. 2.076 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.028 s (2.019 s .. 2.033 s) std dev 7.832 ms (0.0 s .. 8.178 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/ghc TTTTTTTTTTTTTTTTtime 6.504 s (6.391 s .. 6.694 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 6.499 s (6.468 s .. 6.518 s) std dev 28.85 ms (0.0 s .. 32.62 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/greg TTTTTTTTTTTTTTTTtime 5.560 s (5.307 s .. 5.763 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 5.582 s (5.537 s .. 5.607 s) std dev 39.30 ms (0.0 s .. 43.49 ms) variance introduced by outliers: 19% (moderately inflated)
Costs and Drawbacks -------------------
The only cost I see is the reduced performance when sorting already sorted lists. However, since this remains quite efficient, indeed over 4 times faster than sorting unsorted lists, I think it is an acceptable tradeoff.
Final note ----------
My Haskell is very rusty. I worked on this a couple years ago when I was learning Haskell, and meant to propose it to the Haskell community, but never got to it at the time.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bi n/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
<http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries >
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Turns out we don't need seq at all. A simple refactoring of the merge
function does the trick equally well.
mergePairs (a:b:xs) = merge id a b : mergePairs xs
mergePairs xs = xs
merge f as@(a:as') bs@(b:bs')
| a `cmp` b == GT = merge (f.(b:)) as bs'
| otherwise = merge (f.(a:)) as' bs
merge f [] bs = f bs
merge f as [] = f as
This variant is 10% faster in my tests.
On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton
Does this rely on Common Subexpression Elimination optimization in order to work? Would it work more reliably if the `seq`-ed expression were let-bound?
I don't think it relies heavily on CSE. The seq's are there to avoid a cascading series of thunk evaluations. Using let expressions doesn't seem to affect the benchmarks.
-- Dan Burton
On Mon, Mar 27, 2017 at 5:41 PM, David Feuer
wrote: The first seq is useless: constructor application is never suspended. I haven't had a chance to look at the rest yet.
On Mar 27, 2017 7:59 PM, "Gregory Popovitch"
wrote: Sid,
I'd be delighted to submit the patch, as long as I have permission (which I probably don't), you feel confident about the change and maybe a couple of other people agree.
Here is the proposed change. Tests shows significant speed improvement (30%) when sorting lists of random numbers, and same efficiency for sorting already sorted lists (both ascending and descending).
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 6:53 PM *To:* Gregory Popovitch *Subject:* RE: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Since I don't see any regressions, this doesn't really need CLC approval. The changes are also small enough that a Github PR may be accepted (otherwise, the change goes in via Phabricator).
Are you interested in implementing this patch? If yes, a standard Github PR should be fine. Right now gSort is a three line change I think. It will be changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on Github.
I'm hoping for some more comments from other Haskellers, before pushing for this change in base. I feel like we may be missing a potential optimization that someone else might spot. So probably going to wait a few days.
On Mar 27, 2017 11:43 AM, "Gregory Popovitch"
wrote: Hi Sid,
Thanks, glad you looked into that. My understanding of the Haskell execution model is really poor, so I can't say one way or the other, but I felt that laziness ought to be considered as well, and I'm glad it was :-)
So in conclusion it looks like we have a winner with your latest gSortBy version. How do we get this pushed to the GHC library?
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 2:12 PM *To:* Gregory Popovitch
*Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Hi Greg,
On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch
wrote: Unfortunately, this optimization makes the sort less lazy, so doing something like:
take 4 $ sort l
requires more sorting of the list l with this change. I'm not sure it is a good tradeoff.
This can be verified with: https://github.com/greg7mdp/gh c-sort/blob/master/src/sort_with_trace.hs
I think you're running without optimizations turned on. It is lazy in my case.
Also, the difference should be negligible (if any at all). Here's an example of the list being sorted:
[11,4,6,8,2,5,1,7,9,55,11,3] ... [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]] ... [[1,2,4,5,6,7,8,9,11,55],[3,11]] * 1 3 * 2 3 * 4 3 * 4 11 [1,2,3,4]
The number of operations saved is only in the last merge. It's only lazy at this step.
So we save at most one traversal of the list, which is not too expensive since our worst case bounds is O(n log n) anyway.
This should mean that the asymptotic performance should be identical, regardless of the number of comparisons saved. Of course, you do get better constants, but I would be surprised if those constants translated to significantly better performance for a reasonable size list.
I do agree that it would be nice to have a more serious validation test suite.
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 12:53 PM
*To:* Gregory Popovitch *Cc:* Haskell Libraries *Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
We can improve things a bit further by forcing evaluation (with seq) along the way appropriately.
gregSortBy cmp [] = [] gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t where v' = merge v w `seq` merge v w x' = merge x y `seq` merge x y
reduce (x:y:t) = merge x y `seq` merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) `seq` (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (as . (a:)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
*Before the change:*
benchmarking random ints/ghc time 3.687 s (3.541 s .. NaN s) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.691 s (3.669 s .. 3.705 s) std dev 21.45 ms (0.0 s .. 24.76 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg time 2.648 s (2.482 s .. 2.822 s) 0.999 R² (0.998 R² .. 1.000 R²) mean 2.704 s (2.670 s .. 2.736 s) std dev 52.68 ms (0.0 s .. 54.49 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.733 s (2.682 s .. 2.758 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.707 s (2.689 s .. 2.718 s) std dev 16.84 ms (0.0 s .. 19.20 ms) variance introduced by outliers: 19% (moderately inflated)
*After the change:*
benchmarking random ints/greg time 2.576 s (2.548 s .. 2.628 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.590 s (2.578 s .. 2.599 s) std dev 12.99 ms (0.0 s .. 14.89 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.538 s (2.412 s .. 2.627 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 2.543 s (2.517 s .. 2.560 s) std dev 26.16 ms (0.0 s .. 30.21 ms) variance introduced by outliers: 19% (moderately inflated)
On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam < siddhanathan+eml@gmail.com> wrote:
Theoretically, we could do better. We currently only exploit monotonic runs in merge sort, but we could also exploit bitonic runs:
dlist as = as [] `seq` as []
sequences [] = [[]] sequences [a] = [[a]] sequences (a:xs) = bitonic a a (a:) xs
bitonic min max as (b:bs) | b `cmp` max /= LT = bitonic min b (as . (b:)) bs | b `cmp` min /= GT = bitonic b max ((b:) . as) bs | otherwise = dlist as : sequences (b:bs) bitonic _ _ as [] = [dlist as]
The constant factors here might be too high to notice the difference though.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_ with_trace.hs).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called natural merge sort. The algorithm is linearithmic in the worst case, but linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch < greg7mdp@gmail.com> wrote:
Thanks again @Siddhanathan! Looks like your gSort fixes the main issue with Data.List.sort().
I have updated the test programs in https://github.com/greg7mdp/gh c-sort to include your new version.
Here are the results (your new version looks like a definite improvement vs the current GHC one):
input GHC sort My Orig proposal gSort ------------------------------------------------------------ ---------------- --- sorted ints (ascending) 151 456 148 sorted ints (descending) 152 466 155 random ints 2732 2006 2004 random strings 6564 5549 5528
So replacing the current GHC version with gSort is a no brainer, as it is better in all regards.
However, still my version is more laziness-friendly, i.e. it requires fewer comparisons to get the N smallest elements of a list (see https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi th_trace.hs).
I wonder if this might not be a more useful trait than being able to sort already sorted lists super fast.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 1:05 PM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Interesting. You are right, performance for sorting random lists has priority over performance for sorting already-sorted lists.
Ignore the numbers for my previous version. Can you compare GHC's sort, your proposal, and gSort below?
gSort :: Ord a => [a] -> [a] gSort = gSortBy compare gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
Thanks, Sid
On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch < greg7mdp@gmail.com> wrote:
Thank you @Siddhanathan! I welcome any improvement you may make, as I said I am very far from a Haskell expert.
I just tested your change with my test project (https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort ) and here are my results (mean times in ms):
input GHC sort Orig proposal your change
------------------------------------------------------------ ---------------- --- sorted ints (ascending) 153 467 139 sorted ints (descending) 152 472 599 random ints 2824 2077 2126 random strings 6564 5613 5983
Your change is a definite improvement for sorted integers in ascending order, but is worse for other cases.
Is there a real need to optimize the sort for already sorted list? Of course it should not be a degenerate case and take longer than sorting random numbers, but this is not the case here. Sorting already sorted lists is, even with my version, over 4 times faster than sorting random lists. This sounds perfectly acceptable to me, and I feel that trying to optimize this specific case further, if it comes at the detriment of the general case, is not desirable.
Thanks,
greg
________________________________
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of Siddhanathan Shanmugam Sent: Sunday, March 26, 2017 11:41 AM To: Gregory Popovitch Cc: Haskell Libraries Subject: Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Thank you! This identifies a space leak in base which went unnoticed for 7 years.
Your implementation can be improved further. Instead of splitting into pairs, you could instead split into lists of sorted sublists by replacing the pairs function with the following
pair = foldr f [] where f x [] = [[x]] f x (y:ys) | x `cmp` head y == LT = (x:y):ys | otherwise = [x]:y:ys
This should give you the same performance improvements for sorting random lists, but better performance while sorting ascending lists.
The version in base takes it one step further by using a DList to handle the descending case efficiently as well, except there's a space leak right now because of which it is slower.
On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
wrote: Motivation: ----------
Data.List.sort is a very important functionality in Haskell. I believe that the proposed implementation is:
- significantly faster than the current implementation on unsorted lists, typically 14% to 27% faster - more laziness-friendly, i.e.: take 3 $ sort l will require significantly less comparisons than the current implementation
Proposed Implementation -----------------------
sort :: (Ord a) => [a] -> [a] sort = sortBy compare
sortBy cmp [] = [] sortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' : reduce t where v' = merge v w x' = merge x y
reduce (x:y:t) = merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
Effect and Interactions -----------------------
I have a stack project with a criterion test for this new implementation, available at https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort
<https://github.com/greg7mdp/ghc-sort https://github.com/greg7mdp/ghc-sort > . I ran the tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and had the following results:
- sorting of random lists of integers is 27% faster - sorting of random lists of strings is 14% faster - sorting of already sorted lists is significantly slower, but still much faster than sorting random lists - proposed version is more laziness friendly. For example this version of sortBy requires 11 comparisons to find the smallest element of a 15 element list, while the default Data.List.sortBy requires 15 comparisons. (see
https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi th_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs>
<https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w ith_trace.hs> > )
Test results ------------
Criterion output (descending/ascending results are for already sorted lists). I barely understand what Criterion does, and I am puzzled with the various "T" output - maybe there is a bug in my bench code:
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ stack exec ghc-sort benchmarking ascending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 160.6 ms (153.4 ms .. 167.8 ms) 0.997 R² (0.986 R² .. 1.000 R²) mean 161.7 ms (158.3 ms .. 165.9 ms) std dev 5.210 ms (3.193 ms .. 7.006 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking ascending ints/greg TTTTTTTTTTTTTTTTtime 473.8 ms (398.6 ms .. 554.9 ms) 0.996 R² (0.987 R² .. 1.000 R²) mean 466.2 ms (449.0 ms .. 475.0 ms) std dev 14.94 ms (0.0 s .. 15.29 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking descending ints/ghc TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime 165.1 ms (148.2 ms .. 178.2 ms) 0.991 R² (0.957 R² .. 1.000 R²) mean 158.7 ms (154.0 ms .. 164.3 ms) std dev 7.075 ms (4.152 ms .. 9.903 ms) variance introduced by outliers: 12% (moderately inflated)
benchmarking descending ints/greg TTTTTTTTTTTTTTTTtime 471.7 ms (419.8 ms .. 508.3 ms) 0.999 R² (0.995 R² .. 1.000 R²) mean 476.0 ms (467.5 ms .. 480.0 ms) std dev 7.447 ms (67.99 as .. 7.865 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/ghc TTTTTTTTTTTTTTTTtime 2.852 s (2.564 s .. 3.019 s) 0.999 R² (0.997 R² .. 1.000 R²) mean 2.812 s (2.785 s .. 2.838 s) std dev 44.06 ms (543.9 as .. 44.97 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg TTTTTTTTTTTTTTTTtime 2.032 s (1.993 s .. 2.076 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.028 s (2.019 s .. 2.033 s) std dev 7.832 ms (0.0 s .. 8.178 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/ghc TTTTTTTTTTTTTTTTtime 6.504 s (6.391 s .. 6.694 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 6.499 s (6.468 s .. 6.518 s) std dev 28.85 ms (0.0 s .. 32.62 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking shakespeare/greg TTTTTTTTTTTTTTTTtime 5.560 s (5.307 s .. 5.763 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 5.582 s (5.537 s .. 5.607 s) std dev 39.30 ms (0.0 s .. 43.49 ms) variance introduced by outliers: 19% (moderately inflated)
Costs and Drawbacks -------------------
The only cost I see is the reduced performance when sorting already sorted lists. However, since this remains quite efficient, indeed over 4 times faster than sorting unsorted lists, I think it is an acceptable tradeoff.
Final note ----------
My Haskell is very rusty. I worked on this a couple years ago when I was learning Haskell, and meant to propose it to the Haskell community, but never got to it at the time.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bi n/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
<http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries >
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Sid, this new version (diff below) is really fast when sorting random ints,
but slower when sorting strings:
input GHC sort Orig proposal gSort
-------------------------------------------------------------------------
sorted ints (ascending) 151 460 147
sorted ints (descending) 151 467 171
random ints 2771 2010 1365
random strings 6542 5524 5991
Thanks,
greg
_____
From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On Behalf Of
Siddhanathan Shanmugam
Sent: Tuesday, March 28, 2017 2:45 AM
To: Dan Burton
Cc: Haskell Libraries; Gregory Popovitch
Subject: Re: fix for Data.List.sortBy
Turns out we don't need seq at all. A simple refactoring of the merge
function does the trick equally well.
mergePairs (a:b:xs) = merge id a b : mergePairs xs
mergePairs xs = xs
merge f as@(a:as') bs@(b:bs')
| a `cmp` b == GT = merge (f.(b:)) as bs'
| otherwise = merge (f.(a:)) as' bs
merge f [] bs = f bs
merge f as [] = f as
This variant is 10% faster in my tests.
On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton
However, still my version is more laziness-friendly, i.e. it requires fewer
comparisons to get the N smallest elements of a list (see
https://github.com/greg7mdp/ https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs ghc-sort/blob/master/src/sort_with_trace.hs).
I wonder if this might not be a more useful trait than being able to sort
already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called
natural merge sort. The algorithm is linearithmic in the worst case, but
linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch

On Tue, Mar 28, 2017 at 5:03 AM, Gregory Popovitch
Sid, this new version (diff below) is really fast when sorting random ints, but slower when sorting strings:
Ok, let's not use that then. Also, add an inline pragma on the function merge. {-# INLINE merge #-} merge as@(a:as') bs@(b:bs')
input GHC sort Orig proposal gSort ------------------------------------------------------------------------- sorted ints (ascending) 151 460 147 sorted ints (descending) 151 467 171 random ints 2771 2010 1365 random strings 6542 5524 5991
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Tuesday, March 28, 2017 2:45 AM *To:* Dan Burton *Cc:* Haskell Libraries; Gregory Popovitch *Subject:* Re: fix for Data.List.sortBy
Turns out we don't need seq at all. A simple refactoring of the merge function does the trick equally well.
mergePairs (a:b:xs) = merge id a b : mergePairs xs mergePairs xs = xs
merge f as@(a:as') bs@(b:bs') | a `cmp` b == GT = merge (f.(b:)) as bs' | otherwise = merge (f.(a:)) as' bs merge f [] bs = f bs merge f as [] = f as
This variant is 10% faster in my tests.
On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton
wrote: Does this rely on Common Subexpression Elimination optimization in order to work? Would it work more reliably if the `seq`-ed expression were let-bound?
I don't think it relies heavily on CSE. The seq's are there to avoid a cascading series of thunk evaluations. Using let expressions doesn't seem to affect the benchmarks.
-- Dan Burton
On Mon, Mar 27, 2017 at 5:41 PM, David Feuer
wrote: The first seq is useless: constructor application is never suspended. I haven't had a chance to look at the rest yet.
On Mar 27, 2017 7:59 PM, "Gregory Popovitch"
wrote: Sid,
I'd be delighted to submit the patch, as long as I have permission (which I probably don't), you feel confident about the change and maybe a couple of other people agree.
Here is the proposed change. Tests shows significant speed improvement (30%) when sorting lists of random numbers, and same efficiency for sorting already sorted lists (both ascending and descending).
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 6:53 PM *To:* Gregory Popovitch *Subject:* RE: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Since I don't see any regressions, this doesn't really need CLC approval. The changes are also small enough that a Github PR may be accepted (otherwise, the change goes in via Phabricator).
Are you interested in implementing this patch? If yes, a standard Github PR should be fine. Right now gSort is a three line change I think. It will be changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on Github.
I'm hoping for some more comments from other Haskellers, before pushing for this change in base. I feel like we may be missing a potential optimization that someone else might spot. So probably going to wait a few days.
On Mar 27, 2017 11:43 AM, "Gregory Popovitch"
wrote: Hi Sid,
Thanks, glad you looked into that. My understanding of the Haskell execution model is really poor, so I can't say one way or the other, but I felt that laziness ought to be considered as well, and I'm glad it was :-)
So in conclusion it looks like we have a winner with your latest gSortBy version. How do we get this pushed to the GHC library?
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 2:12 PM *To:* Gregory Popovitch
*Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Hi Greg,
On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch
wrote:
Unfortunately, this optimization makes the sort less lazy, so doing something like:
take 4 $ sort l
requires more sorting of the list l with this change. I'm not sure it is a good tradeoff.
This can be verified with: https://github.com/greg7mdp/gh c-sort/blob/master/src/sort_with_trace.hs
I think you're running without optimizations turned on. It is lazy in my case.
Also, the difference should be negligible (if any at all). Here's an example of the list being sorted:
[11,4,6,8,2,5,1,7,9,55,11,3] ... [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]] ... [[1,2,4,5,6,7,8,9,11,55],[3,11]] * 1 3 * 2 3 * 4 3 * 4 11 [1,2,3,4]
The number of operations saved is only in the last merge. It's only lazy at this step.
So we save at most one traversal of the list, which is not too expensive since our worst case bounds is O(n log n) anyway.
This should mean that the asymptotic performance should be identical, regardless of the number of comparisons saved. Of course, you do get better constants, but I would be surprised if those constants translated to significantly better performance for a reasonable size list.
I do agree that it would be nice to have a more serious validation test suite.
Thanks,
greg
------------------------------ *From:* siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] *On Behalf Of *Siddhanathan Shanmugam *Sent:* Monday, March 27, 2017 12:53 PM
*To:* Gregory Popovitch *Cc:* Haskell Libraries *Subject:* Re: Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
We can improve things a bit further by forcing evaluation (with seq) along the way appropriately.
gregSortBy cmp [] = [] gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs) where pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t | otherwise = [x, y] : pair t pair [x] = [[x]] pair [] = []
reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t where v' = merge v w `seq` merge v w x' = merge x y `seq` merge x y
reduce (x:y:t) = merge x y `seq` merge x y : reduce t reduce xs = xs
merge xs [] = xs merge [] ys = ys merge xs@(x:xs') ys@(y:ys') | x `cmp` y == GT = y : merge xs ys' | otherwise = x : merge xs' ys
gSortBy cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b (a:) xs sequences xs = [xs]
descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as) `seq` (a:as) : sequences bs
ascending a as (b:bs) | a `cmp` b /= GT = ascending b (as . (a:)) bs ascending a as bs = as [a] `seq` as [a] : sequences bs
mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs)
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs mergePairs xs = xs
merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b : merge as bs' | otherwise = a : merge as' bs merge [] bs = bs merge as [] = as
*Before the change:*
benchmarking random ints/ghc time 3.687 s (3.541 s .. NaN s) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.691 s (3.669 s .. 3.705 s) std dev 21.45 ms (0.0 s .. 24.76 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg time 2.648 s (2.482 s .. 2.822 s) 0.999 R² (0.998 R² .. 1.000 R²) mean 2.704 s (2.670 s .. 2.736 s) std dev 52.68 ms (0.0 s .. 54.49 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.733 s (2.682 s .. 2.758 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.707 s (2.689 s .. 2.718 s) std dev 16.84 ms (0.0 s .. 19.20 ms) variance introduced by outliers: 19% (moderately inflated)
*After the change:*
benchmarking random ints/greg time 2.576 s (2.548 s .. 2.628 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.590 s (2.578 s .. 2.599 s) std dev 12.99 ms (0.0 s .. 14.89 ms) variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort time 2.538 s (2.412 s .. 2.627 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 2.543 s (2.517 s .. 2.560 s) std dev 26.16 ms (0.0 s .. 30.21 ms) variance introduced by outliers: 19% (moderately inflated)
On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam < siddhanathan+eml@gmail.com> wrote:
Theoretically, we could do better. We currently only exploit monotonic runs in merge sort, but we could also exploit bitonic runs:
dlist as = as [] `seq` as []
sequences [] = [[]] sequences [a] = [[a]] sequences (a:xs) = bitonic a a (a:) xs
bitonic min max as (b:bs) | b `cmp` max /= LT = bitonic min b (as . (b:)) bs | b `cmp` min /= GT = bitonic b max ((b:) . as) bs | otherwise = dlist as : sequences (b:bs) bitonic _ _ as [] = [dlist as]
The constant factors here might be too high to notice the difference though.
> However, still my version is more laziness-friendly, i.e. it requires fewer > comparisons to get the > N smallest elements of a list (see > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_ with_trace.hs). > > I wonder if this might not be a more useful trait than being able to sort > already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called natural merge sort. The algorithm is linearithmic in the worst case, but linear in the best case (already sorted list).
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch < greg7mdp@gmail.com> wrote:
> Thanks again @Siddhanathan! Looks like your gSort fixes the main > issue with > Data.List.sort(). > > I have updated the test programs in https://github.com/greg7mdp/gh > c-sort to > include your new version. > > Here are the results (your new version looks like a definite > improvement vs > the current GHC one): > > input GHC sort My Orig proposal > gSort > ------------------------------------------------------------ > ---------------- > --- > sorted ints (ascending) 151 456 > 148 > sorted ints (descending) 152 466 > 155 > random ints 2732 2006 > 2004 > random strings 6564 5549 > 5528 > > > So replacing the current GHC version with gSort is a no brainer, as > it is > better in all regards. > > However, still my version is more laziness-friendly, i.e. it > requires fewer > comparisons to get the > N smallest elements of a list (see > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi > th_trace.hs). > > I wonder if this might not be a more useful trait than being able to > sort > already sorted lists super fast. > > Thanks, > > greg > > ________________________________ > > From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] On > Behalf Of > Siddhanathan Shanmugam > Sent: Sunday, March 26, 2017 1:05 PM > To: Gregory Popovitch > Cc: Haskell Libraries > Subject: Re: Proposal: a new implementation for Data.List.sort and > Data.List.sortBy, which has better performance characteristics and > is more > laziness-friendly. > > > Interesting. You are right, performance for sorting random lists has > priority over performance for sorting already-sorted lists. > > Ignore the numbers for my previous version. Can you compare GHC's > sort, your > proposal, and gSort below? > > > gSort :: Ord a => [a] -> [a] > gSort = gSortBy compare > gSortBy cmp = mergeAll . sequences > where > sequences (a:b:xs) > | a `cmp` b == GT = descending b [a] xs > | otherwise = ascending b (a:) xs > sequences xs = [xs] > > > descending a as (b:bs) > | a `cmp` b == GT = descending b (a:as) bs > descending a as bs = (a:as) : sequences bs > > > ascending a as (b:bs) > | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs > ascending a as bs = as [a] `seq` as [a] : sequences bs > > > mergeAll [x] = x > mergeAll xs = mergeAll (mergePairs xs) > > > mergePairs (a:b:xs) = merge a b : mergePairs xs > mergePairs xs = xs > > > merge as@(a:as') bs@(b:bs') > | a `cmp` b == GT = b : merge as bs' > | otherwise = a : merge as' bs > merge [] bs = bs > merge as [] = as > > > Thanks, > Sid > > > On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch < > greg7mdp@gmail.com> > wrote: > > > Thank you @Siddhanathan! I welcome any improvement you may > make, as > I said I > am very far from a Haskell expert. > > I just tested your change with my test project > (https://github.com/greg7mdp/ghc-sort > https://github.com/greg7mdp/ghc-sort ) > and here are my results (mean times in ms): > > input GHC sort Orig proposal > your > change > > ------------------------------------------------------------ > ---------------- > --- > sorted ints (ascending) 153 467 > 139 > sorted ints (descending) 152 472 > 599 > random ints 2824 2077 > 2126 > random strings 6564 5613 > 5983 > > Your change is a definite improvement for sorted integers in > ascending > order, but is worse for other cases. > > Is there a real need to optimize the sort for already sorted > list? > Of course > it should not be a degenerate > case and take longer than sorting random numbers, but this > is not > the case > here. Sorting already sorted > lists is, even with my version, over 4 times faster than > sorting > random > lists. This sounds perfectly > acceptable to me, and I feel that trying to optimize this > specific > case > further, if it comes at the > detriment of the general case, is not desirable. > > Thanks, > > greg > > ________________________________ > > From: siddhanathan@gmail.com [mailto:siddhanathan@gmail.com] > On > Behalf Of > Siddhanathan Shanmugam > Sent: Sunday, March 26, 2017 11:41 AM > To: Gregory Popovitch > Cc: Haskell Libraries > Subject: Re: Proposal: a new implementation for > Data.List.sort and > Data.List.sortBy, which has better performance > characteristics and > is more > laziness-friendly. > > > > Thank you! This identifies a space leak in base which went > unnoticed > for 7 > years. > > Your implementation can be improved further. Instead of > splitting > into > pairs, you could instead split into lists of sorted sublists > by > replacing > the pairs function with the following > > pair = foldr f [] > where > f x [] = [[x]] > f x (y:ys) > | x `cmp` head y == LT = (x:y):ys > | otherwise = [x]:y:ys > > This should give you the same performance improvements for > sorting > random > lists, but better performance while sorting ascending lists. > > The version in base takes it one step further by using a > DList to > handle the > descending case efficiently as well, except there's a space > leak > right now > because of which it is slower. > > On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch >
> wrote: > > > > Motivation: > ---------- > > Data.List.sort is a very important functionality in > Haskell. > I > believe that > the proposed implementation is: > > - significantly faster than the current > implementation on > unsorted > lists, > typically 14% to 27% faster > - more laziness-friendly, i.e.: > take 3 $ sort l > will require significantly less comparisons than > the > current > implementation > > Proposed Implementation > ----------------------- > > sort :: (Ord a) => [a] -> [a] > sort = sortBy compare > > sortBy cmp [] = [] > sortBy cmp xs = head $ until (null.tail) reduce > (pair xs) > where > pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t > | otherwise = [x, y] : pair t > pair [x] = [[x]] > pair [] = [] > > reduce (v:w:x:y:t) = merge v' x' : reduce t > where v' = merge v w > x' = merge x y > > reduce (x:y:t) = merge x y : reduce t > reduce xs = xs > > merge xs [] = xs > merge [] ys = ys > merge xs@(x:xs') ys@(y:ys') > | x `cmp` y == GT = y : merge xs ys' > | otherwise = x : merge xs' ys > > > Effect and Interactions > ----------------------- > > I have a stack project with a criterion test for > this new > implementation, > available at https://github.com/greg7mdp/ghc-sort > https://github.com/greg7mdp/ghc-sort > > <https://github.com/greg7mdp/ghc-sort > https://github.com/greg7mdp/ghc-sort > . > I ran the tests on an Ubuntu 14.0.2 VM and GHC > 8.0.2, and > had the > following > results: > > - sorting of random lists of integers is 27% faster > - sorting of random lists of strings is 14% faster > - sorting of already sorted lists is significantly > slower, > but still > much > faster than sorting random lists > - proposed version is more laziness friendly. For > example > this > version of > sortBy requires 11 comparisons to find > the smallest element of a 15 element list, while > the > default > Data.List.sortBy requires 15 comparisons. > (see > > > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi > th_trace.hs > <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w > ith_trace.hs> > > <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w > ith_trace.hs > <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w > ith_trace.hs> > > ) > > > > Test results > ------------ > > Criterion output (descending/ascending results are > for > already > sorted > lists). > I barely understand what Criterion does, and I am > puzzled > with the > various > "T" output - maybe there is a bug in my bench code: > > vagrant@vagrant-ubuntu-trusty-64:/vagrant$ stack > exec > ghc-sort > benchmarking ascending ints/ghc > TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime > 160.6 ms > (153.4 > ms .. 167.8 ms) > 0.997 R² (0.986 R² .. 1.000 > R²) > mean 161.7 ms (158.3 ms .. 165.9 > ms) > std dev 5.210 ms (3.193 ms .. 7.006 > ms) > variance introduced by outliers: 12% (moderately > inflated) > > benchmarking ascending ints/greg > TTTTTTTTTTTTTTTTtime 473.8 ms > (398.6 ms .. > 554.9 > ms) > 0.996 R² (0.987 R² .. 1.000 > R²) > mean 466.2 ms (449.0 ms .. 475.0 > ms) > std dev 14.94 ms (0.0 s .. 15.29 ms) > variance introduced by outliers: 19% (moderately > inflated) > > benchmarking descending ints/ghc > TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime > 165.1 ms > (148.2 > ms .. 178.2 ms) > 0.991 R² (0.957 R² .. 1.000 > R²) > mean 158.7 ms (154.0 ms .. 164.3 > ms) > std dev 7.075 ms (4.152 ms .. 9.903 > ms) > variance introduced by outliers: 12% (moderately > inflated) > > benchmarking descending ints/greg > TTTTTTTTTTTTTTTTtime 471.7 ms > (419.8 ms .. > 508.3 > ms) > 0.999 R² (0.995 R² .. 1.000 > R²) > mean 476.0 ms (467.5 ms .. 480.0 > ms) > std dev 7.447 ms (67.99 as .. 7.865 > ms) > variance introduced by outliers: 19% (moderately > inflated) > > benchmarking random ints/ghc > TTTTTTTTTTTTTTTTtime 2.852 s > (2.564 s .. > 3.019 s) > 0.999 R² (0.997 R² .. 1.000 > R²) > mean 2.812 s (2.785 s .. 2.838 s) > std dev 44.06 ms (543.9 as .. 44.97 > ms) > variance introduced by outliers: 19% (moderately > inflated) > > benchmarking random ints/greg > TTTTTTTTTTTTTTTTtime 2.032 s > (1.993 s .. > 2.076 s) > 1.000 R² (1.000 R² .. 1.000 > R²) > mean 2.028 s (2.019 s .. 2.033 s) > std dev 7.832 ms (0.0 s .. 8.178 ms) > variance introduced by outliers: 19% (moderately > inflated) > > benchmarking shakespeare/ghc > TTTTTTTTTTTTTTTTtime 6.504 s > (6.391 s .. > 6.694 s) > 1.000 R² (1.000 R² .. 1.000 > R²) > mean 6.499 s (6.468 s .. 6.518 s) > std dev 28.85 ms (0.0 s .. 32.62 ms) > variance introduced by outliers: 19% (moderately > inflated) > > benchmarking shakespeare/greg > TTTTTTTTTTTTTTTTtime 5.560 s > (5.307 s .. > 5.763 s) > 1.000 R² (0.999 R² .. 1.000 > R²) > mean 5.582 s (5.537 s .. 5.607 s) > std dev 39.30 ms (0.0 s .. 43.49 ms) > variance introduced by outliers: 19% (moderately > inflated) > > > Costs and Drawbacks > ------------------- > > The only cost I see is the reduced performance when > sorting > already > sorted > lists. However, since this remains quite efficient, > indeed > over 4 > times > faster than sorting unsorted lists, I think it is an > acceptable > tradeoff. > > Final note > ---------- > > My Haskell is very rusty. I worked on this a couple > years > ago when I > was > learning Haskell, and meant to propose it to the > Haskell > community, > but > never got to it at the time. > > _______________________________________________ > Libraries mailing list > Libraries@haskell.org > http://mail.haskell.org/cgi-bi > n/mailman/listinfo/libraries > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries > > <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries > > > > > > > > > > _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

OK, so current proposed changes are below: test results look good. Thanks, greg

You probably should bind the value you're forcing, even if the difference is small. On Mar 28, 2017 2:51 PM, "Siddhanathan Shanmugam" < siddhanathan+eml@gmail.com> wrote:
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
as well.
On Tue, Mar 28, 2017 at 11:43 AM, Gregory Popovitch
wrote: OK, so current proposed changes are below:
test results look good.
Thanks,
greg
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Tue, Mar 28, 2017 at 11:57 AM, David Feuer
You probably should bind the value you're forcing,
I'm guessing a bang pattern should do the trick: ascending a as (b:bs) | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs ascending a as bs = let !v = as [a] in v : sequences bs mergePairs (a:b:xs) = let !v = merge a b in v : mergePairs xs mergePairs xs = xs {-# INLINE merge #-} merge as@(a:as') bs@(b:bs')
even if the difference is small.
Out of curiosity, why would there be a difference at all?
On Mar 28, 2017 2:51 PM, "Siddhanathan Shanmugam" < siddhanathan+eml@gmail.com> wrote:
mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
as well.
On Tue, Mar 28, 2017 at 11:43 AM, Gregory Popovitch
wrote: OK, so current proposed changes are below:
test results look good.
Thanks,
greg
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

OK, here is the current proposed change - tests look good: Any comments/opposition from the community? Thanks, greg

There are a number of intermediate lists that are certainly non-empty, but
their types don't reflect that. I wonder if using a non-empty list type
unpacked into the list of lists might be helpful.
On Mar 28, 2017 3:39 PM, "Gregory Popovitch"
OK, here is the current proposed change - tests look good:
Any comments/opposition from the community?
Thanks,
greg
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Tue, Mar 28, 2017 at 12:48 PM, David Feuer
There are a number of intermediate lists that are certainly non-empty, but their types don't reflect that. I wonder if using a non-empty list type unpacked into the list of lists might be helpful.
I just tried it. Seems to work fine, and the benchmark numbers are similar. Doing this in base would introduce cyclic dependencies though.
On Mar 28, 2017 3:39 PM, "Gregory Popovitch"
wrote: OK, here is the current proposed change - tests look good:
Any comments/opposition from the community?
Thanks,
greg
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

You don't need to use the usual NonEmpty type. You can copy it here. data NonEmpty a = a :< [a] data ListNELists a = Cons {-# UNPACK #-} !(NonEmpty a) (ListNELists a) | Nil Or we could unpack it manually: data ListNELists a = Cons a [a] (ListNELists a) | Nil I can't say whether these will make a difference. I'm sorry that I haven't had the time to really dig into this issue as much as I might; I've been tied up with other things. David On Tue, Mar 28, 2017 at 10:53 PM, Siddhanathan Shanmugam < siddhanathan+eml@gmail.com> wrote:
On Tue, Mar 28, 2017 at 12:48 PM, David Feuer
wrote: There are a number of intermediate lists that are certainly non-empty, but their types don't reflect that. I wonder if using a non-empty list type unpacked into the list of lists might be helpful.
I just tried it. Seems to work fine, and the benchmark numbers are similar.
Doing this in base would introduce cyclic dependencies though.
On Mar 28, 2017 3:39 PM, "Gregory Popovitch"
wrote: OK, here is the current proposed change - tests look good:
Any comments/opposition from the community?
Thanks,
greg
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
participants (4)
-
Dan Burton
-
David Feuer
-
Gregory Popovitch
-
Siddhanathan Shanmugam