GHC Data.List.sort performance question

Hello, By a rather indirect route, I discovered that I obtain an almost factor of two improvement in performance in Data.List.sort if I make one small change in the implementation of the function merge which supports mergesort and hence sortBy and sort. Admittedly, the improvement was only noticeable to me when sorting for example one million random Int. The current code in libraries/base/Data/List.hs for merge is \begin{code} merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys) \end{code} and all that I did was \begin{code} merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp [] ys = ys merge cmp xs [] = xs merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys) \end{code} that is, I swapped the order of the first two lines. By the way, the second version is much less likely to overflow the stack than the first version. Can some confirm this? If you confirm it, can someone explain to me why one obtains this performance improvement? I currently just do not grasp the point. Thanks, - Marcus -- Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75

mdg:
Hello,
By a rather indirect route, I discovered that I obtain an almost factor of two improvement in performance in Data.List.sort if I make one small change in the implementation of the function merge which supports mergesort and hence sortBy and sort. Admittedly, the improvement was only noticeable to me when sorting for example one million random Int. The current code in libraries/base/Data/List.hs for merge is
\begin{code}
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys)
\end{code}
and all that I did was
\begin{code}
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp [] ys = ys merge cmp xs [] = xs merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys)
\end{code}
that is, I swapped the order of the first two lines. By the way, the second version is much less likely to overflow the stack than the first version.
Can some confirm this? If you confirm it, can someone explain to me why one obtains this performance improvement? I currently just do not grasp the point.
Thanks, - Marcus
Ian, you looked at sort most recently. What to check the generated code (or rerun your benchmarks?) -- Don

Weird. I see no difference in the compiled code (GHC HEAD). Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of | Marcus D. Gabriel | Sent: 14 January 2008 21:02 | To: glasgow-haskell-users@haskell.org | Subject: GHC Data.List.sort performance question | | Hello, | | By a rather indirect route, I discovered that I obtain an almost | factor of two improvement in performance in Data.List.sort if I make | one small change in the implementation of the function merge which | supports mergesort and hence sortBy and sort. Admittedly, the | improvement was only noticeable to me when sorting for example one | million random Int. The current code in libraries/base/Data/List.hs | for merge is | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp xs [] = xs | merge cmp [] ys = ys | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | and all that I did was | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp [] ys = ys | merge cmp xs [] = xs | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | that is, I swapped the order of the first two lines. By the way, the | second version is much less likely to overflow the stack than the | first version. | | Can some confirm this? If you confirm it, can someone explain to me | why one obtains this performance improvement? I currently just do not | grasp the point. | | Thanks, | - Marcus | | -- | Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr | 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 | F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75 | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hi all, I was looking into this issue as well, by coincidence. I haven't figured out what is wrong, but I believe Hugs (September 2006) exhibits similar behaviour, so it is not just GHC. I also noticed that in the third equation of merge_pairs, swapping the order of the arguments to merge also avoids the stack overflow: Original third equation: merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss Swapped version: merge_pairs cmp (xs:ys:xss) = merge cmp ys xs : merge_pairs cmp xss ^^^^^ And for reference, here is how I was testing it: main = do args <- getArgs let size = (read $ head args) :: Int main' size main' size = do gen <- getStdGen let rs = (randoms gen) :: [Int] let list = take size rs print $ length list print $ sort list I'm keen to find out what is going on here. Cheers, Bernie. On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
Weird. I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On Behalf Of | Marcus D. Gabriel | Sent: 14 January 2008 21:02 | To: glasgow-haskell-users@haskell.org | Subject: GHC Data.List.sort performance question | | Hello, | | By a rather indirect route, I discovered that I obtain an almost | factor of two improvement in performance in Data.List.sort if I make | one small change in the implementation of the function merge which | supports mergesort and hence sortBy and sort. Admittedly, the | improvement was only noticeable to me when sorting for example one | million random Int. The current code in libraries/base/Data/List.hs | for merge is | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp xs [] = xs | merge cmp [] ys = ys | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | and all that I did was | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp [] ys = ys | merge cmp xs [] = xs | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | that is, I swapped the order of the first two lines. By the way, the | second version is much less likely to overflow the stack than the | first version. | | Can some confirm this? If you confirm it, can someone explain to me | why one obtains this performance improvement? I currently just do not | grasp the point. | | Thanks, | - Marcus | | -- | Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr | 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 | F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75 | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hi again, Please ignore my last post. I just realised that the problem was in the way I was testing sort. I printed the length of the list out to make sure the spine was evaluated, but the problem is that the elements of the list are thunks which depend on each other in a left-right manner, the rightmost ones becoming very big. It is effectively the same issue as: sort $ take x [1..] that Bertram Felgenhauer identified much earlier. Apologies for the noise. Cheers, Bernie. On 16/01/2008, at 12:40 PM, Bernie Pope wrote:
Hi all,
I was looking into this issue as well, by coincidence.
I haven't figured out what is wrong, but I believe Hugs (September 2006) exhibits similar behaviour, so it is not just GHC.
I also noticed that in the third equation of merge_pairs, swapping the order of the arguments to merge also avoids the stack overflow:
Original third equation:
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
Swapped version:
merge_pairs cmp (xs:ys:xss) = merge cmp ys xs : merge_pairs cmp xss ^^^^^
And for reference, here is how I was testing it:
main = do args <- getArgs let size = (read $ head args) :: Int main' size
main' size = do gen <- getStdGen let rs = (randoms gen) :: [Int] let list = take size rs print $ length list print $ sort list
I'm keen to find out what is going on here.
Cheers, Bernie.
On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
Weird. I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On Behalf Of | Marcus D. Gabriel | Sent: 14 January 2008 21:02 | To: glasgow-haskell-users@haskell.org | Subject: GHC Data.List.sort performance question | | Hello, | | By a rather indirect route, I discovered that I obtain an almost | factor of two improvement in performance in Data.List.sort if I make | one small change in the implementation of the function merge which | supports mergesort and hence sortBy and sort. Admittedly, the | improvement was only noticeable to me when sorting for example one | million random Int. The current code in libraries/base/Data/ List.hs | for merge is | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp xs [] = xs | merge cmp [] ys = ys | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | and all that I did was | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp [] ys = ys | merge cmp xs [] = xs | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | that is, I swapped the order of the first two lines. By the way, the | second version is much less likely to overflow the stack than the | first version. | | Can some confirm this? If you confirm it, can someone explain to me | why one obtains this performance improvement? I currently just do not | grasp the point. | | Thanks, | - Marcus | | -- | Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr | 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 | F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75 | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Marcus D. Gabriel wrote:
By a rather indirect route, I discovered that I obtain an almost factor of two improvement in performance in Data.List.sort if I make one small change in the implementation of the function merge which supports mergesort and hence sortBy and sort. Admittedly, the improvement was only noticeable to me when sorting for example one million random Int. The current code in libraries/base/Data/List.hs for merge is
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys [snip]
and all that I did was
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp [] ys = ys merge cmp xs [] = xs [snip]
that is, I swapped the order of the first two lines. By the way, the second version is much less likely to overflow the stack than the first version.
Can some confirm this? If you confirm it, can someone explain to me why one obtains this performance improvement? I currently just do not grasp the point.
I'm not sure why there is a performance difference, but there is one major difference in the behaviour of these two implementations: The library version evaluates the list from right to left (i.e. it compares the last two elements of the list first), because the third argument of 'merge' is forced before the second one. Swapping those two lines of 'merge' results in a version which compares the first two elements of the list first. This means that the library version produces a stack overflow on lists generated in an iterate like fashion (say, take 1000000 [0..]). The modified version produces a stack overflow on the reverse of that list, but I believe such lists are much rarer in practice. Did you look at the GC stats for the two sort implementations? Bertram

On Jan 15, 2008 8:16 AM, Bertram Felgenhauer
Marcus D. Gabriel wrote:
By a rather indirect route, I discovered that I obtain an almost factor of two improvement in performance in Data.List.sort if I make one small change in the implementation of the function merge which supports mergesort and hence sortBy and sort. Admittedly, the improvement was only noticeable to me when sorting for example one million random Int. The current code in libraries/base/Data/List.hs for merge is
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys [snip]
and all that I did was
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp [] ys = ys merge cmp xs [] = xs [snip]
that is, I swapped the order of the first two lines. By the way, the second version is much less likely to overflow the stack than the first version.
Can some confirm this? If you confirm it, can someone explain to me why one obtains this performance improvement? I currently just do not grasp the point.
I'm not sure why there is a performance difference, but there is one major difference in the behaviour of these two implementations:
The library version evaluates the list from right to left (i.e. it compares the last two elements of the list first), because the third argument of 'merge' is forced before the second one.
Swapping those two lines of 'merge' results in a version which compares the first two elements of the list first.
This means that the library version produces a stack overflow on lists generated in an iterate like fashion (say, take 1000000 [0..]). The modified version produces a stack overflow on the reverse of that list, but I believe such lists are much rarer in practice.
Bertram, are you sure that's right? There's already a stack overflow in "take 1000000 [0..]" itself (try just "last $ take 1000000 [0..]"). See the bug http://hackage.haskell.org/trac/ghc/ticket/1997 By using [0..1000000] (which doesn't trigger that bug), I can compute both of last $ sort [0..1000000] last $ sort $ reverse [0..1000000] without any stack overflows. (Using ghc-6.8.2) Best, -Judah

Judah Jacobson wrote:
This means that the library version produces a stack overflow on lists generated in an iterate like fashion (say, take 1000000 [0..]). The modified version produces a stack overflow on the reverse of that list, but I believe such lists are much rarer in practice.
Bertram, are you sure that's right?
Yes.
There's already a stack overflow in "take 1000000 [0..]" itself (try just "last $ take 1000000 [0..]"). See the bug http://hackage.haskell.org/trac/ghc/ticket/1997
No, that's not quite right. The stack overflow happens on evaluating the last element of that list *without evaluating the previous elements first*. To wit, Prelude Data.List> foldl' (+) 0 $ take 1000000 [0..] 499999500000 Prelude Data.List> last $ take 1000000 [0..] *** Exception: stack overflow or Prelude> let t = take 1000000 [0..] Prelude> last t *** Exception: stack overflow Prelude> t !! 500000 500000 Prelude> last t 999999 Bertram

By the way, the comment about the stack was an aside for me. The enigma of the performance change is what I would like to understand. To be honest, it bothers me a little bit because I just did not expect it. Also to be frank, some one should check me since I found this out by accident later in the evening which means that you should not trust me too much. I do not believe that I am doing anything silly here; I just do not know. Bertram appears to be on to something, but the experiment below is what I observed. Only the random list for Data.List.sort overflows the default stack which makes sense given Bertram's argument but not has much too me given the sorted, reversed, and same element list below. The command sorting will run Data.List.sort on a list or sortX which is just Data.List.sort with these two aforementioned lines crossed. The random list is made from Random, the sorted list is [1..1000000], the reversed list is [1000000,999999..1], and the list of zeros is just take 1000000 $ repeat 0. % ./sorting --random sort 1000000 Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. % ./sorting --sorted sort 1000000 Length of sorted sorted list = 1000000 % ./sorting --reversed sort 1000000 Length of sorted reversed sorted list = 1000000 % ./sorting --same sort 1000000 Length of sorted list of zeros = 1000000 % ./sorting --random sortX 1000000 Length of sorted random list = 1000000 % ./sorting --sorted sortX 1000000 Length of sorted sorted list = 1000000 % ./sorting --reversed sortX 1000000 Length of sorted reversed sorted list = 1000000 % ./sorting --same sortX 1000000 Length of sorted list of zeros = 1000000 Here is the GC for the random case but only on 100000 Int to avoid the stack overflow. I do not know what to make of this. % ./sorting --random sort 100000 +RTS -sstderr Length of sorted random list = 100000 191,878,444 bytes allocated in the heap 111,808,176 bytes copied during GC (scavenged) 20,928,592 bytes copied during GC (not scavenged) 24,830,520 bytes maximum residency (8 sample(s)) 336 collections in generation 0 ( 0.61s) 8 collections in generation 1 ( 0.24s) 58 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 0.67s ( 0.66s elapsed) GC time 0.85s ( 0.91s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 1.52s ( 1.58s elapsed) %GC time 55.8% (57.9% elapsed) Alloc rate 285,514,704 bytes per MUT second Productivity 44.2% of total user, 42.5% of total elapsed % ./sorting --random sortX 100000 +RTS -sstderr Length of sorted random list = 100000 175,301,948 bytes allocated in the heap 100,767,660 bytes copied during GC (scavenged) 20,974,892 bytes copied during GC (not scavenged) 13,687,992 bytes maximum residency (13 sample(s)) 337 collections in generation 0 ( 0.41s) 13 collections in generation 1 ( 0.28s) 38 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 0.51s ( 0.55s elapsed) GC time 0.69s ( 0.69s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 1.21s ( 1.24s elapsed) %GC time 57.3% (55.8% elapsed) Alloc rate 339,713,364 bytes per MUT second Productivity 42.4% of total user, 41.3% of total elapsed And for the record, the times on my box: # Needs to be at least 50M for the -K option % time ./sorting --random sort 1000000 +RTS -K50M Length of sorted random list = 1000000 real 0m38.16s user 0m37.34s sys 0m0.61s % time ./sorting --random sortX 1000000 Length of sorted random list = 1000000 real 0m19.36s user 0m18.94s sys 0m0.37s Cheers, - Marcus Bertram Felgenhauer wrote:
Marcus D. Gabriel wrote:
By a rather indirect route, I discovered that I obtain an almost factor of two improvement in performance in Data.List.sort if I make one small change in the implementation of the function merge which supports mergesort and hence sortBy and sort. Admittedly, the improvement was only noticeable to me when sorting for example one million random Int. The current code in libraries/base/Data/List.hs for merge is
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys
[snip]
and all that I did was
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp [] ys = ys merge cmp xs [] = xs
[snip]
that is, I swapped the order of the first two lines. By the way, the second version is much less likely to overflow the stack than the first version.
Can some confirm this? If you confirm it, can someone explain to me why one obtains this performance improvement? I currently just do not grasp the point.
I'm not sure why there is a performance difference, but there is one major difference in the behaviour of these two implementations:
The library version evaluates the list from right to left (i.e. it compares the last two elements of the list first), because the third argument of 'merge' is forced before the second one.
Swapping those two lines of 'merge' results in a version which compares the first two elements of the list first.
This means that the library version produces a stack overflow on lists generated in an iterate like fashion (say, take 1000000 [0..]). The modified version produces a stack overflow on the reverse of that list, but I believe such lists are much rarer in practice.
Did you look at the GC stats for the two sort implementations?
Bertram

Hi Marcus, On Mon, Jan 14, 2008 at 10:01:49PM +0100, Marcus D. Gabriel wrote:
code in libraries/base/Data/List.hs for merge is
merge cmp xs [] = xs merge cmp [] ys = ys
merge cmp [] ys = ys merge cmp xs [] = xs
This actually came up a while ago, in this thread: http://thread.gmane.org/gmane.comp.lang.haskell.cafe/30598 I've just applied Bertram's patch from http://www.haskell.org/pipermail/libraries/2007-November/008621.html which makes the change you suggest. Thanks Ian

Hello Ian, Thank you for the two links to the previous threads, I glanced at them and will read them more carefully later so that I can understand the point a little better. Sorry about the duplicate thread relative to the Haskell Cafe. Thanks also for applying Bertram's patch. Cheers, - Marcus Ian Lynagh wrote:
Hi Marcus,
On Mon, Jan 14, 2008 at 10:01:49PM +0100, Marcus D. Gabriel wrote:
code in libraries/base/Data/List.hs for merge is
merge cmp xs [] = xs merge cmp [] ys = ys
merge cmp [] ys = ys merge cmp xs [] = xs
This actually came up a while ago, in this thread: http://thread.gmane.org/gmane.comp.lang.haskell.cafe/30598
I've just applied Bertram's patch from http://www.haskell.org/pipermail/libraries/2007-November/008621.html which makes the change you suggest.
Thanks Ian
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
-- Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75
participants (7)
-
Bernie Pope
-
Bertram Felgenhauer
-
Don Stewart
-
Ian Lynagh
-
Judah Jacobson
-
Marcus D. Gabriel
-
Simon Peyton-Jones