
Hi all, I was looking into this issue as well, by coincidence. I haven't figured out what is wrong, but I believe Hugs (September 2006) exhibits similar behaviour, so it is not just GHC. I also noticed that in the third equation of merge_pairs, swapping the order of the arguments to merge also avoids the stack overflow: Original third equation: merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss Swapped version: merge_pairs cmp (xs:ys:xss) = merge cmp ys xs : merge_pairs cmp xss ^^^^^ And for reference, here is how I was testing it: main = do args <- getArgs let size = (read $ head args) :: Int main' size main' size = do gen <- getStdGen let rs = (randoms gen) :: [Int] let list = take size rs print $ length list print $ sort list I'm keen to find out what is going on here. Cheers, Bernie. On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
Weird. I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On Behalf Of | Marcus D. Gabriel | Sent: 14 January 2008 21:02 | To: glasgow-haskell-users@haskell.org | Subject: GHC Data.List.sort performance question | | Hello, | | By a rather indirect route, I discovered that I obtain an almost | factor of two improvement in performance in Data.List.sort if I make | one small change in the implementation of the function merge which | supports mergesort and hence sortBy and sort. Admittedly, the | improvement was only noticeable to me when sorting for example one | million random Int. The current code in libraries/base/Data/List.hs | for merge is | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp xs [] = xs | merge cmp [] ys = ys | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | and all that I did was | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp [] ys = ys | merge cmp xs [] = xs | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | that is, I swapped the order of the first two lines. By the way, the | second version is much less likely to overflow the stack than the | first version. | | Can some confirm this? If you confirm it, can someone explain to me | why one obtains this performance improvement? I currently just do not | grasp the point. | | Thanks, | - Marcus | | -- | Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr | 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 | F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75 | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users