
Hi again, Please ignore my last post. I just realised that the problem was in the way I was testing sort. I printed the length of the list out to make sure the spine was evaluated, but the problem is that the elements of the list are thunks which depend on each other in a left-right manner, the rightmost ones becoming very big. It is effectively the same issue as: sort $ take x [1..] that Bertram Felgenhauer identified much earlier. Apologies for the noise. Cheers, Bernie. On 16/01/2008, at 12:40 PM, Bernie Pope wrote:
Hi all,
I was looking into this issue as well, by coincidence.
I haven't figured out what is wrong, but I believe Hugs (September 2006) exhibits similar behaviour, so it is not just GHC.
I also noticed that in the third equation of merge_pairs, swapping the order of the arguments to merge also avoids the stack overflow:
Original third equation:
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
Swapped version:
merge_pairs cmp (xs:ys:xss) = merge cmp ys xs : merge_pairs cmp xss ^^^^^
And for reference, here is how I was testing it:
main = do args <- getArgs let size = (read $ head args) :: Int main' size
main' size = do gen <- getStdGen let rs = (randoms gen) :: [Int] let list = take size rs print $ length list print $ sort list
I'm keen to find out what is going on here.
Cheers, Bernie.
On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
Weird. I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On Behalf Of | Marcus D. Gabriel | Sent: 14 January 2008 21:02 | To: glasgow-haskell-users@haskell.org | Subject: GHC Data.List.sort performance question | | Hello, | | By a rather indirect route, I discovered that I obtain an almost | factor of two improvement in performance in Data.List.sort if I make | one small change in the implementation of the function merge which | supports mergesort and hence sortBy and sort. Admittedly, the | improvement was only noticeable to me when sorting for example one | million random Int. The current code in libraries/base/Data/ List.hs | for merge is | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp xs [] = xs | merge cmp [] ys = ys | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | and all that I did was | | \begin{code} | | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] | merge cmp [] ys = ys | merge cmp xs [] = xs | merge cmp (x:xs) (y:ys) | = case x `cmp` y of | GT -> y : merge cmp (x:xs) ys | _ -> x : merge cmp xs (y:ys) | | \end{code} | | that is, I swapped the order of the first two lines. By the way, the | second version is much less likely to overflow the stack than the | first version. | | Can some confirm this? If you confirm it, can someone explain to me | why one obtains this performance improvement? I currently just do not | grasp the point. | | Thanks, | - Marcus | | -- | Marcus D. Gabriel, Ph.D. Email: mdg@wanadoo.fr | 213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06 | F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75 | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users