Re: GHC Data.List.sort performance question

16 Jan 2008

      Hi all,

I was looking into this issue as well, by coincidence.

I haven't figured out what is wrong, but I believe Hugs (September  
2006) exhibits similar behaviour, so it is not just GHC.

I also noticed that in the third equation of merge_pairs, swapping  
the order of the arguments to merge also avoids the stack overflow:

Original third equation:

   merge_pairs cmp (xs:ys:xss)
       = merge cmp xs ys : merge_pairs cmp xss

Swapped version:

    merge_pairs cmp (xs:ys:xss)
        = merge cmp ys xs : merge_pairs cmp xss
                                ^^^^^

And for reference, here is how I was testing it:

    main = do
       args <- getArgs
       let size = (read $ head args) :: Int
       main' size

    main' size = do
       gen <- getStdGen
       let rs = (randoms gen) :: [Int]
       let list = take size rs
       print $ length list
       print $ sort list

I'm keen to find out what is going on here.

Cheers,
Bernie.

On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
...
Weird.  I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message-----
| From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- 
haskell-users-bounces@haskell.org] On Behalf Of
| Marcus D. Gabriel
| Sent: 14 January 2008 21:02
| To: glasgow-haskell-users@haskell.org
| Subject: GHC Data.List.sort performance question
|
| Hello,
|
| By a rather indirect route, I discovered that I obtain an almost
| factor of two improvement in performance in Data.List.sort if I make
| one small change in the implementation of the function merge which
| supports mergesort and hence sortBy and sort.  Admittedly, the
| improvement was only noticeable to me when sorting for example one
| million random Int.  The current code in libraries/base/Data/List.hs
| for merge is
|
| \begin{code}
|
| merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
| merge cmp xs [] = xs
| merge cmp [] ys = ys
| merge cmp (x:xs) (y:ys)
|  = case x `cmp` y of
|         GT -> y : merge cmp (x:xs)   ys
|         _  -> x : merge cmp    xs (y:ys)
|
| \end{code}
|
| and all that I did was
|
| \begin{code}
|
| merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
| merge cmp [] ys = ys
| merge cmp xs [] = xs
| merge cmp (x:xs) (y:ys)
|  = case x `cmp` y of
|         GT -> y : merge cmp (x:xs)   ys
|         _  -> x : merge cmp    xs (y:ys)
|
| \end{code}
|
| that is, I swapped the order of the first two lines.  By the way,  
the
| second version is much less likely to overflow the stack than the
| first version.
|
| Can some confirm this?  If you confirm it, can someone explain to me
| why one obtains this performance improvement?  I currently just  
do not
| grasp the point.
|
| Thanks,
| - Marcus
|
| --
|   Marcus D. Gabriel, Ph.D.                    Email:     
mdg@wanadoo.fr
|   213 ter, rue de Mulhouse                      Tel:  
+33.3.89.69.05.06
|   F68300 Saint Louis  FRANCE               Portable:  
+33.6.34.56.07.75
|
|
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: GHC Data.List.sort performance question

Bernie Pope