Re: GHC Data.List.sort performance question

16 Jan 2008

      Hi again,

Please ignore my last post.

I just realised that the problem was in the way I was testing sort. I  
printed the length of the list out to make sure the spine was  
evaluated, but the problem is that the elements of the list are  
thunks which depend on each other in a left-right manner, the  
rightmost ones becoming very big. It is effectively the same issue  
as: sort $ take x [1..] that Bertram Felgenhauer identified much  
earlier.

Apologies for the noise.

Cheers,
Bernie.

On 16/01/2008, at 12:40 PM, Bernie Pope wrote:
...
Hi all,
I was looking into this issue as well, by coincidence.
I haven't figured out what is wrong, but I believe Hugs (September  
2006) exhibits similar behaviour, so it is not just GHC.
I also noticed that in the third equation of merge_pairs, swapping  
the order of the arguments to merge also avoids the stack overflow:
Original third equation:
merge_pairs cmp (xs:ys:xss)
      = merge cmp xs ys : merge_pairs cmp xss
Swapped version:
merge_pairs cmp (xs:ys:xss)
       = merge cmp ys xs : merge_pairs cmp xss
                               ^^^^^
And for reference, here is how I was testing it:
main = do
      args <- getArgs
      let size = (read $ head args) :: Int
      main' size
main' size = do
      gen <- getStdGen
      let rs = (randoms gen) :: [Int]
      let list = take size rs
      print $ length list
      print $ sort list
I'm keen to find out what is going on here.
Cheers,
Bernie.
On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
...
Weird.  I see no difference in the compiled code (GHC HEAD).
Simon
| -----Original Message-----
| From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- 
haskell-users-bounces@haskell.org] On Behalf Of
| Marcus D. Gabriel
| Sent: 14 January 2008 21:02
| To: glasgow-haskell-users@haskell.org
| Subject: GHC Data.List.sort performance question
|
| Hello,
|
| By a rather indirect route, I discovered that I obtain an almost
| factor of two improvement in performance in Data.List.sort if I  
make
| one small change in the implementation of the function merge which
| supports mergesort and hence sortBy and sort.  Admittedly, the
| improvement was only noticeable to me when sorting for example one
| million random Int.  The current code in libraries/base/Data/ 
List.hs
| for merge is
|
| \begin{code}
|
| merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
| merge cmp xs [] = xs
| merge cmp [] ys = ys
| merge cmp (x:xs) (y:ys)
|  = case x `cmp` y of
|         GT -> y : merge cmp (x:xs)   ys
|         _  -> x : merge cmp    xs (y:ys)
|
| \end{code}
|
| and all that I did was
|
| \begin{code}
|
| merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
| merge cmp [] ys = ys
| merge cmp xs [] = xs
| merge cmp (x:xs) (y:ys)
|  = case x `cmp` y of
|         GT -> y : merge cmp (x:xs)   ys
|         _  -> x : merge cmp    xs (y:ys)
|
| \end{code}
|
| that is, I swapped the order of the first two lines.  By the  
way, the
| second version is much less likely to overflow the stack than the
| first version.
|
| Can some confirm this?  If you confirm it, can someone explain  
to me
| why one obtains this performance improvement?  I currently just  
do not
| grasp the point.
|
| Thanks,
| - Marcus
|
| --
|   Marcus D. Gabriel, Ph.D.                    Email:     
mdg@wanadoo.fr
|   213 ter, rue de Mulhouse                      Tel:  
+33.3.89.69.05.06
|   F68300 Saint Louis  FRANCE               Portable:  
+33.6.34.56.07.75
|
|
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users