GHC Data.List.sort performance question

14 Jan 2008

      Hello,

By a rather indirect route, I discovered that I obtain an almost
factor of two improvement in performance in Data.List.sort if I make
one small change in the implementation of the function merge which
supports mergesort and hence sortBy and sort.  Admittedly, the
improvement was only noticeable to me when sorting for example one
million random Int.  The current code in libraries/base/Data/List.hs
for merge is

\begin{code}

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        _  -> x : merge cmp    xs (y:ys)

\end{code}

and all that I did was

\begin{code}

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        _  -> x : merge cmp    xs (y:ys)

\end{code}

that is, I swapped the order of the first two lines.  By the way, the
second version is much less likely to overflow the stack than the
first version.

Can some confirm this?  If you confirm it, can someone explain to me
why one obtains this performance improvement?  I currently just do not
grasp the point.

Thanks,
- Marcus

-- 
  Marcus D. Gabriel, Ph.D.                    Email:    mdg@wanadoo.fr
  213 ter, rue de Mulhouse                      Tel: +33.3.89.69.05.06
  F68300 Saint Louis  FRANCE               Portable: +33.6.34.56.07.75

Marcus D. Gabriel

Don Stewart

Simon Peyton-Jones

Bernie Pope

Bernie Pope

Bertram Felgenhauer

Judah Jacobson

Bertram Felgenhauer

Marcus D. Gabriel

Ian Lynagh

Marcus D. Gabriel

tags

participants (7)