Why is vector sort slower than list sort?

From places like http://programmers.stackexchange.com/questions/180417/fastest-haskell-librar... and http://stackoverflow.com/questions/3655329/how-does-one-sort-with-data-vecto... I'm led to believe that sorting mutable vectors is faster than sorting lists. However, when running on my own computer, I'm seeing the opposite effect. What's wrong with what I'm doing, or am I misunderstanding something? What's the fastest way to sort?
Here's the code: Sort0.hs (List version) import Control.Monad import System.Random import qualified Data.Vector as IV import qualified Data.Vector.Mutable as MV import qualified Data.Vector.Generic as V import qualified Data.Vector.Algorithms.Intro as VA import qualified Data.List as L randList :: IO [Int] randList = sequence $ replicate 1000000 (randomIO :: IO Int) main = do v <- randList print $ L.sort v Here's Sort1.hs import Control.Monad import System.Random import qualified Data.Vector as IV import qualified Data.Vector.Mutable as MV import qualified Data.Vector.Generic as V import qualified Data.Vector.Algorithms.Intro as VA randVector :: IO (MV.IOVector Int) randVector = do v <- MV.new 1000000 forM [0..999999] $ \x -> do r <- randomIO :: IO Int MV.write v x r return v main = do v <- randVector VA.sort v iv <- V.unsafeFreeze v :: IO (IV.Vector Int) print . V.toList $ iv I'm compiling and running as follows: ghc -prof -fprof-auto -rtsopts -o Sort0 Sort0.hs ./Sort0 +RTS -p -sstderr -K99999999 -RTS Here are the traces; the first one is Sort0.hs (list version) and takes ~11s while the other takes ~17s: Sat May 30 05:06 2015 Time and Allocation Profiling Report (Final) Sort0.exe +RTS -p -sstderr -K99999999 -RTS total time = *11.43 secs* (11434 ticks @ 1000 us, 1 processor) total alloc = 1,966,669,596 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc main Main 61.4 49.4 randList Main 38.6 50.6 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 41 0 0.0 0.0 100.0 100.0 CAF GHC.IO.Encoding.CodePage 69 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Encoding 67 0 0.0 0.0 0.0 0.0 CAF System.CPUTime 59 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Handle.FD 58 0 0.0 0.0 0.0 0.0 CAF Data.Fixed 54 0 0.0 0.0 0.0 0.0 CAF Data.Time.Clock.POSIX 50 0 0.0 0.0 0.0 0.0 CAF System.Random 49 0 0.0 0.0 0.0 0.0 CAF Main 48 0 0.0 0.0 100.0 100.0 randList Main 83 1 1.7 3.1 1.7 3.1 main Main 82 1 61.4 49.4 98.3 96.9 randList Main 84 0 36.9 47.6 36.9 47.6 Sat May 30 04:59 2015 Time and Allocation Profiling Report (Final) Sort1.exe +RTS -p -sstderr -K99999999 -RTS total time = * 17.07 secs* (17066 ticks @ 1000 us, 1 processor) total alloc = 5,783,472,528 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc main Main 71.9 81.5 randVector.\ Main 25.0 16.5 randVector Main 3.1 2.0 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 60 0 0.0 0.0 100.0 100.0 CAF GHC.IO.Encoding.CodePage 108 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Encoding 106 0 0.0 0.0 0.0 0.0 CAF System.CPUTime 98 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Handle.FD 95 0 0.0 0.0 0.0 0.0 CAF Data.Fixed 87 0 0.0 0.0 0.0 0.0 CAF Data.Time.Clock.POSIX 82 0 0.0 0.0 0.0 0.0 CAF System.Random 81 0 0.0 0.0 0.0 0.0 CAF Main 67 0 0.0 0.0 100.0 100.0 randVector Main 121 1 0.0 0.0 0.0 0.0 main Main 120 1 71.9 81.5 100.0 100.0 randVector Main 122 0 3.1 2.0 28.1 18.5 randVector.\ Main 123 1000000 25.0 16.5 25.0 16.5 Thanks, Holden

I'm led to believe that sorting mutable vectors is faster than sorting lists.
For one thing, you should use "-O2" when looking at performance. For another, you're not actually looking at the sort in isolation. If you pull it out to the top-level (e.g. "doSort = L.sort") so you can see it in the profiling, then even without -O2 I see the vector sort is faster. (Arguably you should include the cost of freezing the vector too.) Regards, Toby.
participants (2)
-
Holden Lee
-
Toby Goodwin