
I've added a few more variants to your test code (see below) and I'm not sure what to think. Especially if you look at Optimal.sort4ByOffset's source code, it makes no sense that it would take so much time. So my hypothesis is that most of the time is spent on the `compare` calls. If you take a look at the generated core, sort4ByOffset is not inlined, let alone sort4ByIndex. Also, none of these two have SPECIALIZE pragmas for Doubles (or anything else for that matter). GHC is then calling the function with a reference to `compare`, which transforms a few CPU cycles into a function call that constructs a value which is then immediately deconstructed. I suggest that you add a few SPECIALIZE pragmas to vector-algorithms and check its performance again. {-# LANGUAGE Rank2Types #-} import Control.Monad import Control.Monad.ST import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Unboxed.Mutable as MV import qualified Data.Vector.Algorithms.Heap as Heap import qualified Data.Vector.Algorithms.Insertion as Insertion import qualified Data.Vector.Algorithms.Intro as Intro import qualified Data.Vector.Algorithms.Merge as Merge import qualified Data.Vector.Algorithms.Optimal as Optimal import qualified Data.Vector.Algorithms.Radix as Radix import Data.IORef import Criterion.Main type Value = Double arr = V.fromList ([1,2] :: [Value]) foo :: (forall s. V.MVector s Value -> ST s ()) -> Value -> Value foo f x = V.head q where q = runST $ do res <- V.unsafeThaw $ V.concat [ V.map (\e -> e + x) arr , V.map (\e -> e - x) arr] f res V.unsafeFreeze res main = do ref <- newIORef 0 defaultMain [ b "--nothing--" (const $ return ()) , b "--best--" best , b "Heap" Heap.sort , b "Insertion" Insertion.sort , b "Intro" Intro.sort , b "Merge" Merge.sort , b "Optimal" (\v -> Optimal.sort4ByOffset compare v 0) ] where b :: String -> (forall s. V.MVector s Value -> ST s ()) -> Benchmark b s f = bgroup s [ bench "10" $ whnf (foo f) 10 ] best :: forall s. V.MVector s Value -> ST s () best res = do -- [ 11, 12, -9, -8 ] MV.swap res 0 2 -- [ -9, 12, 11, -8 ] MV.swap res 1 3 -- [ -9, -8, 11, 12 ] -- Felipe.