
Guillaume Bouchard wrote:
- PRef (Unboxed bytearray of size 1, the closest thing to an unboxed stack allocation) : 86ms - URef (Unboxed vector of size 1) : 130ms - SRef (Storable vector of size 1) : 43ms (This is an improvement !) - BRef (Boxed ref) : 137ms - STRef : 54ms (this was my baseline)
You really shouldn't use any mutable variable at all for this, but pass the values around as function arguments instead: count_inv' :: V2.MVector s Int32 -> V2.MVector s Int32 -> ST s Int count_inv' a buf | V.length a <= 1 = return 0 | otherwise = do let len = V.length a mid = len `div` 2 counta <- count_inv' (V.slice 0 mid a) buf countb <- count_inv' (V.slice mid (len - mid) a) buf V.unsafeCopy (V.slice 0 mid buf) (V.slice 0 mid a) let go idx1 idx2 count i = if i == len then return count else do cond <- return (idx1 < mid) .&&. (return (idx2 == len) .||. (V.unsafeRead buf idx1 .<=. V.unsafeRead a idx2)) if cond then do V.unsafeRead buf idx1 >>= V.unsafeWrite a i go (idx1 + 1) idx2 (count + idx2 - mid) (i+1) else do V.unsafeRead a idx2 >>= V.unsafeWrite a i go idx1 (idx2 + 1) count (i+1) go 0 mid (counta + countb) (0 :: Int) Besides, the code spends most of its time on parsing the input. The following more low-level code does the job far more quickly: import Data.ByteString.Char8 as B parse' :: IO [Int32] parse' = do content <- B.getContents return $ map fromIntegral $ unfoldr (B.readInt . B.dropWhile (=='\n')) $ content It's possible to improve this slightly by implementing the code from scratch: parse'' :: IO [Int32] parse'' = do content <- B.getContents return $ go content where go b = case B.uncons b of Nothing -> [] Just ('\n',b) -> go b Just ('-',b) -> go'' 0 b Just (d,b) -> go' (fromIntegral (ord d - 48)) b go' v b = case B.uncons b of Nothing -> [v] Just ('\n',b) -> v : go b Just (d,b) -> go' (v*10 + fromIntegral (ord d - 48)) b go'' v b = case B.uncons b of Nothing -> [v] Just ('\n',b) -> v : go b Just (d,b) -> go' (v*10 - fromIntegral (ord d - 48)) b Taken together these changes improve the runtime from 79ms to 21ms here. Cheers, Bertram