
So, I have written several implementations of mapM_: * bsMapM_gen — generic, works for any monad * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s) * bsMapM_IO — hand-written version specifically for IO Generic and hand-written versions don’t differ much. The overhead seems to be coming from inlinePerformIO (am I right here? Also, am I using inlinePerformIO legitimately?), which is only needed when we’re not in the IO monad. {-# SPECIALISE ... IO #-} {-# SPECIALISE ... ST #-} bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m () bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp where mapp (ptr, len) = return $ go 0 where go i | i == len = return () | otherwise = let !b = inlinePerformIO $ peekByteOff ptr i in f b >> go (i+1) The short version relies on fusion of `unpack` and `mapM_`. Its advantage is that even when compiled without optimisations, it’s still fast. (Question: would the same happen to other versions, when put into Data.ByteString module? I suppose packages like bytestring are compiled with optimisations, so it probably would.) {-# SPECIALISE ... IO #-} {-# SPECIALISE ... ST #-} bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO () bsMapM_shortIO f s = mapM_ f (unpack s) Finally, the IO-specialised version. It’s faster than generic version (and, similarly, an ST-specialised version using unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma involving bsMapM_IO and bsMapM_ST should be present. (Question: are there other monads for which unsafeIOToMonad exists?) bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO () bsMapM_IO f s = unsafeUseAsCStringLen s mapp where mapp (ptr, len) = go 0 where go i | i == len = return () | otherwise = peekByteOff ptr i >>= f >> go (i+1) A-and here’s a table comparing performance of all three functions. All timings are in milliseconds. ghci ghc ghc -O ghc -O2 +----------+----------+----------+----------+ gen | 380 | 85 | 4.1 | 4.0 | short | 45 | 46 | 17.2 | 16.5 | IO | 434 | 92 | 2.4 | 2.4 | +----------+----------+----------+----------+ Here’s the code I used. (Question: have I messed up anything?) import qualified Data.ByteString as BS import Data.Random import System.Random import System.IO.Unsafe import Control.Monad import Data.IORef import Criterion.Main import BSMaps --a bytestring consisting of 65536 random bytes testCase = BS.pack $ fst $ flip sampleState (mkStdGen 8) $ replicateM (2^16) stdUniform --sums elements of a bytestring, using given mapM_ sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) -> BS.ByteString -> Word8 sumIO f s = unsafePerformIO $ do sm <- newIORef 0 f (modifyIORef' sm . (+)) s readIORef sm --runs the tests main = defaultMain [ bench "IO" $ whnf (sumIO bsMapM_IO) testCase, bench "short" $ whnf (sumIO bsMapM_short) testCase, bench "gen" $ whnf (sumIO bsMapM_gen) testCase] Finally, if there isn’t anything wrong, what are my next steps to see this included into next version of bytestring?