
From: wren ng thornton
Sent: Mon, March 21, 2011 10:30:48 PM On 3/21/11 4:40 PM, Brandon Moore wrote: Is there an efficient way to iterate over the bytes of a ByteString?
The code I've been using (rather similar to your unsafe map) is:
import qualified Data.ByteString.Internal as BSI import qualified Foreign.ForeignPtr as FFI
foldIO :: (a -> Word8 -> IO a) -> a -> ByteString -> IO a foldIO f z0 (BSI.PS fp off len) = FFI.withForeignPtr fp $ \p0 -> do let q = p0 `plusPtr` (off+len) let go z p | z `seq` p `seq` False = undefined | p == q = return z | otherwise = do w <- peek p z' <- f z w go z' (p `plusPtr` 1) go z0 (p0 `plusPtr` off) {-# INLINE foldIO #-}
I don't need to pass an acumulating parameter. I'll see how well my code runs if I do. With a modified version of foldr, the simple definition mapM_ f x = foldr (\b rest -> f b >> rest) (return ()) is a bit faster than the specialized code from before. The changes to foldr improved performance of this definition by almost 10x for my one benchmark. Here's the strict ByteString code: {-# INLINE foldr'' #-} foldr'' :: (Word8 -> a -> a) -> a -> ByteString -> a foldr'' k v (PS x s l) = inlinePerformIO $ withForeignPtr x $ \ptr -> let { !u = s + l ; go ix | ix == u = v | otherwise = k (inlinePerformIO (do c <- peekElemOff ptr ix; touchForeignPtr x; return c)) (go (ix+1)) } in return (go s) The lazy ByteString version is built with foldrChunks foldr f v = foldrChunks (\chunk rest -> foldr'' f rest chunk) Looking for performance regressions, this seems to be around 10% slower than the current foldr in the test main = do str <- readFile "1GiBFile" print $ length $ foldr (:) [] str Surprisingly, Data.ByteString.Lazy.foldr (:) [] seems to be about twice as fast as unpack! Are there any other benchmarks I should try? Are there any other uses for a lazy ByteString foldr? The test suite from http://darcs.haskell.org/bytestring built after a bit of hacking on the Makefile, but doesn't seem to do much timing. Brandon