
On Sat, 2011-10-29 at 14:18 +0200, Bas van Dijk wrote:
> Any idea why "toStrict/simple #1" is faster than "toStrict/optimized #1"?
To be honest, I missed that #1 was actually slower... thanks for pointing
that out! The reason is simply that B.concat performs zero-copying for the
0- and 1-chunk cases:

    -- | /O(n)/ Concatenate a list of ByteStrings.
    concat :: [ByteString] -> ByteString
    concat []   = empty
    concat [ps] = ps
    concat xs   = ...

I've added those zero-copy optimizations to the optimized toStrict2, and now
all timings are better than for the naive implementation toStrict1:

--------------------------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}

import Criterion
import Criterion.Main

import qualified Data.ByteString               as B
import qualified Data.ByteString.Internal      as BI
import qualified Data.ByteString.Lazy          as BL
import qualified Data.ByteString.Lazy.Internal as BLI

import Foreign.ForeignPtr
import Foreign.Ptr

toStrict1 :: BL.ByteString -> B.ByteString
toStrict1 = B.concat . BL.toChunks

toStrict2 :: BL.ByteString -> B.ByteString
toStrict2 BLI.Empty               = B.empty
toStrict2 (BLI.Chunk c BLI.Empty) = c
toStrict2 lb                      = BI.unsafeCreate len $ go lb
  where
    len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb

    go BLI.Empty                    _   = return ()
    go (BLI.Chunk (BI.PS fp s l) r) ptr =
        withForeignPtr fp $ \p -> do
          BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
          go r (ptr `plusPtr` l)

main :: IO ()
main = do
    let lbs0 = ""
        lbs1 = "abcdefghij"
        lbs2 = BL.fromChunks (replicate 2    "abcdefghij")
        lbs3 = BL.fromChunks (replicate 10   "abcdefghij")
        lbs4 = BL.fromChunks (replicate 1000 "abcdefghij")

    print $ toStrict1 lbs0 == toStrict2 lbs0
    print $ toStrict1 lbs1 == toStrict2 lbs1
    print $ toStrict1 lbs2 == toStrict2 lbs2
    print $ toStrict1 lbs3 == toStrict2 lbs3
    print $ toStrict1 lbs4 == toStrict2 lbs4

    defaultMain
      [ bgroup "toStrict"
        [ bench "simple #0"    $ whnf toStrict1 lbs0
        , bench "simple #1"    $ whnf toStrict1 lbs1
        , bench "simple #2"    $ whnf toStrict1 lbs2
        , bench "simple #3"    $ whnf toStrict1 lbs3
        , bench "simple #4"    $ whnf toStrict1 lbs4
        , bench "optimized #0" $ whnf toStrict2 lbs0
        , bench "optimized #1" $ whnf toStrict2 lbs1
        , bench "optimized #2" $ whnf toStrict2 lbs2
        , bench "optimized #3" $ whnf toStrict2 lbs3
        , bench "optimized #4" $ whnf toStrict2 lbs4
        ]
      ]

{-
True
True
True
True
True
warming up
estimating clock resolution...
mean is 2.537877 us (320001 iterations)
found 2658 outliers among 319999 samples (0.8%)
  2292 (0.7%) high severe
estimating cost of a clock call...
mean is 55.38578 ns (15 iterations)

benchmarking toStrict/simple #0
mean: 17.66316 ns, lb 17.62767 ns, ub 17.74077 ns, ci 0.950
std dev: 258.0479 ps, lb 146.2057 ps, ub 417.6999 ps, ci 0.950

benchmarking toStrict/simple #1
mean: 28.59188 ns, lb 28.49765 ns, ub 28.70322 ns, ci 0.950
std dev: 523.0005 ps, lb 415.2360 ps, ub 688.4154 ps, ci 0.950

benchmarking toStrict/simple #2
mean: 144.5600 ns, lb 144.2192 ns, ub 145.2525 ns, ci 0.950
std dev: 2.397734 ns, lb 1.352302 ns, ub 3.905039 ns, ci 0.950

benchmarking toStrict/simple #3
mean: 488.0094 ns, lb 486.6532 ns, ub 490.3121 ns, ci 0.950
std dev: 8.903734 ns, lb 6.046690 ns, ub 13.68234 ns, ci 0.950

benchmarking toStrict/simple #4
mean: 55.65404 us, lb 55.43386 us, ub 55.97341 us, ci 0.950
std dev: 1.342695 us, lb 989.9903 ns, ub 1.836054 us, ci 0.950

benchmarking toStrict/optimized #0
mean: 14.14306 ns, lb 14.10655 ns, ub 14.20415 ns, ci 0.950
std dev: 237.0362 ps, lb 159.7752 ps, ub 347.7329 ps, ci 0.950

benchmarking toStrict/optimized #1
mean: 19.10087 ns, lb 19.05273 ns, ub 19.18831 ns, ci 0.950
std dev: 322.9545 ps, lb 201.9111 ps, ub 503.2965 ps, ci 0.950

benchmarking toStrict/optimized #2
mean: 63.34285 ns, lb 63.21118 ns, ub 63.59386 ns, ci 0.950
std dev: 903.1160 ps, lb 543.0056 ps, ub 1.377267 ns, ci 0.950

benchmarking toStrict/optimized #3
mean: 166.5292 ns, lb 166.2405 ns, ub 167.1715 ns, ci 0.950
std dev: 2.144647 ns, lb 1.259903 ns, ub 3.408946 ns, ci 0.950

benchmarking toStrict/optimized #4
mean: 13.05338 us, lb 13.02102 us, ub 13.11728 us, ci 0.950
std dev: 222.8512 ns, lb 116.9145 ns, ub 343.6604 ns, ci 0.950
-}
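Incidentally, the zero-copy behaviour of the 1-chunk case can be observed
directly, not just through timings: B.concat [ps] = ps means the result
shares the input's buffer, which BI.toForeignPtr makes visible. A small
standalone sketch (the literals and pointer comparison are mine, not from
the benchmark above):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString          as B
import qualified Data.ByteString.Internal as BI

main :: IO ()
main = do
  let sb = "abcdefghij" :: B.ByteString
      -- toForeignPtr exposes the buffer a ByteString points into
      (fp1, _, _) = BI.toForeignPtr sb
      (fp2, _, _) = BI.toForeignPtr (B.concat [sb])      -- one chunk
      (fp3, _, _) = BI.toForeignPtr (B.concat [sb, sb])  -- two chunks
  print (fp1 == fp2)  -- True: the single-chunk case reuses the buffer
  print (fp1 == fp3)  -- False: concatenating two chunks allocates afresh
```

The same pointer comparison would show that toStrict2 is zero-copy on
single-chunk lazy ByteStrings, while toStrict1 was only as good as whatever
B.concat happened to do.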