
Hi all,

if I understand correctly, ByteString.Builder is used to efficiently construct a sequence of bytes from smaller parts. However, inspecting data (take, head, index...) requires a plain ByteString. What if a byte-sequence manipulation task requires both? For example:

- receive a ByteString from the network (e.g. Network.Socket.ByteString.recv :: ... -> IO ByteString)
- inspect and manipulate the data (pure function)
- resend it over the network (e.g. Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())

... where the pure data manipulation could be something like:

- extract some segments out of the original sequence
- create and add some new segments of bytes
- reorder them
- concatenate

It is somewhat inconvenient to use two different types for the task, namely ByteString and Builder, when both represent a sequence of bytes. I have tried to define a Bytes type where both representations are available:

    import           Data.ByteString (ByteString)
    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as Bsl
    import           Data.ByteString.Builder (Builder)
    import qualified Data.ByteString.Builder as Bld
    import           Data.Word (Word8)
    import           Prelude hiding (head, length)  -- field/function names below shadow these

    data Bytes = Bytes
        { toByteString :: ByteString
        , toBuilder    :: Builder
        , length       :: Int
        }

    instance Semigroup Bytes ...
    instance Monoid Bytes ...

    -- create
    fromByteString :: ByteString -> Bytes
    fromByteString bs = Bytes
        { toByteString = bs
        , toBuilder    = Bld.byteString bs
        , length       = BS.length bs
        }

    -- inspect function example
    head :: Bytes -> Word8
    head = BS.head . toByteString

    -- prepare to send data over the network
    toChunks :: Bytes -> [ByteString]
    toChunks = Bsl.toChunks . Bld.toLazyByteString . toBuilder

The fields of Bytes are non-strict, so the expectation was that lazy evaluation would suspend unnecessary calculations, giving efficient inspection (via the ByteString part) and efficient concatenation (via the Builder part).
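The Semigroup and Monoid instances are elided above (they are in the gist). The self-contained sketch below assumes the most direct definition, combining each field in its own representation; the gist's actual instances may differ. It illustrates that with lazy fields `(<>)` only allocates thunks, and that forcing `toByteString` on a long chain of appends materializes every intermediate strict ByteString, copying the accumulated bytes at each step. That is one plausible contributor to the gap in the inspect test, but only profiling the real benchmark could confirm it.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import           Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as Bld
import           Data.Word (Word8)
import           Prelude hiding (head, length)

-- Same record as in the post: both representations, lazy fields.
data Bytes = Bytes
    { toByteString :: ByteString
    , toBuilder    :: Builder
    , length       :: Int
    }

-- ASSUMED instance bodies (the originals are elided).
-- (<>) does no real work here; it only creates three thunks.
-- Forcing 'toByteString' of a chain of n appends forces every
-- intermediate strict append, and each of those allocates a new
-- buffer and copies both operands, so total copying can grow
-- roughly quadratically with n.
instance Semigroup Bytes where
    a <> b = Bytes
        { toByteString = toByteString a <> toByteString b
        , toBuilder    = toBuilder a <> toBuilder b
        , length       = length a + length b
        }

instance Monoid Bytes where
    mempty = Bytes BS.empty mempty 0

fromByteString :: ByteString -> Bytes
fromByteString bs = Bytes bs (Bld.byteString bs) (BS.length bs)

head :: Bytes -> Word8
head = BS.head . toByteString

toChunks :: Bytes -> [ByteString]
toChunks = BSL.toChunks . Bld.toLazyByteString . toBuilder
```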
I have performed some benchmarks (not sure if they are exactly to the point), but the results for Bytes are not so good:

    Inspect test
      using ByteString: OK (0.50s)
        5.51 ms ± 342 μs
      using Bytes:      OK (0.17s)
        62.0 ms ± 3.0 ms
    Construct test
      using Builder:    OK (0.21s)
        5.29 ms ± 361 μs
      using Bytes:      OK (0.17s)
        21.3 ms ± 1.6 ms
      naive:            OK (0.81s)
        49.6 ms ± 4.1 ms

Here is the full code: https://gist.github.com/zoranbosnjak/7887d843056f07bac6061d20970e1d6a

My questions are:

0) Where does the big timing difference come from? Thunks?
1) Is there any simple way (INLINE pragmas... or some other tricks) to get the performance back with the current implementation?
2) Would DList [ByteString] or any other type be any better than ByteString.Builder in this case?
3) What would be the most efficient (and reasonable) implementation to support this kind of data processing (inspection and concatenation) in a uniform way? In other words: is uniformity worth having here?
4) Is Network.Socket.ByteString.sendMany the way to go when the byte sequence is constructed from segments, or is there a better (faster) way?

regards,
Zoran
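For comparison with question 2: a `DList [ByteString]` gives cheap appends but has to be converted to a list before any byte can be inspected. One other candidate, sketched here under stated assumptions and not benchmarked, keeps the pieces as a `Seq` of strict chunks (from the containers package) plus a cached length. Appending never copies bytes, `takeBytes` only slices existing chunks, indexing walks the chunks, and the chunk list can be handed to something like `sendMany` as-is. All names here (`Chunks`, `fromChunk`, `index`, `takeBytes`, `toChunks`) are hypothetical, not an existing library API.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.Foldable as F
import           Data.Sequence (Seq)
import qualified Data.Sequence as Seq
import           Data.Word (Word8)

-- HYPOTHETICAL representation: a sequence of non-empty strict chunks
-- together with the total number of bytes.
data Chunks = Chunks
    { chunkSeq  :: !(Seq ByteString)
    , chunksLen :: !Int
    }

-- Concatenation appends the sequences; no byte is copied.
instance Semigroup Chunks where
    Chunks a la <> Chunks b lb = Chunks (a <> b) (la + lb)

instance Monoid Chunks where
    mempty = Chunks Seq.empty 0

fromChunk :: ByteString -> Chunks
fromChunk bs
    | BS.null bs = mempty   -- keep empty chunks out of the sequence
    | otherwise  = Chunks (Seq.singleton bs) (BS.length bs)

-- Byte at position i, found by skipping whole chunks:
-- cost is linear in the number of chunks, not in the number of bytes.
index :: Chunks -> Int -> Maybe Word8
index (Chunks cs _) = go (F.toList cs)
  where
    go [] _ = Nothing
    go (c : rest) i
        | i < 0           = Nothing
        | i < BS.length c = Just (BS.index c i)
        | otherwise       = go rest (i - BS.length c)

-- First n bytes; BS.take is an O(1) slice sharing the original buffer.
takeBytes :: Int -> Chunks -> Chunks
takeBytes n = mconcat . map fromChunk . go n . toChunks
  where
    go _ [] = []
    go k (c : rest)
        | k <= 0          = []
        | k < BS.length c = [BS.take k c]
        | otherwise       = c : go (k - BS.length c) rest

-- Chunk list suitable for a vectored send.
toChunks :: Chunks -> [ByteString]
toChunks = F.toList . chunkSeq
```

The trade-off in this design is that random access costs a walk over the chunks, so it suits data built from a modest number of segments better than data split into many tiny pieces.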