Re: [Haskell-cafe] ByteString and ByteString.Builder questions

29 Nov 2023

On Wed, Nov 29, 2023 at 11:49:06AM +0000, Zoran Bošnjak wrote:
...
if I understand correctly, the ByteString.Builder is used to
efficiently construct a sequence of bytes from smaller parts.

It is best used in continuation-passing style (right-associatively),
where all the subsequent builders are lazily appended as part of
constructing the "head" builder:

    builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...))
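
As a rough illustration of that right-nested shape, here is a minimal sketch that builds the whole output with `foldr`.  The `line` and `render` names are made up for the example; only `intDec`, `char7` and `toLazyByteString` come from the bytestring package.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical record renderer: each number becomes one small builder.
line :: Int -> B.Builder
line n = B.intDec n <> B.char7 '\n'

-- foldr nests to the right: line 1 <> (line 2 <> (... <> mempty)),
-- so the tail of the output is only constructed on demand.
render :: [Int] -> B.Builder
render = foldr (\n rest -> line n <> rest) mempty

main :: IO ()
main = L.putStr (B.toLazyByteString (render [1 .. 3]))
```

The point is only the nesting: every `<>` has a small chunk on its left and "everything else" on its right.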

Repeatedly appending tail chunks (effectively left-associated) is
noticeably less efficient, much as with left-nested (++) on lists.
A workaround is to instead append (Builder->Builder) endomorphisms:

    b1 = Endo (mappend chunk1)
    b2 = b1 <> Endo (mappend chunk2)
    b3 = b2 <> Endo (mappend chunk3)
    ...
    bN = ...

and then extract the final builder via `appEndo bN mempty`.
Appending endomorphisms becomes the more efficient option once there
are many parts to combine.
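
A self-contained version of the endomorphism trick might look like the sketch below.  The names `DBuilder`, `chunk`, `run` and `appendAll` are invented for the example; `Endo` and `appEndo` are from `Data.Monoid`.

```haskell
import Data.Monoid (Endo (..))
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L

-- A builder "with a hole" for whatever gets appended later.
type DBuilder = Endo B.Builder

-- Lift one chunk into an endomorphism that prepends it to the rest.
chunk :: B.Builder -> DBuilder
chunk b = Endo (b <>)

-- Close the hole with the empty builder.
run :: DBuilder -> B.Builder
run d = appEndo d mempty

-- foldl appends at the tail (left-nested).  With Endo each append is
-- just function composition, and `run` yields a right-nested builder.
appendAll :: [B.Builder] -> DBuilder
appendAll = foldl (\acc b -> acc <> chunk b) mempty

main :: IO ()
main = L.putStrLn (B.toLazyByteString (run (appendAll (map B.string7 ["ab", "cd", "ef"]))))
```

Since `Endo f <> Endo g` is `Endo (f . g)`, the chunks still come out in the order they were appended.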
...
However, for inspecting data (take, head, index...), a plain
ByteString is required.

For efficient processing of network streams, you'd perhaps use a
streaming API that exposes the input as a monadic stream of chunks,
with a corresponding parser layered on top that consumes the chunks
monadically.  The `streaming` ecosystem, for example, supports this
model.
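
To give a feel for "a monadic stream of chunks" without committing to any particular library, here is a hand-rolled sketch.  It is not the `streaming` API: `foldChunks` and `cannedSource` are hypothetical helpers, and the source is a canned stand-in for a socket.

```haskell
import qualified Data.ByteString as BS
import Data.IORef (atomicModifyIORef', newIORef)

-- Fold over chunks produced by a monadic action until it yields an
-- empty chunk (the way `recv` signals that the peer closed a TCP stream).
foldChunks :: Monad m => (s -> BS.ByteString -> s) -> s -> m BS.ByteString -> m s
foldChunks step = go
  where
    go acc next = do
      c <- next
      if BS.null c
        then pure acc
        else let acc' = step acc c in acc' `seq` go acc' next

-- Stand-in for a socket: hands out canned chunks, then empty ones.
cannedSource :: [BS.ByteString] -> IO (IO BS.ByteString)
cannedSource cs = do
  ref <- newIORef cs
  pure (atomicModifyIORef' ref pop)
  where
    pop []       = ([], BS.empty)
    pop (x : xs) = (xs, x)

main :: IO ()
main = do
  next <- cannedSource [BS.pack [1, 2], BS.pack [3]]
  total <- foldChunks (\n c -> n + BS.length c) 0 next
  print total -- prints 3
```

A real streaming library adds resource handling, leftovers and parser integration on top of this basic shape.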
...
What if the byte sequence manipulation task requires both, for example:
- receive ByteString from the network (e.g. Network.Socket.ByteString.recv :: ... -> IO ByteString)
- inspect and manipulate data (pure function)
- resend to the network (e.g. Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())

The input packet will be a `ByteString`; the output packet should be a
builder that is converted at the last moment to a (possibly lazy)
bytestring for transmission.  You shouldn't need to read your
output, so a single representation is sufficient.
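
The pure middle step could be sketched roughly as follows.  The packet layout (a one-byte tag followed by a body) and the names `respond` and `outputChunks` are assumptions made up for the example; the socket calls are left out so the code needs only the bytestring package.

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical transformation: inspect the strict input, treat its
-- first byte as a tag, and answer with the tag's high bit set,
-- followed by the untouched body.
respond :: BS.ByteString -> B.Builder
respond packet = case BS.uncons packet of
  Nothing          -> mempty
  Just (tag, body) -> B.word8 (tag .|. 0x80) <> B.byteString body

-- Run the builder at the last moment.  The strict chunks of the lazy
-- result have the [ByteString] shape that sendMany takes.
outputChunks :: B.Builder -> [BS.ByteString]
outputChunks = BSL.toChunks . B.toLazyByteString

main :: IO ()
main = print (outputChunks (respond (BS.pack [0x01, 0x02, 0x03])))
```

Input is only ever read as a `ByteString` and output is only ever written as a `Builder`, so neither side needs both representations.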
...
It is somewhat inconvenient to use 2 different types for the task,
namely the ByteString and the Builder... where both represent a
sequence of bytes.

A builder is not a sequence of bytes as such; it is a CPS-style
generator for a slice of a future sequence of bytes that can
incrementally build the entire sequence without reallocation
or copying (at least when the output is a lazy bytestring).
...
I have tried to define a Bytes type where both representations are available:

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as Bsl
    import qualified Data.ByteString.Builder as Bld

    data Bytes = Bytes
        { toByteString :: BS.ByteString
        , toBuilder    :: Bld.Builder
        , length       :: Int
        }

This is not a productive direction to explore.  Instead, your *output*
should be a Builder, either constructed lazily in one go (with the tail
parts already lazily appended), or constructed by concatenation of
(Builder->Builder) endomorphisms.  The inputs that individual builder
chunks will consume can be bytestring slices mixed with various other
data (e.g. builders for binary length fields that convert ints to
big-endian wire-form, ...).
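
For instance, a length-prefixed frame could be assembled along these lines.  The wire format (a 16-bit big-endian length followed by the payload) and the names `frame` and `frames` are assumptions for the sake of the example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical wire format: 16-bit big-endian length, then the payload.
-- Assumes the payload is shorter than 65536 bytes; fromIntegral would
-- silently wrap otherwise.
frame :: BS.ByteString -> B.Builder
frame payload =
  B.word16BE (fromIntegral (BS.length payload)) <> B.byteString payload

-- Several frames, combined right-associatively via foldr.
frames :: [BS.ByteString] -> B.Builder
frames = foldr (\p rest -> frame p <> rest) mempty

main :: IO ()
main =
  -- prints [0,2,170,187,0,1,204]
  print (BSL.unpack (B.toLazyByteString (frames [BS.pack [0xAA, 0xBB], BS.pack [0xCC]])))
```

The length here is taken from the input slice, so nothing ever has to be read back out of the builder.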

-- 
    Viktor.

Viktor Dukhovni