
On Wed, Nov 29, 2023 at 11:49:06AM +0000, Zoran Bošnjak wrote:
If I understand correctly, the ByteString.Builder is used to efficiently construct a sequence of bytes from smaller parts.
Best used in continuation-passing style (right-associatively), where all the subsequent builders are lazily added as part of constructing the "head" builder:

    builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...))

Repeatedly appending tail chunks (effectively left-associated) is noticeably less efficient (similar to lists). A workaround is to instead append (Builder -> Builder) endomorphisms:

    b1 = Endo (mappend chunk1)
    b2 = b1 <> Endo (mappend chunk2)
    b3 = b2 <> Endo (mappend chunk3)
    ...
    bN = ...

and then extract the final builder via `appEndo bN mempty`. Endomorphism append will be more efficient once there are many parts to combine.
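As a minimal sketch of the two styles described above (the chunk contents and the names `chunks`, `rightNested`, `viaEndo` and `render` are made up for illustration), both constructions should yield the same bytes; only the way the builder is assembled differs:

```haskell
import Data.ByteString.Builder (Builder, byteString, toLazyByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Monoid (Endo (..))

-- Hypothetical sample chunks, standing in for chunk1 .. chunkN.
chunks :: [Builder]
chunks = map (byteString . BC.pack) ["GET ", "/index.html", " HTTP/1.1", "\r\n"]

-- Right-associated: chunk1 <> (chunk2 <> (... <> (chunkN <> mempty)))
rightNested :: Builder
rightNested = foldr (<>) mempty chunks

-- Parts arrive one at a time and are appended at the tail, but as
-- (Builder -> Builder) endomorphisms, so the final builder is still
-- right-nested once the accumulated function is applied to mempty.
viaEndo :: Builder
viaEndo = appEndo (foldl step mempty chunks) mempty
  where
    step acc c = acc <> Endo (c <>)

-- Run a builder, producing a lazy bytestring.
render :: Builder -> BL.ByteString
render = toLazyByteString
```

Loading this in GHCi and comparing `render rightNested` with `render viaEndo` is a quick way to check that the two agree.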
However, for inspecting data (take, head, index...), a plain ByteString is required.
For efficient processing of network streams, you'd perhaps use a streaming API that exposes the input as a monadic stream of chunks, and perhaps a corresponding parser layered on top that supports consuming chunks monadically. The `streaming` ecosystem for example has support for this model.
What if the byte sequence manipulation task requires both, for example:

- receive ByteString from the network (e.g. Network.Socket.ByteString.recv :: ... -> IO ByteString)
- inspect and manipulate data (pure function)
- resend to the network (e.g. Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())
The input packet will be a `ByteString`; the output packet should be a builder that is converted at the last moment to a (possibly lazy) bytestring for transmission. You shouldn't need to read your output, so a single representation is sufficient.
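A rough sketch of that split, under an invented wire format (one tag byte followed by a payload on input; tag, 16-bit big-endian payload length, then payload on output). The names `rewrite` and `toWire` are hypothetical, and the network calls themselves are left out so that only the pure part is shown:

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Builder
  (Builder, byteString, toLazyByteString, word16BE, word8)
import qualified Data.ByteString.Lazy as BL

-- Inspect the received strict ByteString and describe the reply as a
-- Builder. Returns Nothing for an empty packet.
-- Assumption: the payload is shorter than 65536 bytes, so its length
-- fits the 16-bit field.
rewrite :: BS.ByteString -> Maybe Builder
rewrite packet = do
  (tag, payload) <- BS.uncons packet
  let len = fromIntegral (BS.length payload)
  pure (word8 tag <> word16BE len <> byteString payload)

-- Convert at the last moment: run the builder and expose the strict
-- chunks, which is the list shape that sendMany expects.
toWire :: Builder -> [BS.ByteString]
toWire = BL.toChunks . toLazyByteString
```

The input is only ever read as a `ByteString` and the output is only ever written as a `Builder`, so no combined type is needed.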
It is somewhat inconvenient to use 2 different types for the task, namely the ByteString and the Builder... where both represent a sequence of bytes.
A builder is not a sequence of bytes as such; it is a CPS-style generator for a slice of a future sequence of bytes that can incrementally build the entire sequence without reallocation or copying (at least when the output is a lazy bytestring).
I have tried to define a Bytes type where both representations are available:
    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as Bsl
    import qualified Data.ByteString.Builder as Bld

    data Bytes = Bytes
        { toByteString :: BS.ByteString
        , toBuilder :: Bld.Builder
        , length :: Int
        }
This is not a productive direction to explore. Instead your *output* should be a Builder, either constructed lazily in one go (with the tail parts already lazily appended), or constructed by concatenation of (Builder -> Builder) endomorphisms.

The inputs that individual builder chunks will consume can be bytestring slices mixed with various other data (e.g. builders for binary length fields that convert ints to big-endian wire form, ...).

-- 
    Viktor.