ByteString and ByteString.Builder questions

Hi all, if I understand correctly, the ByteString.Builder is used to efficiently construct sequence of bytes from smaller parts. However, for inspecting data (take, head, index...), a plain ByteString is required. What if the byte sequence manipulation task requires both, for example: - receive ByteString from the network (e.g: Network.Socket.ByteString.recv :: ... -> IO ByteString) - inspect and manipulate data (pure function) - resend to the network (e.g: Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ()) ... where a pure data manipulation could be something like: - extract some segments out of the original sequence - create and add some segments of bytes - reorder - concatinate It is somewhat inconvenient to use 2 different types for the task, namely the ByteString and the Builder... where both represent a sequence of bytes. I have tryed to define a Bytes type where both representations are available: import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as Bsl import qualified Data.ByteString.Builder as Bld data Bytes = Bytes { toByteString :: ByteString , toBuilder :: Builder , length :: Int } instance Semigroup Bytes ... instance Monoid Bytes ... -- create fromByteString :: ByteString -> Bytes fromByteString bs = Bytes { toByteString = bs , toBuilder = Bld.byteString bs , length = BS.length bs } -- inspect function example head :: Bytes -> Word8 head = BS.head . toByteString -- prepare to send data over the network toChunks :: Bytes -> [ByteString] toChunks = Bsl.toChunks . Bld.toLazyByteString . toBuilder The fields of Bytes are non-strict, so the expectation was that lazy evaluation will suspend unnecessary calculations and to have efficient inspection (via ByteString part) and efficient concatination (via Builder part). I have performed some benchmarks (not sure if they are exactly to the point), but the results of Bytes are not so good: Inspect test using ByteString: OK (0.50s) 5.51 ms ± 342 μs using Bytes: OK (0.17s) 62.0 ms ± 3.0 ms Construct test using Builder: OK (0.21s) 5.29 ms ± 361 μs using Bytes: OK (0.17s) 21.3 ms ± 1.6 ms naive: OK (0.81s) 49.6 ms ± 4.1 ms Here is the full code: https://gist.github.com/zoranbosnjak/7887d843056f07bac6061d20970e1d6a My questions are: 0) Where does the big timing difference comming from? Thunks? 1) Is there any simple way (INLINE pragmas... or some other tricks) to get the performance back with the current implementation? 2) Would DList [ByteString] or any other type be any better over the ByteString.Builder in this case? 3) What would be the most efficient (and reasonable) implementation to support this kind of data processing (inspecting and concatination) in some uniform way? In other words: Is it worth to have uniformity here? 4) Is the Network.Socket.ByteString.sendMany function the way to go in cases where the byte sequence is constructed from segment or is there any better (faster) way? regards, Zoran

On Wed, Nov 29, 2023 at 11:49:06AM +0000, Zoran Bošnjak wrote:
if I understand correctly, the ByteString.Builder is used to efficiently construct sequence of bytes from smaller parts.
Best used in continuation-passing-style (right-associatively), where all the subsequent builders are lazily added as part of constructing the "head" builder. builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...)) Repeatedly appending tail chunks (effectively left-associate) is noticeably less efficient (similar to lists). A work-around is to instead append (Builder->Builder) endomorphisms. b1 = Endo (mappend chunk1) b2 = b1 <> Endo (mappend chunk2) b3 = b2 <> Endo (mappend chunk3) ... bN = ... and then extract the final builder via: `appEndo bN mempty`. Endomorphism append will be more efficient once there are many parts to combine.
However, for inspecting data (take, head, index...), a plain ByteString is required.
For efficient processing of network streams, you'd perhaps use a streaming API that exposes the input as a monadic stream of chunks, and perhaps a corresponding parser layered on top that supports consuming chunks monadically. The `streaming` ecosystem for example has support for this model.
What if the byte sequence manipulation task requires both, for example: - receive ByteString from the network (e.g: Network.Socket.ByteString.recv :: ... -> IO ByteString) - inspect and manipulate data (pure function) - resend to the network (e.g: Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())
The input packet will be a `ByteString`, the output packet should be a builder, that is converted at the last moment to a (possibly lazy) bytestring for transmission. You shouldn't need to read your output, so a single representation is sufficient.
It is somewhat inconvenient to use 2 different types for the task, namely the ByteString and the Builder... where both represent a sequence of bytes.
A builder is not a sequence of bytes as such, it is a CPS-style generator for a slice of a future sequence of bytes that can incrementally build the entire sequence without reallocation or copying (at least when the output is a lazy bytestring).
I have tryed to define a Bytes type where both representations are available:
import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as Bsl import qualified Data.ByteString.Builder as Bld
data Bytes = Bytes { toByteString :: ByteString , toBuilder :: Builder , length :: Int }
This is not a productive direction to explore. Instead your *output* should be a Builder, either constructed lazily in one go (with the tail parts already lazily appended), or constructed by concatenation of (Builder->Builder) endomorphisms. The inputs that individual builder chunks will consume can be bytestring slices mixed with various other data (e.g. builders for binary length fields that convert ints to big-endian wire-form, ...). -- Viktor.

On Wed, Nov 29, 2023 at 12:24:17PM -0500, Viktor Dukhovni wrote:
if I understand correctly, the ByteString.Builder is used to efficiently construct sequence of bytes from smaller parts.
Best used in continuation-passing-style (right-associatively), where all the subsequent builders are lazily added as part of constructing the "head" builder.
builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...))
Repeatedly appending tail chunks (effectively left-associate) is noticeably less efficient (similar to lists). A work-around is to instead append (Builder->Builder) endomorphisms.
b1 = Endo (mappend chunk1) b2 = b1 <> Endo (mappend chunk2) b3 = b2 <> Endo (mappend chunk3) ... bN = ...
Actually, this part is was a myth I failed to check, looking at the `Builder` code, I see that builder `mappend` is already endomorphism composition, so the extra wrapping is redundant. So use builders for output, and don't attempt to model some hybrid of builders and bytestrings. It is rather unclear what problem that is actually intended to solve. -- Viktor.

Thanks for your answer. That makes sense. In particular the fact that there should be no need to read own output.
However, one problem remains. Data format I am working with contains a total-length prefix. That is:
2 bytes of total length somewhere in the header of the message (header is of the fixed length). But the builder does not directly provide length.
So, how do I suppose to effectively encode the message in the form:
msg = header <> chunk1 <> chunk2 <> ...
Should I use (Builder, Sum Int) instead of plain Builder when composing?
Or
data MyBuilder = MyBuilder Builder Int
... and create a Monoid instance?
Or any other approach?
Zoran
----- Original Message -----
From: "Viktor Dukhovni"
if I understand correctly, the ByteString.Builder is used to efficiently construct sequence of bytes from smaller parts.
Best used in continuation-passing-style (right-associatively), where all the subsequent builders are lazily added as part of constructing the "head" builder.
builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...))
Repeatedly appending tail chunks (effectively left-associate) is noticeably less efficient (similar to lists). A work-around is to instead append (Builder->Builder) endomorphisms.
b1 = Endo (mappend chunk1) b2 = b1 <> Endo (mappend chunk2) b3 = b2 <> Endo (mappend chunk3) ... bN = ...
Actually, this part is was a myth I failed to check, looking at the `Builder` code, I see that builder `mappend` is already endomorphism composition, so the extra wrapping is redundant. So use builders for output, and don't attempt to model some hybrid of builders and bytestrings. It is rather unclear what problem that is actually intended to solve. -- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On Thu, Nov 30, 2023 at 06:58:32AM +0000, Zoran Bošnjak wrote:
Thanks for your answer. That makes sense. In particular the fact that there should be no need to read own output.
However, one problem remains. Data format I am working with contains a total-length prefix. That is: 2 bytes of total length somewhere in the header of the message (header is of the fixed length). But the builder does not directly provide length.
I knew you'd eventually ask this question.
So, how do I suppose to effectively encode the message in the form:
msg = header <> chunk1 <> chunk2 <> ...
Take a look at my DNS Stub resolver codec: https://github.com/dnsbase/dnsbase/blob/7f381795d190ff4096095e90c34a5d3a16ef... and its "passLen" function: https://github.com/dnsbase/dnsbase/blob/7f381795d190ff4096095e90c34a5d3a16ef... -- Viktor.
participants (2)
-
Viktor Dukhovni
-
Zoran Bošnjak