
> I don't have a horse in this race; but I am curious as to why you wouldn't ask for `chunkOverhead = 16' as that seems to be your intent as well as what the expression works out to on any machine in common use.
To avoid copying data when performing FFI calls to common cipher routines (such operations usually work on 128-bit blocks). If you have a Haskell program performing full disk encryption (FDE), then it's reasonable to expect large amounts of data to need to be encrypted/decrypted. Reading in lazy ByteStrings you get strict bytestrings of 32K - chunkOverhead bytes, which is a multiple of 64 bits on 32-bit machines but not a multiple of 128 bits. IOW, for an operation like "cbc key iv lazyBS" you will:

  1) encrypt the first 32K - 16B of the chunk
  2) copy the remainder (8 bytes) and the next chunk (32K - 8B) into a new strict bytestring
  3) encrypt the full 32K chunk
  4) repeat

(a rough sketch of this loop is below). There are other ways to do it, but the fastest ways involve making your delicate and extremely security-sensitive cipher algorithm work on partial blocks, or building the notion of linked lists of buffers (lazy bytestrings) into the implementation (which is often in C).

Unfortunately, this problem only gets worse as you expand your scope. Hash algorithms have a much wider array of block sizes (512 to 1024 bits are very common), and we don't want to waste 1024 - 64 bits per 32KB chunk just to keep chunks aligned for those as well, so I didn't request that. In situations where people know they'll be hashing large files and explicitly use lazy ByteStrings, they could use hGetN to set the chunk size to something agreeable (also sketched below).

A less programmer-intensive solution would be to have chunks at a full 32K. I'm not sure how much of a performance problem this would introduce (to all users of bytestrings) due to caching (or other issues?). Did anyone measure it when the initial implementation decided to do it this way?

Cheers,
Thomas
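
For concreteness, here's roughly what that chunk-stitching loop looks like in Haskell. This is only a sketch: the names (cbcChunks, prim, blockSize) are made up, and prim stands in for whatever strict, FFI-backed, block-aligned CBC primitive a real binding would expose.

    import qualified Data.ByteString      as S
    import qualified Data.ByteString.Lazy as L

    type IV = S.ByteString

    blockSize :: Int
    blockSize = 16   -- 128-bit cipher blocks

    -- 'prim' stands in for the strict, FFI-backed CBC routine: it is only
    -- ever handed inputs whose length is a multiple of 'blockSize', and it
    -- returns the ciphertext plus the IV to chain into the next call.
    cbcChunks :: (IV -> S.ByteString -> (S.ByteString, IV))
              -> IV -> L.ByteString -> L.ByteString
    cbcChunks prim iv0 = L.fromChunks . go iv0 S.empty . L.toChunks
      where
        go _  carry []
          | S.null carry = []
          | otherwise    = error "plaintext not block-aligned (padding omitted)"
        go iv carry (c:cs) =
          let buf          = carry `S.append` c          -- the copy in step 2
              usable       = (S.length buf `div` blockSize) * blockSize
              (now, later) = S.splitAt usable buf
              (ct, iv')    = prim iv now                 -- steps 1 and 3
          in  ct : go iv' later cs

With block-aligned chunks the carry would always stay empty, so the stitching step would never have anything to copy.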
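
As for the hGetN route: if a chunk-size-parameterised reader isn't exported by the bytestring version in use, the same effect is easy to get by hand. Below is a minimal sketch under that assumption; the helper name hGetContentsWith, the flat 32K chunk size, and the "big-file" path are all just illustrative.

    import qualified Data.ByteString      as S
    import qualified Data.ByteString.Lazy as L
    import System.IO
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- Lazily read a handle into chunks of exactly n bytes (except possibly
    -- the last): each read is deferred until the consumer demands the next
    -- chunk, essentially the same lazy-IO trick the stock readers use.
    hGetContentsWith :: Int -> Handle -> IO L.ByteString
    hGetContentsWith n h = fmap L.fromChunks lazyRead
      where
        lazyRead = unsafeInterleaveIO $ do
          c <- S.hGet h n
          if S.null c
            then hClose h >> return []
            else fmap (c :) lazyRead

    main :: IO ()
    main = do
      h  <- openBinaryFile "big-file" ReadMode   -- "big-file" is a stand-in
      bs <- hGetContentsWith (32 * 1024) h
      print (L.length bs)

With flat 32K chunks a 1024-bit (128-byte) hash block divides each chunk evenly (32768 / 128 = 256), which is the "something agreeable" I had in mind.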