Re: HEAD: Deterioration in ByteString I/O

9 Sep 2010


On Thursday 09 September 2010 01:28:04, Daniel Fischer wrote:
> ...
> Maybe the following observation helps:
> ghc-6.13.20100831 reads lazy ByteStrings in chunks of 8192 bytes.
> If I understand correctly, that means (since defaultChunkSize = 32760)
> - bytestring allocates a 32K buffer to be filled and asks ghc for 32760
>   bytes in that buffer
> - ghc asks the OS for 8192 bytes (and usually gets them)
> - upon receiving fewer bytes than requested, bytestring copies them to a
>   new smaller buffer
> - since the number of bytes received is a multiple of ghc's allocation
>   block size (which I believe is 4K), there's no space for the bookkeeping
>   overhead, hence the new buffer takes up 12K instead of 8, resulting in
>   44K allocation for 8K bytes
> That factor of 5.5 corresponds pretty well with the allocation figures
> above,
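
The accounting in the list can be written out as a small model. This is only
a sketch of the reasoning, not of GHC's actual allocator: blockSize = 4096 is
the 4K guess from the text, overhead = 8 is an assumption taken from
32760 = 32K - 8, and the function names are made up for illustration.

```haskell
-- Allocation block size in bytes (the 4K guess from the text, not verified).
blockSize :: Int
blockSize = 4096

-- Assumed bookkeeping overhead per buffer in bytes (32760 = 32K - 8).
overhead :: Int
overhead = 8

-- Bytes allocated for a buffer holding 'payload' bytes, rounded up to
-- whole blocks.
allocatedFor :: Int -> Int
allocatedFor payload = blocks * blockSize
  where
    blocks = (payload + overhead + blockSize - 1) `div` blockSize

-- Total allocation for one read: the requested buffer, plus a trimmed copy
-- if fewer bytes arrive than were requested.
allocatedPerRead :: Int -> Int -> Int
allocatedPerRead requested received
  | received < requested = allocatedFor requested + allocatedFor received
  | otherwise            = allocatedFor requested

main :: IO ()
main = do
  let total = allocatedPerRead 32760 8192
  putStrLn ("allocated per read: " ++ show total ++ " bytes")
  putStrLn ("factor: " ++ show (fromIntegral total / 8192 :: Double))
```

Under these assumptions a 32760-byte request that returns 8192 bytes costs
32768 + 12288 = 45056 bytes, which is 5.5 times the payload.
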
That seems to be correct, but probably not the whole story.
I've played with defaultChunkSize. With it set to (64K - overhead), ghc
still reads in 8192-byte chunks, and the allocation figures are nearly
double those for (32K - overhead). With it set to (8K - overhead), ghc
reads in 8184-byte chunks, and the allocation figures go down to
approximately 1.4 times those of 6.12.3.
Can a factor of 1.4 be explained by the smaller chunk size, or is something
else going on?
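
For anyone who wants to check the chunk sizes on their own setup, here is a
minimal sketch. It assumes only the public bytestring API (L.readFile and
L.toChunks); chunkSizeHistogram is a name invented here, and the file to
read is taken from the command line.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.List (group, sort)
import System.Environment (getArgs)

-- Count how often each strict-chunk length occurs in a lazy ByteString.
-- The result is sorted by chunk length: [(length, occurrences)].
chunkSizeHistogram :: L.ByteString -> [(Int, Int)]
chunkSizeHistogram lbs =
  [ (size, length sizes)
  | sizes@(size:_) <- group (sort (map B.length (L.toChunks lbs)))
  ]

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      contents <- L.readFile path
      mapM_ report (chunkSizeHistogram contents)
    _ -> putStrLn "usage: chunks FILE"
  where
    report (size, n) =
      putStrLn (show n ++ " chunk(s) of " ++ show size ++ " bytes")
```

Run against a large file, this shows the sizes of the chunks the lazy read
actually produced.
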
> ...
> and the extra copying explains the approximate doubling of I/O time.
Apparently not. With the small chunk size, which should avoid the copying,
the I/O didn't get faster.
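
The copying in question is the trim step after a short read. A rough sketch
of that step, assuming createAndTrim from Data.ByteString.Internal and
hGetBufSome from System.IO (readSome is a hypothetical helper, and this is
not claimed to be the code path the lazy reader in HEAD actually takes):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Control.Monad (unless)
import System.IO (Handle, hGetBufSome, hSetBinaryMode, stdin)

-- Allocate an n-byte buffer and ask the handle for up to n bytes.
-- createAndTrim keeps the buffer if it was filled completely; otherwise
-- it copies the bytes received into a new, smaller buffer.
readSome :: Handle -> Int -> IO B.ByteString
readSome h n = BI.createAndTrim n (\buf -> hGetBufSome h buf n)

main :: IO ()
main = do
  hSetBinaryMode stdin True
  let loop = do
        chunk <- readSome stdin 32760
        -- An empty result means end of input.
        unless (B.null chunk) $ do
          print (B.length chunk)
          loop
  loop
```

If every read fills the requested buffer, the copy branch is never taken,
which is why the (8K - overhead) setting should avoid the copying.
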
> ...
> Trying to find out why ghc asks the OS for only 8192 bytes instead of
> 32760 hasn't brought enlightenment yet.
No progress on that front.