Re: ByteString I/O Performance

6 Sep 2007


      On 06 Sep 2007 02:30:28 +0200, Peter Simons  wrote:
...
Duncan Coutts writes:
...
What you want is just fine, but it's a mutable interface not a
pure one. We cannot provide any operations that mutate an
existing ByteString without breaking the semantics of all the
pure operations.
Is that so? How exactly does mutating a ByteString break the
semantics of the pure function 'take'?
Because if you mutate the original bytestring, the value of the other
bytestring (returned from 'take') will change too: 'take' doesn't copy
anything, it returns a view into the same buffer. Not pure. Bad. Evil.
Etc.
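Here is a minimal sketch that makes the point about 'take' concrete. It assumes a current GHC and the bytestring package. `sharingDemo` is a hypothetical name. The in-place write has to go through `Data.ByteString.Unsafe`, because the pure API offers no way to mutate a ByteString at all:

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Foreign.C.String (castCharToCChar)
import Foreign.Storable (poke)

-- Hypothetical demo: returns the 'take 5' prefix as seen before and after
-- an in-place write to the ORIGINAL buffer. The write needs an unsafe
-- escape hatch; nothing in the pure ByteString API can do it.
sharingDemo :: IO (String, String)
sharingDemo = do
  -- B.copy allocates a fresh heap buffer, so we never poke a string
  -- literal that the compiler may have placed in read-only memory.
  original <- evaluate (B.copy (BC.pack "hello world"))
  let prefix = B.take 5 original        -- O(1): a view into original's buffer
  snapshot <- evaluate (B.copy prefix)  -- private copy, taken before the write
  unsafeUseAsCString original $ \p ->
    poke p (castCharToCChar 'J')        -- overwrite the first byte in place
  return (BC.unpack snapshot, BC.unpack prefix)
```

Called once, it should return `("hello","Jello")`. The explicit copy keeps the old bytes. `prefix`, which the program never touched, now reads differently.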
...
...
It's very much like the difference between the MArray and
IArray classes, for mutable and immutable arrays. One provides
index in a monad, the other is pure.
Right. Now I wonder: why does ByteString provide an immutable
interface but not a mutable one? Apparently mutable interfaces
are useful for some purposes, right? Why else would the Array
package provide one?
It doesn't provide two different interfaces to the same data
structure, it provides two different data structures. You can't have a
pure interface AND an impure one, as the impure one could then mutate
values that are used with the pure interface, which would mean that
the pure interface is broken (see above).
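For comparison, here is a small sketch of the two array interfaces mentioned above. It assumes the standard `array` package, and `pureIndex`, `monadicIndex` and `freezeDemo` are illustrative names only. Moving from one structure to the other goes through `freeze`, which copies:

```haskell
import Data.Array.IArray (Array, elems, listArray, (!))
import Data.Array.IO (IOUArray, freeze, newListArray, readArray, writeArray)

-- IArray: indexing is a pure function of an immutable array value.
pureIndex :: Int
pureIndex = arr ! 2
  where
    arr :: Array Int Int
    arr = listArray (0, 3) [10, 20, 30, 40]

-- MArray: the array can change, so reading is an action in a monad
-- (IO here) and its result depends on when it runs.
monadicIndex :: IO (Int, Int)
monadicIndex = do
  marr <- newListArray (0, 3) [10, 20, 30, 40] :: IO (IOUArray Int Int)
  before <- readArray marr 2
  writeArray marr 2 99
  after <- readArray marr 2
  return (before, after)

-- Crossing between the two structures: freeze copies the mutable array
-- into a new immutable one, so later writes cannot reach it.
freezeDemo :: IO ([Int], [Int])
freezeDemo = do
  marr <- newListArray (0, 2) [1, 2, 3] :: IO (IOUArray Int Int)
  frozen <- freeze marr :: IO (Array Int Int)
  writeArray marr 0 100
  frozenLater <- freeze marr :: IO (Array Int Int)
  return (elems frozen, elems frozenLater)
```

`freezeDemo` returns `([1,2,3],[100,2,3])`, so the first frozen copy is unaffected by the later write. The library also offers `unsafeFreeze` (in `Data.Array.Unsafe`), which skips the copy. The condition is that the mutable array is never written again. A mutable ByteString interface would ask every pure function to trust that same promise.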
...
...
Bear in mind that these cache benefits are fairly small in
real benchmarks as opposed to 'cat' on fully cached files.
Do I understand that right? It sounds as if you were saying that
-- in the general case -- allocating a new buffer for every
single read() is not significantly slower than re-using the same
buffer every time. Is that what you meant to say?
I think he said that most of the speed difference is due to better
cache performance when reusing the same buffer, but in general you do
"other stuff" as well, which won't be as benign for the cache, and the
difference will be smaller (if it is noticeable at all).
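This sketch shows the two reading styles under discussion. It assumes GHC's `System.IO` and the bytestring package. The function names and the 32 KiB chunk size are arbitrary choices for illustration. No timings are implied, because, as noted above, how much buffer reuse matters depends on what else the program does with the data:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (ReadMode), hGetBuf, withBinaryFile)

-- Arbitrary buffer size chosen for this sketch.
chunkSize :: Int
chunkSize = 32 * 1024

-- Style 1: one buffer, allocated once and overwritten by every read.
-- The bytes are only valid until the next hGetBuf, so nothing pure may
-- keep a reference to them.
countReusing :: Handle -> IO Int
countReusing h = allocaBytes chunkSize (\buf -> go buf 0)
  where
    go :: Ptr Word8 -> Int -> IO Int
    go buf acc = do
      n <- hGetBuf h buf chunkSize  -- 0 means end of file
      if n == 0 then return acc else go buf $! acc + n

-- Style 2: a fresh immutable ByteString per read. Each chunk is safe to
-- keep or pass to pure code, at the cost of one allocation per read.
countAllocating :: Handle -> IO Int
countAllocating h = go 0
  where
    go :: Int -> IO Int
    go acc = do
      chunk <- B.hGetSome h chunkSize  -- empty means end of file
      if B.null chunk then return acc else go $! acc + B.length chunk

-- Run either strategy over a file opened in binary mode.
countFile :: (Handle -> IO Int) -> FilePath -> IO Int
countFile counter path = withBinaryFile path ReadMode counter
```

The reuse in `countReusing` is only safe because nothing outside the loop ever sees the buffer.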
...
...
ByteString certainly isn't the right abstraction for that
though.
I am sorry, but that is nonsense. A ByteString is a tuple
consisting of a pointer into raw memory, an integer signifying
the size of the front gap, and an integer signifying the length
of the payload. That data structure is near perfect for
performing efficient I/O. To say that this abstraction isn't
right for the task is absurd. What you mean to say is that you
don't _intend_ it to be used that way, which is an altogether
different thing.
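A toy model of the three-field layout described here shows why `take` and `drop` are cheap and why they share memory. This is a hypothetical sketch, not the bytestring source. An unboxed array stands in for the raw pointer. `Slice`, `fromBytes`, `sliceTake`, `sliceDrop`, `sliceIndex` and `toBytes` are invented names:

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word8)

-- Hypothetical model: a shared buffer (standing in for the raw pointer),
-- the size of the front gap, and the length of the payload.
data Slice = Slice
  { buffer  :: UArray Int Word8
  , gap     :: !Int
  , payload :: !Int
  }

fromBytes :: [Word8] -> Slice
fromBytes ws = Slice (listArray (0, n - 1) ws) 0 n
  where n = length ws

-- O(1): only the payload length changes; the buffer is shared.
sliceTake :: Int -> Slice -> Slice
sliceTake k (Slice b g n) = Slice b g (max 0 (min k n))

-- O(1): the front gap grows; again no bytes are copied.
sliceDrop :: Int -> Slice -> Slice
sliceDrop k (Slice b g n) = Slice b (g + k') (n - k')
  where k' = max 0 (min k n)

-- Index relative to the start of the payload, with a bounds check.
sliceIndex :: Slice -> Int -> Word8
sliceIndex (Slice b g n) i
  | i < 0 || i >= n = error "sliceIndex: out of range"
  | otherwise       = b ! (g + i)

toBytes :: Slice -> [Word8]
toBytes s = [sliceIndex s i | i <- [0 .. payload s - 1]]
```

Both operations only adjust the gap and the length, so every slice derived from a buffer observes that buffer's contents. With an immutable buffer that sharing is free and safe. With a mutable one it is exactly the problem described earlier in the thread. Newer bytestring releases fold the front gap into the pointer, but the sharing is the same.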
A ByteString is an immutable data structure representing a string; if
you need a mutable one, then it's not the right abstraction *by
definition*. Yes, a ByteString is not intended to be a mutable buffer,
which is precisely what makes it not the right abstraction if you need
that (not an "altogether different thing"; it is THE thing). The fact
that its internal representation looks similar to that of a different
abstraction, one which does allow mutation, doesn't mean that *this*
abstraction is the right choice.
This is analogous to Java and C#: if you need a mutable string
buffer, the "string" class is not the right abstraction; you use the
StringBuilder classes instead.
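In the bytestring package, the nearest Haskell counterpart to a string builder class is `Data.ByteString.Builder`, which was added after this thread. Here is a minimal sketch, with `csvLine` as an illustrative name. The builder manages its own buffers internally and hands back one immutable result, so the pure ByteString interface is never compromised:

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.List (intersperse)

-- Accumulate the pieces with a Builder, then produce a single immutable
-- (lazy) ByteString. The caller never sees the buffers the Builder fills.
csvLine :: [Int] -> BLC.ByteString
csvLine xs =
  BB.toLazyByteString
    (mconcat (intersperse (BB.char7 ',') (map BB.intDec xs)) <> BB.char7 '\n')
```

`csvLine [1,2,3]` yields the bytes `1,2,3` followed by a newline.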

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862