ByteString I/O Performance

import System.IO import Foreign ( allocaBytes ) import qualified Data.ByteString as Str
bufsize :: Int bufsize = 4 * 1024
In order to determine I/O performance, a random 512 MB file is copied
from standard input to standard output. All test programs have been
compiled with GHC 6.6.1 using "-O2 -funbox-strict-fields" for
optimization. The time to beat for this test comes from /bin/cat:
$ dd if=/dev/urandom of=test.data bs=1M count=512
$ time /bin/cat
catBuf :: Handle -> Handle -> IO () catBuf hIn hOut = allocaBytes bufsize input where input ptr = hGetBuf hIn ptr bufsize >>= output ptr output _ 0 = return () output ptr n = hPutBuf hOut ptr n >> input ptr
real 0m2.747s 0m2.737s 0m2.758s user 0m0.524s 0m0.416s 0m0.632s sys 0m2.224s 0m2.304s 0m2.124s The second entry is implemented with ByteString:
catString :: Handle -> Handle -> IO () catString hIn hOut = Str.hGet hIn bufsize >>= loop where loop buf | Str.null buf = return () | otherwise = Str.hPut hOut buf >> catString hIn hOut
real 0m7.852s 0m7.817s 0m7.887s user 0m4.764s 0m4.800s 0m4.748s sys 0m3.080s 0m3.000s 0m3.108s When Data.ByteString.Char8 is used instead, the program produces almost identical results. Data.ByteString.Lazy, however, came out differently: real 0m8.184s 0m8.086s 0m8.067s user 0m5.104s 0m5.252s 0m4.948s sys 0m2.940s 0m2.808s 0m3.120s ByteString turns out to be more than two times slower than ordinary buffer I/O. This result comes as a surprise because ByteString _is_ an ordinary memory buffer, so it feels reasonable to expected it to perform about the same. The reason why ByteString cannot compete with hGetBuf appears to be Data.ByteString.Base.createAndTrim. That function allocates space with malloc(), reads data into that buffer, allocates a new buffer, and then copies the data it has just read from the old buffer into the new one before returning it. This approach is quite inefficient for reading large amounts of data. It is particularly odd that Data.ByteString.readFile relies on the same mechanism. The required buffer size is known in advance. There is no point in reading data into a temporary buffer. I may have misread the implementation, but my impression is that readFile currently requires 2*n bytes of memory to read a file of size n. It feels like there is plenty of room for optimization. :-)
main :: IO () main = do mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ] catString stdin stdout

On Thu, 30 Aug 2007, Bulat Ziganshin wrote:
Thursday, August 30, 2007, 12:05:02 AM, you wrote:
bufsize = 4 * 1024
i recommend you to try larger buffer. on my hardware, 256k was optimal
Small buffer performance is important for dealing with many concurrent
low-bandwidth network connections.
Tony.
--
f.a.n.finch

simons:
import System.IO import Foreign ( allocaBytes ) import qualified Data.ByteString as Str
bufsize :: Int bufsize = 4 * 1024
In order to determine I/O performance, a random 512 MB file is copied from standard input to standard output. All test programs have been compiled with GHC 6.6.1 using "-O2 -funbox-strict-fields" for optimization. The time to beat for this test comes from /bin/cat:
$ dd if=/dev/urandom of=test.data bs=1M count=512 $ time /bin/cat
/dev/null real 0m2.097s 0m2.135s 0m2.100s user 0m0.036s 0m0.028s 0m0.024s sys 0m2.060s 0m2.108s 0m2.076s
Thanks for the data points. It's been a while since I benchmarked the IO performance, so looks like time to revisit this issue. -- Don

Donald Bruce Stewart writes:
It's been a while since I benchmarked the IO performance, so looks like time to revisit this issue.
Hi Bruce, my impression is that performance cannot be improved significantly without providing a modified input API. For some purposes reading input in large chunks is fine, but for some purposes it is not. A network proxy server, for example, cannot afford to buffer large amounts of data for every connection. An I/O buffer size of, say 256KB, would be really not a good choice for such a program. Something like 4KB typically is, but the small buffer size means that large amounts of data will be read using a fairly large number of hGet calls. As it is, hGet performs at least one malloc() per read() call. That will be slow, no matter how optimized that code is. One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut. A different approach would be to try to reduce the cost for malloc() by using some sort of pre-allocated pool of ByteStrings behind the scenes. Last but not least, it's also possible to decide and document that ByteStrings are not supposed to be used for those kinds of purposes and that users who need very high performance should rely on hGetBuf instead. I can't say what's best, those are simply the options I see. With kind regards, Peter

Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO.

-----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan Sent: Sunday, September 02, 2007 11:23 PM To: Peter Simons Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO. Is it possible without resorting to an unsafeXXX function? http://www.haskell.org/mailman/listinfo/libraries

seth:
-----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan Sent: Sunday, September 02, 2007 11:23 PM To: Peter Simons Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance
Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO.
Is it possible without resorting to an unsafeXXX function?
They're all 'unsafe' for different reasons :) The question should be: why is this unsafe? (It's unsafe because it doesn't copy the C string, so you need to have a side condition that the string isn't modified by C). -- Don

-----Original Message----- From: Donald Bruce Stewart [mailto:dons@cse.unsw.edu.au] Sent: Monday, September 03, 2007 12:03 AM To: Seth Kurtzberg Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance seth:
-----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org]
On Behalf Of Bryan O'Sullivan
Sent: Sunday, September 02, 2007 11:23 PM To: Peter Simons Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance
Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO.
Is it possible without resorting to an unsafeXXX function?
They're all 'unsafe' for different reasons :) The question should be: why is this unsafe? (It's unsafe because it doesn't copy the C string, so you need to have a side condition that the string isn't modified by C). OK. Assume that I'm not doing any C coding, so that the only C code that is invoked is called from within the implementation (in this case the implementation of System.IO). Can I assume that no implementation code modifies the string? In other words, is it valid to assume that the side condition is never violated so long as I don't violate the side condition in my own C code (if any)? -- Don

seth:
-----Original Message----- From: Donald Bruce Stewart [mailto:dons@cse.unsw.edu.au] Sent: Monday, September 03, 2007 12:03 AM To: Seth Kurtzberg Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance
seth:
-----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org]
On Behalf Of Bryan O'Sullivan
Sent: Sunday, September 02, 2007 11:23 PM To: Peter Simons Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance
Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO.
Is it possible without resorting to an unsafeXXX function?
They're all 'unsafe' for different reasons :) The question should be: why is this unsafe?
(It's unsafe because it doesn't copy the C string, so you need to have a side condition that the string isn't modified by C).
OK. Assume that I'm not doing any C coding, so that the only C code that is invoked is called from within the implementation (in this case the implementation of System.IO). Can I assume that no implementation code modifies the string? In other words, is it valid to assume that the side condition is never violated so long as I don't violate the side condition in my own C code (if any)?
Yes, that's right. Its safe as long as you don't modify the string yourself in C. -- Don

On Mon, Sep 03, 2007 at 01:40:51AM +0200, Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This seems dangerous. For example, consider that the ByteString can be referenced by some lazy computation, expecting it to contain the data from some earlier hGet.
A different approach would be to try to reduce the cost for malloc() by using some sort of pre-allocated pool of ByteStrings behind the scenes.
I just wrote this, before I read you proposition: As safer alternative would be to keep a cache of pre-malloced buffers, populated by the ByteString finalizer. But the bookkeeping cost could outweight the benefit of avoiding malloc. Best regards Tomek

On Mon, Sep 03, 2007 at 07:11:45AM +0200, Tomasz Zielonka wrote:
On Mon, Sep 03, 2007 at 01:40:51AM +0200, Peter Simons wrote:
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
This seems dangerous. For example, consider that the ByteString can be referenced by some lazy computation, expecting it to contain the data from some earlier hGet.
A different approach would be to try to reduce the cost for malloc() by using some sort of pre-allocated pool of ByteStrings behind the scenes.
I just wrote this, before I read you proposition: As safer alternative would be to keep a cache of pre-malloced buffers, populated by the ByteString finalizer. But the bookkeeping cost could outweight the benefit of avoiding malloc.
How are you getting these bytestrings? Normal bytestring allocation doesn't use malloc and doesn't use finalizers; it calls the (deceptively named) mallocForeignPtrBytes function, which allocates a block of data in the pinned GHC heap, nearly like any other Haskell object. Stefan

On Sun, Sep 02, 2007 at 10:16:59PM -0700, Stefan O'Rear wrote:
On Mon, Sep 03, 2007 at 07:11:45AM +0200, Tomasz Zielonka wrote:
I just wrote this, before I read you proposition: As safer alternative would be to keep a cache of pre-malloced buffers, populated by the ByteString finalizer. But the bookkeeping cost could outweight the benefit of avoiding malloc.
How are you getting these bytestrings? Normal bytestring allocation doesn't use malloc and doesn't use finalizers; it calls the (deceptively named) mallocForeignPtrBytes function, which allocates a block of data in the pinned GHC heap, nearly like any other Haskell object.
IIUC, the discussion was about malloced ByteStrings... Best regards Tomek

On Mon, 2007-09-03 at 01:40 +0200, Peter Simons wrote:
Donald Bruce Stewart writes:
It's been a while since I benchmarked the IO performance, so looks like time to revisit this issue.
Hi Bruce,
my impression is that performance cannot be improved significantly without providing a modified input API. For some purposes reading input in large chunks is fine, but for some purposes it is not. A network proxy server, for example, cannot afford to buffer large amounts of data for every connection. An I/O buffer size of, say 256KB, would be really not a good choice for such a program. Something like 4KB typically is, but the small buffer size means that large amounts of data will be read using a fairly large number of hGet calls. As it is, hGet performs at least one malloc() per read() call. That will be slow, no matter how optimized that code is.
One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut.
Stefan is right, the only place that C malloc() is used is in strict ByteString's hGetContents. Everywhere else we use GHC's pinned heap allocation. Strict ByteString ReadFile does not use hGetContents because in that case we know the file length, it uses hGetBuf. Also, createAndTrim does not do any copying when there is no trimming necessary, so in the best case we do only a single copy using hGetBuf. As I recall from when I profiled this for the ByteString paper, with a lazy ByteString implementation of unix 'cat' on disk files (rather than a network socket) we should only copy each chunk once (as far as I can see). The slowdown compared to a simple hGetBuf 'cat' was all down to cache locality, because we're cycling between a range of buffers rather than a single cache-hot buffer. The time overhead of memory allocation and GC is negligible. In the tests I did at the time I found that on fully cached files the slowdown compared to using a single mutable buffer was about 2-3x. I figured that overhead is not bad, considering it represents the worst case when no work is being done to transform the data in any way and we're not doing any real IO, just copying data from kernel memory to user space memory. If we're doing lots of short reads (like when reading from sockets) then there is an opportunity for improvement. We could read into an internal buffer and cut off as an immutable ByteString only the chunk that got filled. The remainder of the buffer could be used for the following reads until the whole buffer is exhausted and a new buffer has to be allocated. This is the buffering strategy we use in the binary library when serialising. Duncan

Duncan Coutts writes: | As I recall from when I profiled this for the ByteString paper | [...], the slowdown compared to a simple hGetBuf 'cat' was all | down to cache locality, because we're cycling between a range | of buffers rather than a single cache-hot buffer. I believe you are right. The following implementation performs just fine:
import System.IO import qualified Data.ByteString.Base as Str import qualified Data.ByteString as Str import Data.ByteString ( ByteString )
bufsize :: Int bufsize = 4 * 1024
hGet :: Handle -> ByteString -> IO ByteString hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) return (Str.unsafeTake i buf)
catString :: Handle -> Handle -> IO () catString hIn hOut = Str.create bufsize (\_ -> return ()) >>= input where input buf = hGet hIn buf >>= output buf output buf b | Str.null b = return () | otherwise = Str.hPut hOut b >> input buf
main :: IO () main = do mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ] catString stdin stdout
time /bin/cat

simons:
Duncan Coutts writes:
| As I recall from when I profiled this for the ByteString paper | [...], the slowdown compared to a simple hGetBuf 'cat' was all | down to cache locality, because we're cycling between a range | of buffers rather than a single cache-hot buffer.
I believe you are right. The following implementation performs just fine:
import System.IO import qualified Data.ByteString.Base as Str import qualified Data.ByteString as Str import Data.ByteString ( ByteString )
bufsize :: Int bufsize = 4 * 1024
hGet :: Handle -> ByteString -> IO ByteString hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) return (Str.unsafeTake i buf)
catString :: Handle -> Handle -> IO () catString hIn hOut = Str.create bufsize (\_ -> return ()) >>= input where input buf = hGet hIn buf >>= output buf output buf b | Str.null b = return () | otherwise = Str.hPut hOut b >> input buf
main :: IO () main = do mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ] catString stdin stdout
time /bin/cat
/dev/null real 0m2.093s user 0m0.024s sys 0m2.068s
time ./cat-bytestring
/dev/null real 0m2.753s user 0m0.568s sys 0m2.184s
That's a useful benchmark. Thanks for looking into this. -- Don

Donald Bruce Stewart writes:
hGet :: Handle -> ByteString -> IO ByteString hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) return (Str.unsafeTake i buf)
That's a useful benchmark. Thanks for looking into this.
It was a pleasure. How do you feel about providing this kind of input API to the user as part of the module? Note that the current hGet can be implemented on top of the combinator above, but the reverse is not true. So I guess that this API is strictly more powerful than the one we currently have. And for some use cases, it's significantly faster too. By the way, did you receive my e-mail about the race condition in readFile? Best regards, Peter

On Tue, 2007-09-04 at 20:07 +0200, Peter Simons wrote:
Donald Bruce Stewart writes:
hGet :: Handle -> ByteString -> IO ByteString hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) return (Str.unsafeTake i buf)
That's a useful benchmark. Thanks for looking into this.
It was a pleasure. How do you feel about providing this kind of input API to the user as part of the module?
We can't provide this interface as it modifies the immutable input ByteString. What you're looking for is to copy into a mutable buffer. That buffer cannot be itself a ByteString as they're immutable. The best we can do is to copy into a newly allocated buffer as we do in hGet. In the best case we can do this with a single copy. We could look at the trimming again if that's an issue. So if you want an api that provides a mutable input buffer, you'll need something other than a ByteString. We do have something like this in the binary package, which we've been considering cleaning up and bringing into the bytestring package. However even that doesn't give you a mutable buffer that you'd want for fast 'cat', that really requires an api that guarantees that the buffer is no longer in use when the next chunk is read, destroying the previous content of the buffer.
By the way, did you receive my e-mail about the race condition in readFile?
Yes, it's a good point. We could fix it my doing more reads in the same style a hGetContents. Duncan

Duncan Coutts writes:
We can't provide this interface as it modifies the immutable input ByteString.
Well, one could call it unsafeHGet. :-)
[Mutable input buffer] really requires an api that guarantees that the buffer is no longer in use when the next chunk is read, destroying the previous content of the buffer.
I see your point. It would be nice if the API would guarantee that the ByteString cannot be misused. Personally, I feel that is a minor point though. When you read data into a buffer, then the previous contents of that buffer is lost. That is hardly a surprise and ByteString offers functions like 'copy' which allow the user to design his algorithms correctly. Programmers who don't want to face that problem can always use the hGet variant that creates a new buffer every time. Anyway, it's not a big thing. The modified hGet function is simple enough so that those who want it can write it themselves. Thank you for your time, Peter

Peter Simons wrote:
Duncan Coutts writes:
We can't provide this interface as it modifies the immutable input ByteString.
Well, one could call it unsafeHGet. :-)
[Mutable input buffer] really requires an api that guarantees that the buffer is no longer in use when the next chunk is read, destroying the previous content of the buffer.
I see your point. It would be nice if the API would guarantee that the ByteString cannot be misused. Personally, I feel that is a minor point though.
Remember that Haskell expressions are evaluated lazily, that's why we have the IO monad for doing input/output. Hence, mutable values that look like pure ones become unpredictable and are considered a major sin in Haskell land, please don't do it. As catBuf crucially depends on the mutability of the buffer, ByteStrings are not the right data structure to use in that case, that's all there is to it. Regards, apfelmus

apfelmus writes:
Remember that Haskell expressions are evaluated lazily, that's why we have the IO monad for doing input/output.
I see. Thank you for the clarification.
Hence, mutable values that look like pure ones become unpredictable and are considered a major sin in Haskell land, please don't do it.
It feels patronizing to tell someone else what he should or shouldn't do. What can I say? Outside of Haskell land there are people who believe that software should, like, work, instead of falling apart whenever you feed it input data larger than a few kilobytes and to reach that objective those people are absolutely prepared to face the wild unpredictability of -- *gasp* -- pointers!
As catBuf crucially depends on the mutability of the buffer, ByteStrings are not the right data structure to use in that case, that's all there is to it.
A ByteString is a pointer, a byte size, and a byte offset. As such, it is the perfect data structure for a program like catBuf. Let's agree to disagree. Best regards, Peter

Peter Simons wrote:
Hence, mutable values that look like pure ones become unpredictable and are considered a major sin in Haskell land, please don't do it.
It feels patronizing to tell someone else what he should or shouldn't do. What can I say? Outside of Haskell land there are people who believe that software should, like, work, instead of falling apart whenever you feed it input data larger than a few kilobytes and to reach that objective those people are absolutely prepared to face the wild unpredictability of -- *gasp* -- pointers!
I didn't intend to patronize, I apologize for the harsh words. It's just that there's a difference between manipulating pointers peek :: Ptr Word8 -> IO Word8 -- :) poke :: Word8 -> Ptr Word8 -> IO () and breaking language semantics peek :: Ptr Word8 -> Word8 -- :( poke :: Word8 -> Ptr Word8 -> ()
As catBuf crucially depends on the mutability of the buffer, ByteStrings are not the right data structure to use in that case, that's all there is to it.
A ByteString is a pointer, a byte size, and a byte offset. As such, it is the perfect data structure for a program like catBuf.
Not quite. ByteStrings are intended to be a memory-efficient representation of Strings and the memory efficiency is implemented in Haskell with buffers and unsafePeformIO. But great care is taken to preserve language semantics in the exported API which means that ByteStrings have to be immutable. Note that the copy function is not for assuring immutability but for handling possible space leaks. Regards, apfelmus

Hey Apfelmus, I have to apologize for being overly sensitive. I had a couple of rough days and am easily frustrated at the moment. That is not your fault. I am sorry. Your illustrate example is very good. It helped me to see more clearly the point I've been trying to make but couldn't quite articulate.
peek :: Ptr Word8 -> Word8 -- :(
The ByteString package offers a number of pure functions to manipulate the underlying buffer. Some of them -- like 'take' and 'drop' -- are by all means supposed to be pure, because they manipulate merely the base pointer, the offset, or the size. Those function depend only on the value of ByteString, not on the memory it references. Then there is this function: index :: ByteString -> Int -> Word8 This function does earn one of those inverse smilies. Personally, I would not have provided a dereferencing operation outside of the IO monad. My personal opinion is that an monadic 'index' would have been ever so slightly less convenient, but it would be far more robust than the function above. As far as I can tell, the only reason why a function like 'unsafeUseAsCStringLen' has to be dubbed unsafe is because 'index' makes it unsafe. The limitation that ByteString has to be immutable is a consequence of the choice to provide 'index' as a pure function. Personally, I won't use 'index' in my code. I'll happily dereference the pointer in the IO monad, because I've found that to be no effort whatsoever. I love monads. For my purposes, 'unsafeUseAsCStringLen' is a perfectly safe function. The efficient variant of 'hGet' I posted can be implemented on top of it, so that 'hGet' is by all means a safe function in my code. There really is no risk at all, unless one uses 'index' or something that's based on it. The way I see it, there will be other people who'll find the performance limitations of standard 'hGet' a decisive factor in their design decisions. Chances are, those people will wonder about using the base pointer for hGetBuf and then they'll end up re-inventing the wheel we just came up with. Maybe I'll find the time to submit a patch to the documentation, so that fine points like an optimal buffer size etc. are explained in more detail than they are right now. It would be nice if some kind of result would come out of this discussion. Anyway, thank you. I appreciate everyone's efforts in helping me figure out why I/O with ByteString is more than two times slower than it could be. Take care, Peter

On Wed, 2007-09-05 at 21:30 +0200, Peter Simons wrote:
As far as I can tell, the only reason why a function like 'unsafeUseAsCStringLen' has to be dubbed unsafe is because 'index' makes it unsafe. The limitation that ByteString has to be immutable is a consequence of the choice to provide 'index' as a pure function.
Well, it's not just index, all the functions that get data from the ByteString, like head/tail/uncons etc etc are pure. That is the whole point of the design of ByteString, to provide pure/immutable high performance strings. What you want is just fine, but it's a mutable interface not a pure one. We cannot provide any operations that mutate an existing ByteString without breaking the semantics of all the pure operations. It's very much like the difference between the MArray and IArray classes, for mutable and immutable arrays. One provides index in a monad, the other is pure.
Personally, I won't use 'index' in my code. I'll happily dereference the pointer in the IO monad, because I've found that to be no effort whatsoever. I love monads. For my purposes, 'unsafeUseAsCStringLen' is a perfectly safe function. The efficient variant of 'hGet' I posted can be implemented on top of it, so that 'hGet' is by all means a safe function in my code. There really is no risk at all, unless one uses 'index' or something that's based on it.
Right, or if you were to hand out a ByteString and then change the contents of it when nobody is looking then that's very much unsafe. So the point is you can break the semantics locally and nobody will notice. It's not a technique we should encourage however.
The way I see it, there will be other people who'll find the performance limitations of standard 'hGet' a decisive factor in their design decisions. Chances are, those people will wonder about using the base pointer for hGetBuf and then they'll end up re-inventing the wheel we just came up with.
I'd rather not provide a quick easy way to break the semantics. unsafeUseAsCStringLen and friends are already plenty enough rope...
Maybe I'll find the time to submit a patch to the documentation, so that fine points like an optimal buffer size etc. are explained in more detail than they are right now. It would be nice if some kind of result would come out of this discussion.
I really don't think we can provide anything that copies into an existing pre-allocated ByteString. As far as I can see, the best we can do is to allocate a fresh buffer and do a single copy into that. Mutating an existing buffer is fine, and System.IO already provides hGetBuf. But you have to be really really careful if you create a ByteString based on the contents of that mutable buffer, without making any copy first.
Anyway, thank you. I appreciate everyone's efforts in helping me figure out why I/O with ByteString is more than two times slower than it could be.
Thanks very much for pointing out where we are copying more than necessary. As for the last bit of performance difference due to the cache benefits of reusing a mutable buffer rather than allocating and GCing a range of buffer, I can't see any way within the existing design how we can achieve that. Bear in mind, that these cache benefits are fairly small in real benchmarks as opposed to 'cat' on fully cached files. Usually you do some actual IO and some operation on the data rather than just copying it from one file descriptor to another. For example, my lazy bytestring binding to iconv performs exactly the same as the command line iconv. In that case we are doing a bit of work on the data which swamps the cache benefits that the command line iconv prog gets from using mutable buffers. If we are trying to optimise the 'cat' case however, eg for network servers, there are even lower level things we can do so that no copies of the data have to be made at all. eg mmap or linux's copyfile or splice. ByteString certainly isn't the right abstraction for that though. Duncan

Duncan Coutts writes:
What you want is just fine, but it's a mutable interface not a pure one. We cannot provide any operations that mutate an existing ByteString without breaking the semantics of all the pure operations.
Is that so? How exactly does mutating a ByteString break the semantics of the pure function 'take'?
It's very much like the difference between the MArray and IArray classes, for mutable and immutable arrays. One provides index in a monad, the other is pure.
Right. Now I wonder: why does ByteString provide an immutable interface but not a mutable one? Apparently mutable interfaces are useful for some purposes, right? Why else would the Array package provide one?
Personally, I won't use 'index' in my code. I'll happily dereference the pointer in the IO monad [...]. For my purposes, 'unsafeUseAsCStringLen' is a perfectly safe function.
[So you can] break the semantics locally and nobody will notice. It's not a technique we should encourage however.
Why do you keep saying that I would break semantics? The code that breaks semantics is the one that uses unsafePerformIO, and ByteString does that, not my code. When I refrain from using those parts of Data.ByteString, then my use of the underlying pointer doesn't "break semantics locally", it doesn't break them at all.
Thanks very much for pointing out where we are copying more than necessary.
You are welcome. And please don't forget the race condition in readFile. I feel that problem is actually more significant than the one we're discussing right now, but my impression is not that the report would have been taken particularly seriously.
As for the last bit of performance difference due to the cache benefits of reusing a mutable buffer rather than allocating and GCing a range of buffer, I can't see any way within the existing design how we can achieve that.
Exactly, with the current API that problem cannot be solved. It's a shame.
Bear in mind, that these cache benefits are fairly small in real benchmarks as opposed to 'cat' on fully cached files.
Do I understand that right? It sounds as if you were saying that -- in the general case -- allocating a new buffer for every single read() is not significantly slower than re-using the same buffer every time. Is that what you meant to say?
If we are trying to optimise the 'cat' case however, eg for network servers, there are even lower level things we can do so that no copies of the data have to be made at all. eg mmap or linux's copyfile or splice.
Yes, that's true, but none of these functions is remotely portable -- or even available in any of the Haskell implementations. So these things concern me considerably less than the 'hGet' function that _is_ available. Or rather, the one that apparently won't be available because we Haskell are way too clever for efficient software.
ByteString certainly isn't the right abstraction for that though.
I am sorry, but that is nonsense. A ByteString is a tuple consisting of a pointer into raw memory, an integer signifying the size of the front gap, and an integer signifying the length of the payload. That data structure is near perfect for performing efficient I/O. To say that this abstraction isn't right for the task is absurd. What you mean to say is that you don't _intend_ it to be used that way, which is an altogether different thing. Best regards, Peter

On 06 Sep 2007 02:30:28 +0200, Peter Simons
Duncan Coutts writes:
What you want is just fine, but it's a mutable interface not a pure one. We cannot provide any operations that mutate an existing ByteString without breaking the semantics of all the pure operations.
Is that so? How exactly does mutating a ByteString break the semantics of the pure function 'take'?
Because if you mutate the original bytestring the value of the other bytestring (returned from 'take') will change. Not pure. Bad. Evil. Etc.
It's very much like the difference between the MArray and IArray classes, for mutable and immutable arrays. One provides index in a monad, the other is pure.
Right. Now I wonder: why does ByteString provide an immutable interface but not a mutable one? Apparently mutable interfaces are useful for some purposes, right? Why else would the Array package provide one?
It doesn't provide two different interfaces to the same data structure, it provides two different data structures. You can't have a pure interface AND an impure one, as the impure one could then mutate values that are used with the pure interface, which would mean that the pure interface is broken (see above).
Bear in mind, that these cache benefits are fairly small in real benchmarks as opposed to 'cat' on fully cached files.
Do I understand that right? It sounds as if you were saying that -- in the general case -- allocating a new buffer for every single read() is not significantly slower than re-using the same buffer every time. Is that what you meant to say?
I think he said that most of the speed difference is due to better cache performance when reusing the same buffer, but in general you do "other stuff" as well which won't be as benign for the cache and the difference will be smaller (if at all noticable).
ByteString certainly isn't the right abstraction for that though.
I am sorry, but that is nonsense. A ByteString is a tuple consisting of a pointer into raw memory, an integer signifying the size of the front gap, and an integer signifying the length of the payload. That data structure is near perfect for performing efficient I/O. To say that this abstraction isn't right for the task is absurd. What you mean to say is that you don't _intend_ it to be used that way, which is an altogether different thing.
A ByteString is an immutable data structure representing a string, if you need a mutable one then it's not the right abstraction *by definition*. Yes, a ByteString is not intended to be a mutable buffer, which is precisely what makes it not the right abstraction if you need that (not an "altogether different thing", it is THE thing). The fact that the internal representation would look similar to a different abstraction which did allow mutation doesn't mean that *this* abstraction is the right choice. This is analogous to Java, and C# - if you need a mutable string buffer the "string" class is not the right abstraction, you use the string builder classes. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

Peter Simons wrote:
Right. Now I wonder: why does ByteString provide an immutable interface but not a mutable one? Apparently mutable interfaces are useful for some purposes, right?
I understand from your earlier admission that you have been having some bad days recently, and I look forward to you kindly not sharing the fruits of said bad days with the list. Regards,

Bryan O'Sullivan writes:
I understand from your earlier admission that you have been having some bad days recently, and I look forward to you kindly not sharing the fruits of said bad days with the list.
Bryan, my mood would be a lot better if you would address my technical points on this list instead of my mood. I realize this discussion is going nowhere. It's hard for me to understand how people manage to say things like "you can't do that" despite the fact that I posted code which does exactly that, but I guess I don't have to understand everything. Take care everyone, Peter

Peter Simons wrote:
Bryan, my mood would be a lot better if you would address my technical points on this list instead of my mood.
I find that people are more responsive when not being flamed, and when they get the sense that their helpful words aren't falling on deaf ears.
I realize this discussion is going nowhere. It's hard for me to understand how people manage to say things like "you can't do that" despite the fact that I posted code which does exactly that, but I guess I don't have to understand everything.
I think the manner in which people are talking past each other revolves around what's *possible* versus what's *sensible*. It is perfectly true that you can take an existing ByteString and smoosh its innards however you like, because the authors sensibly made this possible. However, doing so breaks referential transparency, so it's not encouraged as a general principle. The fact that a ByteString has a pointer to a nice flat piece of memory inside is an implementation detail, and doesn't change the fact that it's intended to be immutable. If you personally know for sure that you're not going to accidentally screw yourself by running a lazy computation on a ByteString that you then modify while the computation is still thunked, then by all means, go to town. But any claim that this is generally a safe thing to do would be completely wrong. It's very much in the realm of "here's the gun, there's your foot, be careful".

bos:
It is perfectly true that you can take an existing ByteString and smoosh its innards however you like, because the authors sensibly made this possible. However, doing so breaks referential transparency, so it's not encouraged as a general principle. The fact that a ByteString has a pointer to a nice flat piece of memory inside is an implementation detail, and doesn't change the fact that it's intended to be immutable.
If you personally know for sure that you're not going to accidentally screw yourself by running a lazy computation on a ByteString that you then modify while the computation is still thunked, then by all means, go to town. But any claim that this is generally a safe thing to do would be completely wrong. It's very much in the realm of "here's the gun, there's your foot, be careful".
Perhaps Peter just wishes to define a newtype MutableByteString type for mutable use, with much the same representation as the existing immutable ByteString, but without the purity guarantees and with all operations in IO? -- Don

Bryan O'Sullivan writes:
I find that people are more responsive when not being flamed, and when they get the sense that their helpful words aren't falling on deaf ears.
Bryan, I appreciate that you are trying to help. I apologize if I have left the impression that I wouldn't. E-mail is an imperfect medium. It is hard for human beings to communicate when they cannot see each other or hear each other's voices. With a little luck, we'll bump into each other on a conference or whatever other occasion, and then I'd love to chat and have a beer with you. You may find that there is a bit of humor in some of the remarks you may now perceive as offensive because we don't know each other.
It is perfectly true that you can take an existing ByteString and smoosh its innards however you like, because the authors sensibly made this possible. However, doing so breaks referential transparency, so it's not encouraged as a general principle.
Yes, that is true.
It's very much in the realm of "here's the gun, there's your foot, be careful".
Personally, I feel guns are really bad news and I would not want to have any around me in real life. That is because a mishap with a gun can kill someone. When it comes to developing software, however, the risks I perceive are slightly less severe, so I feel more adventurous. That is my personal decision, so naturally I respect that others are free to choose differently. Take care, Peter

Peter Simons wrote:
Is that so? How exactly does mutating a ByteString break the semantics of the pure function 'take'?
http://www.sgi.com/tech/stl/Rope.html That's roughly the same in a "practical" language. Please read and understand it completely before responding.
A ByteString is a tuple consisting of a pointer into raw memory, an integer signifying the size of the front gap, and an integer signifying the length of the payload.
No, it's not. You're confusing abstract data types and their representation. If you want to work with a tuple of a pointer and two integers, by all means use a tuple of a pointer and two integers. -Udo
participants (13)
-
apfelmus
-
Bryan O'Sullivan
-
Bulat Ziganshin
-
Don Stewart
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
Peter Simons
-
Sebastian Sylvan
-
Seth Kurtzberg
-
Stefan O'Rear
-
Tomasz Zielonka
-
Tony Finch
-
Udo Stenzel