
On Sun, 2006-05-28 at 20:40 +0400, Bulat Ziganshin wrote:
Hello Duncan,
Sunday, May 28, 2006, 3:05:53 PM, you wrote:
createMemBuf does exactly this :)
One of the areas where we found that Data.ByteString.Lazy was performing better than the ordinary Data.ByteString is cases like this where we do not know beforehand how big the buffer will be.
i like your idea of using ByteString.Lazy to implement fast and easy-to-use i/o, although i don't think that speed will be within 10% of C :)
Actually Donald recently posted a benchmark (to the libraries mailing list) of ByteString.Lazy where we were getting within 6% of C. That was on a 10GB file.
ghc by itself generates code that is several times slower than gcc-generated and you can't do anything against this, except for implementing everything in C.
ByteString does use C code in places and ByteString.Lazy inherits the benefits of that. Both modules also use array fusion to combine pipelines of loops into a single loop. This has big performance benefits. This is not something you can easily do in C. Using fusion also means one doesn't have to allocate so many buffers and some transformations can work in-place on intermediate buffers. You might be able to use similar fusion techniques for layering Streams.
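As a rough illustration of the kind of pipeline that fusion is aimed at, here is a minimal sketch using Data.ByteString.Char8. The names `shout` and `countShouted` are made up for this example. Whether the traversals really end up as a single loop depends on the bytestring version and on compiling with optimisation; the result is the same either way.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isAlpha, toUpper)

-- Hypothetical helper: keep only letters, then upper-case them.
-- Written as two passes; a fusing library can run them as one loop
-- without materialising the intermediate filtered buffer.
shout :: B.ByteString -> B.ByteString
shout = B.map toUpper . B.filter isAlpha

-- Hypothetical helper: a consumer stacked on top of the same pipeline.
countShouted :: B.ByteString -> Int
countShouted = B.length . shout
```

The point of the sketch is only that the code is written as separate, composable steps; any combining into one loop is left to the library and the compiler.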
but, nevertheless, i think that this is a great idea - much faster than String-based hGetContents. it should help in numerous programs that need fast-and-dirty text processing, although it needs further development of the library in order to implement a full String-like interface for LazyByteString
Data.ByteString.Lazy implements more or less the same interface as Data.ByteString which in turn implements almost the same interface as Data.List. We're still working on improving the API.
If you have to use a single contiguous buffer then it involves guessing and possible reallocation. With a 'chunked' representation like ByteString.Lazy it's not a problem as we just allocate another chunk and start to fill that.
Obvious examples include concat and getContents.
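A small sketch of the chunked idea, assuming the `fromChunks` and `toChunks` functions of Data.ByteString.Lazy.Char8. The helper names `fromPieces` and `chunkSizes` are invented for illustration only.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical producer: each piece becomes its own chunk, so the
-- total size never has to be guessed and no buffer is reallocated.
fromPieces :: [String] -> BL.ByteString
fromPieces = BL.fromChunks . map B.pack

-- The chunk boundaries remain visible to code that asks for them.
chunkSizes :: BL.ByteString -> [Int]
chunkSizes = map B.length . BL.toChunks
```

In GHCi, `chunkSizes (fromPieces ["ab", "cde"])` shows two separate chunks of sizes 2 and 3, while `BL.unpack` on the same value reads them as one continuous string.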
Would the same make sense for a MemBuf stream? Why does it need to be a single large buffer? Couldn't it be a list of buffers?
i also had this idea and it can be implemented in 1 day, i think (when someone needs this). but this is not for Jeremy, he needs a contiguous buffer for interfacing with DBD.
The approach we're taking for Data.ByteString.Lazy is that when a contiguous buffer is needed (eg for passing to foreign code) we convert it to an ordinary strict Data.ByteString.
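Roughly, that conversion can be sketched as below. This is an illustration only, not the library's own code: `toContiguous` and `withContiguous` are hypothetical names, built from `toChunks`, the strict `concat` and `useAsCStringLen`.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Foreign.C.String (CStringLen)

-- Copy all chunks into one strict, contiguous ByteString.
toContiguous :: BL.ByteString -> B.ByteString
toContiguous = B.concat . BL.toChunks

-- Hand the contiguous bytes to an action that expects a
-- (pointer, length) pair, as a foreign function typically would.
-- The pointer must not be used outside the callback.
withContiguous :: BL.ByteString -> (CStringLen -> IO a) -> IO a
withContiguous lazy = B.useAsCStringLen (toContiguous lazy)
```

The copy is only paid at the boundary to foreign code; everything before that point can stay chunked.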
btw, it's better to use UArray instead of list
Not if you want to generate or consume the stream lazily.

Duncan