On Sun, 2006-11-19 at 17:54 +0000, Claus Reinke wrote:
I noticed that ByteString is drastically slower than String if I use cons a lot. according to the source, that is expected because of the memcpy for the second parameter.
but it seems to me that construction should be able to play the dual trick to deconstruction (which does not copy the tail, but returns an indirection into the original list).
Another approach which I have considered is to do it directly by just poking into an array but then do cunning things to make it persistent at yet still O(1) in the best case of single-threaded construction. Here the representation: data StringBuilder = StringBuilder (ForeignPtr Word8) Int Int (IORef Int) -- pointer, offset, length and 'current length' So just like the ByteString representation but with an extra IORef Int. The idea is that the IORef tells us the current length of the used part of the memory block. So by comparing the length at the time this StringBuilder value was made with the real current length then we can see if we're using the 'latest' version of the StringBuilder or if it's been appended/prepended to since. If we're using the latest value then we can reserve some space by atomically incrementing the IORef and then directly write into the free space. If we're not starting from the latest value then we incur a O(n) penalty to copy the array. Of course in a sequence of cons/snoc operations to an old value the copying only happens once since now we have a new unshared array. To make this scheme efficient the locking has to be cheap or preferably someone could figure out a lockless version. This could usefully be combined with lazy bytestrings (implemented either as lists or unbalanced trees) to provide time and space efficient O(1) cons and snoc. Duncan