
Hi John,
I've used Haskell and GHC to solve particular real life application. 4 tools were developed and their function is almost the same - they modify textual input according to patterns found in the text. Thus, it
Hmm, modification can be a problem for ByteStrings, since it entails copying. That could be worse for strict BytStrings than lazy, if in the lazy ByteString you can reuse many chunks.
I understand now, that is probably the point.
Two main possibilities: 1. your algorithm isn't suited for ByteStrings 2. you're doing it wrong
The above indicates 1., but without a more detailed description and/or code, it's impossible to tell.
Yes, it seems that the (1) is the point, because I split and re-build the bytestream many times during processing.
For splitting it might be interesting to have a look at 'attoparsec' (http://hackage.haskell.org/package/attoparsec), which are parser combinators specialized to bytestring's. If the splitting still leaves large enough chunks of the original input intact (large enough being around > 128 bytes), then you might be able to achieve a benefit. To use strict and lazy bytestrings efficiently, it helps a lot to know their internal representation (a slice of a char[] array and lists of slices of char[] arrays) and to think about what the different operations cost on this datastructure. (The source code of the library is actually not that hard to understand.) An obviously very expensive operation is concatenation (guaranteed copying for strict bytestrings at least a list traversal for lazy bytestrings). The blaze-builder library provides a type that supports O(1) concatenation of sequences of bytes and an efficient conversion to a lazy bytestring that has a large average chunk size. Large chunks are important to amortize the work spent on the boundary with the speedup gained due to the compact and cache efficient representation. If you keep using lists, but need a lot of concatenation using difference lists (http://hackage.haskell.org/package/dlist) also allows a O(1) concatenation. Note that the "cost" of having this O(1) concatenation is that the resulting list cannot be inspected, as long as it is in a form that supports O(1) concatenation. The same holds for the lazy bytestring builder provided by blaze-builder. Note that I'd be very curious, if you could achieve any further performance improvement using blaze-builder...which is no surprise, as I'm its author :-) Note also that the API has been cleaned up and will be published with the next release of the bytestring library. So just ignore the whole 'write' stuff. I should have separated it. good luck and best regards, Simon