
In message
On 9/20/07, Duncan Coutts
wrote: A lazy bytestring is a list of strict bytestring which externally looks like one big string. Could you not just use a lazy bytestring and it's take and drop functions? Perhaps you can help me understand what it is you're trying to do?
I'm working on the ICFP contest from this year, and the algorithm frequently prepends long strings to the front of the "DNA" string being processed. I originally worked only with a lazy bytestring but it 'append' wasn't fast enough, so I'm trying this representation.
But you do realise it's exactly the same representation. Append for a lazy bytestring is O(n) in the number of chunks n, this will also be true for your 'new' representation.
Your email makes me think I should work directly with a list of strict bytestrings,
That's exactly what a lazy bytestring is. You'll not get any performance improvements without changing the data representation. A list is not good enough for what you want to do because so many operations are O(n) in the number of chunks.
but in that case what happens when I want to take a large chunk of strings out of the middle of the list? Would that be an O(n) operation?
Yes. That's exactly the problem. What you want rather than a list of strict bytestrings is a tree of strict bytestrings. You want a fingertree of strict bytestrings: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fingertree newtype ByteSequence = BS (FingerTree (Sum Int) Strict.ByteString) instance Measured (Sum Int) Strict.ByteString where measure = Sum . Strict.length You'll have to wrap the operations you need, (like split, take, drop and append) to make the ByteSequence look like a single sequence of bytes rather than a sequence of chunks. You probably want to enforce an invariant that no chunk is ever empty (as we do for lazy bytestrings). For best performance over a large number of inserts and deletes you might need to implement merging adjacent small blocks so that the block size does not degrade too much. An alternative structure if you tend to do lots of inserts and deletes at near the same spot is a zipper structure with a cursor. I'm not so sure what the best structure for that might be, perhaps just a pair of finger trees giving the parts of the sequence before and after the insertion point (since finger trees give fast access to the ends but slower O(log n) access to points n chunks from the closer end). Have fun :-) I should point out that other people who did this year's ICFP contest have also looked at structures like this (though mostly after the contest finished), so you might want to talk or collaborate with them. Duncan