
On 9/20/07, Justin Bailey
Your email makes me think I should work directly with a list of strict bytestrings, but in that case what happens when I want to take a large chunk of strings out of the middle of the list? Would that be an O(n) operation?
I used a list of strict bytestrings for this task and got pretty good performance (40+k iterations per second, around 40 seconds to run endo.dna) A strict bytestring looks like this: import qualified Data.ByteString.Base as B B.PS fptr offset length (PS means Packed String) fptr :: ForeignPtr Word8 fptr holds a safe pointer an immutable block of memory allocated for a bytestring. This buffer can be shared between multiple bytestrings. offset :: Int offset is the offset from the beginning of the bytestring. length :: Int length is the length of the bytestring (in bytes) This makes the take and drop functions on bytestrings extremely fast: take n s@(B.PS fptr offset length) | n <= 0 = B.empty | n >= length = s | otherwise = B.PS fptr offset n drop n s@(B.PS fptr offset length) | n <= 0 = s | n >= length = B.empty | otherwise = B.PS fptr (offset + n) (length - n) (spoiler follows) ... ... ... ... ... ... What you will find, however, is that a list of bytestrings isn't -quite- good enough; the first thing the DNA does is create a huge number of tiny copies of a single base. You can solve this by setting a threshold number of chunks in the DNA list, and "garbage collecting" the result into a single bytestring every time the number of chunks grows too high. -- ryan