
Bryan O'Sullivan
Duncan Coutts wrote:
So perhaps that's my straw-man proposal:
 * change BS.findSubstring to be :: BS -> BS -> (BS, BS) in the style of List.break
 * remove the current BS.findSubstrings
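To make the proposed signature concrete, here is a minimal sketch over strict ByteStrings. The name breakOnSubstring is hypothetical and the search is deliberately naive (it retries at every offset); it only illustrates the List.break-style result, not how the library would implement it.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical name, illustrating the proposed type BS -> BS -> (BS, BS).
-- Returns (everything before the first match, the rest starting at the match).
-- With no match the second component is empty, mirroring List.break.
breakOnSubstring :: BS.ByteString -> BS.ByteString -> (BS.ByteString, BS.ByteString)
breakOnSubstring pat src = go 0
  where
    go n
      | n > BS.length src                 = (src, BS.empty)   -- ran off the end: no match
      | pat `BS.isPrefixOf` BS.drop n src = BS.splitAt n src  -- match starts at offset n
      | otherwise                         = go (n + 1)
```

For example, breaking "hello world" on "wor" gives ("hello ", "world"), and a pattern that never occurs gives (source, empty).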
While List.break is useful, it has the equally useful counterpart (dropWhile . not . (==)) that doesn't accumulate the prefix of a match. For a long sequence, this has appeal. Let's say you're reading ten gigabytes of data over the network, so you have no control over the incoming chunk size (we don't provide a rechunking mechanism at present, so this isn't a hypothetical issue). A findSubstring that accumulates the prefix could easily cons a fatally large number of chunks.
I'm not saying that the signature you suggest shouldn't be present, merely that it's not enough: it wants a counterpart that accumulates either nothing or something safe like an Int64 that counts the length of the prefix.
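As a rough sketch of such counterparts over lazy ByteStrings: the names below (findWithOffset, dropToSubstring, indexOfSubstring) are hypothetical, and the search is naive. The point is only that the loop carries the current suffix and a strict Int64 counter, and never builds up the prefix. Whether the skipped chunks can actually be collected still depends on the caller not holding on to the original string.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Hypothetical helper: walk the input one byte at a time, keeping only the
-- current suffix and the number of bytes skipped so far.
findWithOffset :: L.ByteString -> L.ByteString -> Maybe (Int64, L.ByteString)
findWithOffset pat = go 0
  where
    go n s
      | pat `L.isPrefixOf` s = Just (n, s)   -- match begins at offset n
      | L.null s             = Nothing       -- input exhausted, no match
      | otherwise            =
          -- force the counter so it stays a plain Int64 rather than a thunk chain
          let n' = n + 1 in n' `seq` go n' (L.tail s)

-- Accumulates nothing: returns the input from the first match onwards.
dropToSubstring :: L.ByteString -> L.ByteString -> Maybe L.ByteString
dropToSubstring pat = fmap snd . findWithOffset pat

-- Accumulates only an Int64: the length of the prefix before the first match.
indexOfSubstring :: L.ByteString -> L.ByteString -> Maybe Int64
indexOfSubstring pat = fmap fst . findWithOffset pat
```

With pattern "wor" and input "hello world", indexOfSubstring yields Just 6 and dropToSubstring yields Just "world"; both yield Nothing when the pattern is absent.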
I'm not familiar with ByteStrings so I'm probably out of my depth, but something strikes me: if you are returned an index into a large object like this, then to use it as an offset it would have to be the offset from the beginning of the large object, which would cause the large object to be held in memory until the indexing/dropping expression is evaluated. Or is there some more sophisticated form of indexing for byte strings?

--
Jón Fairbairn
Jon.Fairbairn@cl.cam.ac.uk