Re: substring search api

18 Sep 2007


Bryan O'Sullivan writes:
...
> Duncan Coutts wrote:
> ...
> > So perhaps that's my straw-man proposal:
> >   * change BS.findSubstring to be :: BS -> BS -> (BS, BS)
> >      in the style of List.break
> >   * remove the current BS.findSubstrings
>
> While List.break is useful, it has the equally useful
> counterpart (dropWhile . (/=)) that doesn't accumulate
> the prefix of a match. For a long sequence, this has appeal.
>
> Let's say you're reading ten gigabytes of data over the
> network, so you have no control over the incoming chunk size
> (as we don't provide a rechunking mechanism at present, so
> this isn't a hypothetical issue).  A findSubstring that
> accumulates the prefix could easily cons a fatally large
> number of chunks.
>
> I'm not saying that the signature you suggest shouldn't be
> present, merely that it's not enough: it wants a counterpart
> that accumulates either nothing or something safe like an
> Int64 that counts the length of the prefix.

I'm not familiar with ByteStrings, so I'm probably out of my
depth, but one thing strikes me: if you are returned an index
into a large object like this, then to use it as an offset it
has to be an offset from the beginning of the large object.
That would cause the large object to be held in memory until
the indexing/dropping expression is evaluated. Or is there
some more sophisticated form of indexing for byte strings?
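
To make the two alternatives concrete, here is a rough sketch
over lazy ByteStrings. The names dropToSubstring and
substringOffset are made up for illustration and are not part
of the bytestring API, and the search is the naive one
(isPrefixOf at every position), not whatever the library
would really use. The comments in main mark where I would
expect the retention problem to show up.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Hypothetical counterpart to a break-style findSubstring:
-- skip everything before the first match and return the
-- remainder, so no prefix is accumulated.
dropToSubstring :: L.ByteString -> L.ByteString -> L.ByteString
dropToSubstring pat = go
  where
    go s
      | pat `L.isPrefixOf` s = s
      | L.null s             = L.empty
      | otherwise            = go (L.tail s)

-- Hypothetical variant that accumulates only an Int64:
-- the length of the prefix before the first match.
substringOffset :: L.ByteString -> L.ByteString -> Maybe Int64
substringOffset pat = go 0
  where
    go !n s
      | pat `L.isPrefixOf` s = Just n
      | L.null s             = Nothing
      | otherwise            = go (n + 1) (L.tail s)

main :: IO ()
main = do
  let big = L.pack "header bytes ... needle and the rest"
      pat = L.pack "needle"
  -- Dropping directly: the result is a suffix of the input,
  -- and nothing here needs the skipped prefix again.
  L.putStrLn (dropToSubstring pat big)
  -- Using the offset: 'big' is mentioned again in L.drop, so
  -- it stays reachable from its first chunk until that drop
  -- has been evaluated.
  case substringOffset pat big of
    Just n  -> L.putStrLn (L.drop n big)
    Nothing -> putStrLn "no match"
```

If that reading is right, the offset form is only safe when
the caller can cheaply get the data again from the start, or
does not need the data after the match at all.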

-- 
Jón Fairbairn                                 Jon.Fairbairn@cl.cam.ac.uk
