Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

6 Jul 2011

      On Tuesday 05 July 2011, 20:01:26, Tillmann Vogt wrote:
...
Hi,
For my font library I need A function that can handle ligatures. It can
be explained best with an example:
f  ["Th", "ff", "fi", "fl", "ffi"]  "The fluffiest bunny"
should be evaluated to
["Th", "e", " ", "fl", "u", "ffi", "e", "s", "t", " ", "b", "u", "n",
"n", "y" ]
I looked at Data.Text
http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.
html and
http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/
Data-ByteString-Search.html
but they don't have a function that can search several substrings in one
run.
Well, Data.ByteString[.Lazy].KarpRabin does provide simultaneous search for 
multiple substrings - it does not, however, provide direct splitting.

But in my tests, unless the number of substrings was large, multiple 
separate (Boyer-Moore) passes with manual merging of the offset lists were 
much faster [there's a possible space leak for lazy ByteStrings if any 
pattern does not appear in a long substring of the source, so that would be 
a point for Data.ByteString.Lazy.KarpRabin], and I didn't know of any real 
use-case, so I did not pursue it further and considered it just an 
interesting curiosity.

I suppose using a regex package as Johannes Waldmann suggested would be the 
easiest way (probably also faster).

If you submit a feature request, however, I would look into expanding the 
offered functionality (but I'll be on vacation soon, so performance would 
have to wait; I suppose something could be gained there).
...
I guess that searching a text again and again for every substring is not
very efficient and it can be done in one run.
Well, it is surprisingly efficient, compared to (my implementation of) the 
Karp-Rabin algorithm at least.
...
Although I may figure this out myself I think such a function could be
so common that someone has done it or can give me some tips.
Thanks
Cheers,
Daniel

Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

Daniel Fischer