
Am Freitag, 16. Dezember 2005 03:36 schrieben Sie:
From: Daniel Fischer
Any improvements are welcome, certainly some of you can do much better.
It is fast on my machine except that you are using Map to lookup for badChar which is O(log n). I;ve placed this instead: badChar :: UArray Int Int badChar = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0) proc [] _ = [] proc (s:st) i = (ord s,i):proc st (i+1) getBc c = badChar ! ord c
which gaved it significant boost, O(1) lookup.
Yes, but Char has 1114112 values, and I'm not sure whether such a large array would still be better, especially since, presumably, the Map will usually not be deeper than five layers, say. But if we restrict ourselves to extended ASCII Strings, an array certainly is better. And maybe, instead of using two arrays, bmGs0 and bmGs, a mutable array (those are DiffArrays, I think -- I'll check that out) would also improve it.
Now it's faster then brute force method but 10% slower then KMP with my test. I've also performed tests on dual Xeon linux box and results are proportionally the same as on my intel windows box. KMP wins again 10% better then BM and 20-30% better then straightforward search, which means that KMP is well suited for non indexed strings.
Cheers, Daniel
P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat fussy.
Yes, BM is for indexed structures.
Greetings, Bane.
Cheers, Daniel