
Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie:
From: "Branimir Maksimovic"
To: daniel.is.fischer@web.de CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 +0000
From: Daniel Fischer
To: "Branimir Maksimovic" CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane,
nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?).
Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting.
Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP.
Greetings,Bane.
Well, I also thought that all the jumping around in Boyer-Moore wasn't too good (after each shift we must bite off a chunk from the remaining input, pushing that onto the stack, which costs something). But I gave it a try today and here's what I came up with: import Data.List (tails) import Data.Map (Map) import qualified Data.Map as Map import Data.Array.Unboxed searchRep :: String -> String -> String -> String searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str where len = length src len1 = len-1 pat :: UArray Int Char pat = listArray (0,len1) src ch = pat!len1 badChar :: Map Char Int badChar = Map.fromList $ zip src [0 .. ] getBc c = case Map.lookup c badChar of Just n -> n Nothing -> -1 suffs :: UArray Int Int suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs where crs = reverse src pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys pr n _ _ = n bmGs0 :: UArray Int Int bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j <- [k' .. k-1]] help = [k | k <- [0 .. len], k == len || suffs!k == len-k] bmGs :: UArray Int Int bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]] run by "" = reverse by run by (c:cs) | c == ch = process (c:by) cs | otherwise = run (c:by) cs roll n xs ys | n <= 0 = (xs, ys) roll n xs (y:ys) = roll (n-1) (y:xs) ys roll _ xs "" = (xs, "") walk n "" = (n,"") walk n st@(c:cs) | n < 0 = (n,st) | c == pat!n = walk (n-1) cs | otherwise = (n,st) process con left | i < 0 = reverse pass ++ rp ++ run "" left | otherwise = {- bye ++ -} run ncon nleft where (i,pass) = walk len1 con d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass)) -- bye = reverse $! drop (len-d) con (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left it's not as fast as KMP for the tests, but not too bad. Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a match (if any), we'd be better off relieving our memory with 'bye', I think. Any improvements are welcome, certainly some of you can do much better. Cheers, Daniel P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat fussy.