
From: ajb@spamcop.net To: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:25:19 -0500
G'day all.
Quoting Branimir Maksimovic
: After seeing that your program is fastest (I've also tried one from http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not that good in converting to search replace?)
You probably did it right, but you could post your version to the list if you want me to take a look.
Oh, here it is but just don;t laugh :) I've hacked with unsafePerformIO as I din't know how to remove IO from match any other way. searchReplaceKMP :: String->String->String -> String searchReplaceKMP sr rp s | not (null remaining) = found++rp ++ searchReplaceKMP sr rp remaining | otherwise = found where (found,remaining) = unsafePerformIO $ matchKMP sr s matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a])) matchKMP [] = error "Can't match empty list" matchKMP xs = matchfunc [] where matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs)) dofail = \ps xs -> case xs of [] -> fail "can't match" (y:ys) -> matchfunc (y:ps) ys type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a]) makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)] -> PartialMatchFunc m a makeMatchFunc prev [] = \ps xs -> return (reverse (drop ((length prev)-1) ps), xs) makeMatchFunc prev ((x,failstate):ms) = thisf where mf = makeMatchFunc (thisf:prev) ms failcont = prev !! (length prev - failstate - 1) thisf = \ps xs -> case xs of [] -> fail "can't match" (y:ys) -> if (x == y) then mf (y:ps) ys else failcont ps xs overlap :: (Eq a) => [a] -> [Int] overlap str = overlap' [0] str where overlap' prev [] = reverse prev overlap' prev (x:xs) = let get_o o | o <= 1 || str !! (o-2) == x = o | otherwise = get_o (1 + prev !! (length prev - o + 1)) in overlap' (get_o (head prev + 1):prev) xs -------------------------------------------------------------------------------------- These are timings (it's performance is about the same as Rabin-Karp): $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie searchr.exe: user error (can't match) real 0m22.187s user 0m0.015s sys 0m0.015s bmaxa@MAXA ~/tutorial $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... bmaxa@MAXA ~/tutorial $ time searchr.exe Working very long True Done real 0m8.110s user 0m0.031s sys 0m0.016s
When I wrote the RunTimeCompilation code, it wasn't intended to be a shining example of efficiency, merely an illustration. Remember that it's doing TWO things: compiling the pattern to code, and then performing the search. The compilation phase is likely to be much slower than the search, so the speedup (if any!) would only be realised the SECOND time that you searched a string using the same pattern. (Assuming you re-used the compiled match code, of course!)
Oh, that explaines it. Actually this has to be converted to searchReplace in order to be fast, but I don;t know how (yet) as your program is pretty complicated to my humble Haskell skills. I think that your technique can be usefull with Aho-Corasick algorithm as it first constructs finite automaton from tree, then performs search. So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run time compilation, but this is too advanced for me for now. Greetings, Bane. _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/