Haskell KMP(Knuth-Morris-Pratt) algorithm

Hi, I read about some KMP implementation in Haskell including: [1] Richard Bird. ``Pearls of Functional algorithm design'' [2] http://twan.home.fmf.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell.details [3] http://www.haskell.org/haskellwiki/Runtime_compilation [4] LazyString version [1] builds a infinite lazy state transfer trees, while [3] uses index to build overlap table. I created a version which isn't as efficient as in [1]. Just for fun: failure :: (Eq a)=> ([a], [a]) -> ([a], [a]) failure ([], ys) = ([], ys) failure (xs, ys) = fallback (init xs) (last xs:ys) where fallback as bs | as `isSuffixOf` xs = (as, bs) | otherwise = fallback (init as) (last as:bs) kmpSearch2 :: (Eq a) => [a] -> [a] ->[Int] kmpSearch2 ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys==[] then ((xs++[y], ys), ns++[n]) else ((xs++[y], ys), ns) | xs == [] = (p, ns) | otherwise = f (failure p, ns) (x, n) f (p, ns) e = f (failure p, ns) e The function failure just follows the idea that in case (xs, ys) fails matching some letter c in text, where xs++ys = pattern and c!= head ys, it means we must fallback to (xs', ys') so that xs' = longest { s: s is prefix of xs AND s is suffix of xs } The bad thing is that failure can't memorize what it has compute before, for example, as pattern = "ababc" and we fails at ("abab", "c"), then we call function failure to get the new one as ("ab", "abc"). After several matches, we fails again at ("abab", "c"), failure can't just return ("ab", "abc") what it has been compute already. It has too do the same work again. Function f inside kmpSearch2 is in fact a state-transfer function. If we try to use some data structure (for example tree) to memorize the results which failure function calculated, we can finally reach to the idea in [1]. -- LIU http://sites.google.com/site/algoxy/

Hi,
Here is Richard Bird's version for reference. I changed it a bit.
data State a = E | S a (State a) (State a)
matched (S (_, []) _ _) = True
matched _ = False
kmpSearch4 :: (Eq a) => [a] -> [a] -> [Int]
kmpSearch4 ws txt = snd $ foldl tr (root, []) (zip txt [1..]) where
root = build E ([], ws)
build fails (xs, []) = S (xs, []) fails E
build fails s@(xs, (y:ys)) = S s fails succs where
succs = build' (fst (tr (fails, []) (y, 0))) (xs++[y], ys)
tr (E, ns) _ = (root, ns)
tr ((S (xs, ys) fails succs), ns) (x, n)
| [x] `isPrefixOf` ys = if matched succs then (succs, ns++[n])
else (succs, ns)
| otherwise = tr (fails, ns) (x, n)
In the program, tr is the transfer function applied to the state tree.
And build function is used to build the automaton.
Best regards.
--
LIU
On Mar 3, 5:25 pm, "larry.liuxinyu"
Hi,
I read about some KMP implementation in Haskell including:
[1] Richard Bird. ``Pearls of Functional algorithm design'' [2]http://twan.home.fmf.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell.de... [3]http://www.haskell.org/haskellwiki/Runtime_compilation [4] LazyString version
[1] builds a infinite lazy state transfer trees, while [3] uses index to build overlap table.
I created a version which isn't as efficient as in [1]. Just for fun:
failure :: (Eq a)=> ([a], [a]) -> ([a], [a]) failure ([], ys) = ([], ys) failure (xs, ys) = fallback (init xs) (last xs:ys) where fallback as bs | as `isSuffixOf` xs = (as, bs) | otherwise = fallback (init as) (last as:bs)
kmpSearch2 :: (Eq a) => [a] -> [a] ->[Int] kmpSearch2 ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys==[] then ((xs++[y], ys), ns++[n]) else ((xs++[y], ys), ns) | xs == [] = (p, ns) | otherwise = f (failure p, ns) (x, n) f (p, ns) e = f (failure p, ns) e
The function failure just follows the idea that in case (xs, ys) fails matching some letter c in text, where xs++ys = pattern and c!= head ys, it means we must fallback to (xs', ys') so that xs' = longest { s: s is prefix of xs AND s is suffix of xs }
The bad thing is that failure can't memorize what it has compute before, for example, as pattern = "ababc" and we fails at ("abab", "c"), then we call function failure to get the new one as ("ab", "abc"). After several matches, we fails again at ("abab", "c"), failure can't just return ("ab", "abc") what it has been compute already. It has too do the same work again.
Function f inside kmpSearch2 is in fact a state-transfer function. If we try to use some data structure (for example tree) to memorize the results which failure function calculated, we can finally reach to the idea in [1].
-- LIUhttp://sites.google.com/site/algoxy/
_______________________________________________ Haskell-Cafe mailing list Haskell-C...@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe
participants (1)
-
larry.liuxinyu