Am I doing it right?

Creighton Hogg

24 Sep 2008 24 Sep '08

6:14 p.m.

Hi Haskellers, So as another step in my education, I wanted to ask if the following code at least feels 'morally' correct as a find/replace function, replacing each occurrence of the sub-list before with after in the list. unfoldr felt like the most natural higher order function to capture the recursion, but I imagine there are still slicker ways to have done this because I am creating an intermediate list-of-lists. However, doing it this way seems at least moderately speedy given that I was able to read in, find/replace, and write out a 120 MB file in 30 seconds on a fairly dinky linux box. replace :: (Eq a) => [a] -> [a] -> [a] -> [a] replace before after = concat . unfoldr aux where aux [] = Nothing aux str | take l str == before = Just (after,drop l str) | otherwise = Just (take 1 str,drop 1 str) l = length before Thanks, Creighton

Show replies by date

Luke Palmer

24 Sep 24 Sep

7:16 p.m.

On Wed, Sep 24, 2008 at 1:14 PM, Creighton Hogg wrote:

...

Hi Haskellers, So as another step in my education, I wanted to ask if the following code at least feels 'morally' correct as a find/replace function, replacing each occurrence of the sub-list before with after in the list. unfoldr felt like the most natural higher order function to capture the recursion, but I imagine there are still slicker ways to have done this because I am creating an intermediate list-of-lists. However, doing it this way seems at least moderately speedy given that I was able to read in, find/replace, and write out a 120 MB file in 30 seconds on a fairly dinky linux box.

replace :: (Eq a) => [a] -> [a] -> [a] -> [a] replace before after = concat . unfoldr aux where aux [] = Nothing aux str | take l str == before = Just (after,drop l str) | otherwise = Just (take 1 str,drop 1 str) l = length before

Yeah, this code is pretty nice. concat is a foldr, so you have an elegant foldr . unfoldr at the top. IIRC foldr . unfoldr fuses, so you get efficiency that way. "take l str == before" is also known as "before `isPrefixOf` str", just to be slightly more idiomatic. This algorithm is quite inefficient, O(nm). There is probably a more efficient one that still only relies on Eq, I am not certain though.€€ Luke

Achim Schneider

10:39 p.m.

"Creighton Hogg" wrote:

...

So as another step in my education, I wanted to ask if the following code at least feels 'morally' correct as a find/replace function, replacing each occurrence of the sub-list before with after in the list.

Besides not using head, tail and isPrefixOf I would write it the same, except that I'd write it like this: -->8 import Data.ByteString.Lazy.Char8 as B import Prelude as P cut = let f _ s [] = [s] f n s (x:xs) = let (h,t) = B.splitAt (x - n) s in h:f x t xs in f 0 replace before@(b:_) after this = let before' = B.pack before after' = B.pack after f s | B.tail before' `B.isPrefixOf` B.tail s = B.append after' $ B.drop (B.length before') s | otherwise = s in B.concat $ P.map f $ cut this $ B.elemIndices b this main = B.interact $ replace "th" "s" -->8 As we all love benchmarks so dearly, here are the times: ./replace < /usr/share/dict/words > /dev/null 1.29s user 0.02s system 87% cpu 1.498 total ./replace < /usr/share/dict/words > /dev/null 0.25s user 0.02s system 84% cpu 0.313 total -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or broadcasting of this signature prohibited.

Achim Schneider

11 p.m.

Achim Schneider wrote:

...

"Creighton Hogg" wrote:

...
So as another step in my education, I wanted to ask if the following code at least feels 'morally' correct as a find/replace function, replacing each occurrence of the sub-list before with after in the list.

Besides not using head, tail and isPrefixOf I would write it the same, except that I'd write it like this:

Coming to think of it, I really don't like those temporary Int64's, memchr or not. -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or broadcasting of this signature prohibited.

Daniel Fischer

11:53 p.m.

Am Donnerstag, 25. September 2008 00:39 schrieb Achim Schneider:

...

"Creighton Hogg" wrote:

...
So as another step in my education, I wanted to ask if the following code at least feels 'morally' correct as a find/replace function, replacing each occurrence of the sub-list before with after in the list.

Besides not using head, tail and isPrefixOf I would write it the same, except that I'd write it like this:

-->8

import Data.ByteString.Lazy.Char8 as B import Prelude as P

cut = let f _ s [] = [s] f n s (x:xs) = let (h,t) = B.splitAt (x - n) s in h:f x t xs in f 0

replace before@(b:_) after this = let before' = B.pack before after' = B.pack after f s | B.tail before' `B.isPrefixOf` B.tail s = B.append after' $ B.drop (B.length before') s

| otherwise = s

in B.concat $ P.map f $ cut this $ B.elemIndices b this

main = B.interact $ replace "th" "s"

-->8

As we all love benchmarks so dearly, here are the times:

./replace < /usr/share/dict/words > /dev/null 1.29s user 0.02s system 87% cpu 1.498 total

./replace < /usr/share/dict/words > /dev/null 0.25s user 0.02s system 84% cpu 0.313 total

If you love benchmarks, how about: dafis@linux:~/Documents/haskell/move> time ./replaceAS < /usr/share/dict/ger-eng.txt > /dev/null real 0m1.227s user 0m1.130s sys 0m0.100s dafis@linux:~/Documents/haskell/move> time ./replaceKMP < /usr/share/dict/ger-eng.txt > /dev/null real 0m0.179s user 0m0.130s sys 0m0.050s where the latter is --------------------------------------------- module Main (main) where import qualified Data.ByteString.Lazy.Char8 as B import qualified Data.ByteString.Search.KnuthMorrisPratt as KMP replace before after this = let pat = B.pack before rep = B.pack after lp = B.length pat ixs = KMP.matchLL pat this offs = zipWith (-) ixs (0:map (+ lp) ixs) wrk bs [] = [bs] wrk bs (o:os) = let (h,t) = B.splitAt o bs rm = B.drop lp t in h:rep:wrk rm os in B.concat $ wrk this offs main = B.interact $ replace "th" "s" ------------------------------------------------------- just to stay close to your code and because I didn't want to think about a really fast search&replace implementation. Note that longer patterns give a better advantage to the stringsearch package, searching for "this" (replacing with "that") gives dafis@linux:~/Documents/haskell/move> time ./replaceAS < /usr/share/dict/ger-eng.txt > /dev/null real 0m1.222s user 0m1.130s sys 0m0.080s dafis@linux:~/Documents/haskell/move> time ./replaceKMP < /usr/share/dict/ger-eng.txt > /dev/null real 0m0.124s user 0m0.080s sys 0m0.040s dafis@linux:~/Documents/haskell/move> time ./replaceBM < /usr/share/dict/ger-eng.txt > /dev/null real 0m0.093s user 0m0.060s sys 0m0.030s The fast searching function on ByteStrings has already been written for you :)

Achim Schneider

25 Sep 25 Sep

8:41 a.m.

Daniel Fischer wrote:

...

The fast searching function on ByteStrings has already been written for you :)

That's in ghc 6.8.3, which is not in gentoo but only in the haskell overlay, which means that all blame goes to the gentoo maintainers for being utterly out of date. The KMP import works like a charm. findSubstring is only defined for strict bytestrings... try running those benchmarks again, this time on data bigger than your ram. Not to mention that it's deprecated. The really interesting topic is hacking Parsec to use KMP search on "manyTill anyChar (try string match), or rather any recursive try involving combinators that can calculate the position for the next candidate match as a side effect. PS: Thank you for not pointing out that my original code crashes on B.tail B.empty in some cases, or even just that it can't replace overlapping matches at all. -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or broadcasting of this signature prohibited.

6131

Age (days ago)

6132

Last active (days ago)

List overview

Download

5 comments

4 participants

participants (4)

Achim Schneider
Creighton Hogg
Daniel Fischer
Luke Palmer