
From: Daniel Fischer
To: "Branimir Maksimovic" CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 13 Dec 2005 11:23:29 +0100 Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
From: Daniel Fischer
To: "Branimir Maksimovic"
CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today:
Sorry, but Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" "abababaaba"
I haven't analyzed the algorithm, so I don't know why exactly this
fails.
I'll take a look sometime soon.
I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it.
I've corrected this with adjusting rollback position. if rollBack is null then search for rollback starts at second character if not starts at same as searhed character because I skip what was searched. That's all. Though I'm not so sure now when I read this.
Still not working:
*New> searchReplace "abababc" "#" "ababababababc" "ababababababc" *New> searchReplace1 "abababc" "#" "ababababababc" "ababababababc"
Yes, perhaps you've missed another post of mine. I've noticed that problem when pattern repeats more then 2 times and gave up because now whatever I do, your version is always fastest.
So the question is, can we find a cheap test to decide whether to use KMP or Bulat's version?
Just interleave string with search hits with one with no seacrh (that means partial too) hits, and your version will gain in speed. More partial matches and full search matches Bulat's version will gain in speed. Longer search strings, your version will have gains.
In real world situation your KMP will always be fastest on average. I like that we are not using C arrays as then we have advantage of lazyness and save on memory usage. C++ program will be faster on shorter strings but on this large strings will loose due memory latency. and with your test, both programs are very fast.
Greetings, Bane.
On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% faster than my KMP on your test -- btw, I unboxed the pat array, which gave a bit of extra speed, but not much.
I think that's because on your machine Bulat's version have better perfromance with CPU cache. I don;t know but now your version is 25% faster with my test on P4 hyperthreaded. your new version: $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real 0m8.734s user 0m0.015s sys 0m0.000s Bulat's version: bmaxa@MAXA ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real 0m11.734s user 0m0.015s sys 0m0.015s 3 secs difference now.
And apologies to Sebastian Sylvan, I also included an unboxed version of bord, built from the boxed version, and that sped things further up -- not much, again, but there it is.
On my machine you got another 10-15% of boost with unboxed arrays.
I wonder about this difference, -10% on one system and +20% on another system, ist that normal?
Different caching schemes on CPU's perhaps? different memory latencies? hyperthreading helps your version? more code and data, perhaps because of that it pays the price on your machine? Greetings, Bane. _________________________________________________________________ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/