Re: [Haskell-cafe] Substring replacements

13 Dec 2005

...
From: Daniel Fischer <daniel.is.fischer@web.de>
To: "Branimir Maksimovic" <bmaxa@hotmail.com>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 13 Dec 2005 11:23:29 +0100
Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
...
From: Daniel Fischer <daniel.is.fischer@web.de>
...
To: "Branimir Maksimovic" <bmaxa@hotmail.com>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 16:15:46 +0100
Earlier today:
...
Sorry, but
Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
"abababaaba"
I haven't analyzed the algorithm, so I don't know why exactly this
fails.
...
I'll take a look sometime soon.
I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a 
sufficiently
long
match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you
start
pushing onto the rollback-stack. But that isn't inspected anymore, so 
if
the
actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss
it.
I've corrected this with adjusting rollback position. if rollBack is 
null
then
search for rollback starts at second character if not starts at same as
searhed
character because I skip what was searched. That's all.
Though I'm not so sure now when I read this.
Still not working:
*New> searchReplace "abababc" "#" "ababababababc"
"ababababababc"
*New> searchReplace1 "abababc" "#" "ababababababc"
"ababababababc"
Yes, perhaps you've missed another post of mine. I've noticed
that problem when pattern repeats more then 2 times and gave up
because now whatever I do, your version is always fastest.
...
...
...
So the question is, can we find a cheap test to decide whether to use 
KMP
or
Bulat's version?
Just interleave string with  search hits with one with no seacrh (that means 
partial too)
hits, and your version will gain in speed.
More partial matches and full search matches Bulat's version will gain in
speed.
Longer search strings, your version will have gains.
...
...
In real world situation your KMP will always be fastest on average.
I like that we are not using C arrays as then we have advantage
of lazyness and save on memory usage. C++ program will be faster
on shorter strings but on this large strings will loose due memory
latency. and with your test, both programs are very fast.
Greetings, Bane.
On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 
20%
faster than my KMP on your test -- btw, I unboxed the pat array, which gave 
a
bit of extra speed, but not much.
I think that's because on your machine Bulat's version have better 
perfromance
with CPU cache.
I don;t know but now your version is 25% faster with my test on P4
hyperthreaded.

your new version:
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done

real    0m8.734s
user    0m0.015s
sys     0m0.000s

Bulat's version:

bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done

real    0m11.734s
user    0m0.015s
sys     0m0.015s

3 secs difference now.
...
And apologies to Sebastian Sylvan, I also included an unboxed version of 
bord,
built from the boxed version, and that sped things further up -- not much,
again, but there it is.
On my machine you got another 10-15% of boost with unboxed arrays.
...
I wonder about this difference, -10% on one system and +20% on another 
system,
ist that normal?
Different caching schemes on CPU's perhaps? different memory latencies?
hyperthreading helps your version? more code and data, perhaps because
of that it pays the price on your machine?

Greetings, Bane.

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/