
Hi, Bane and all,
On Tuesday, 13 December 2005, 14:22, you wrote:
In a real-world situation your KMP will always be fastest on average. I like that we are not using C arrays, as then we get the advantage of laziness and save on memory usage. A C++ program will be faster on shorter strings, but on strings this large it will lose due to memory latency. And with your test, both programs are very fast.
Yesterday, I did the unspeakable -- I wrote a C version. It smashes the Haskell performance for short enough strings (factor 10 for my test, factor 2.2 for Bane's), but once it starts swapping, we catch up, and for really large strings I dare say we'd win by far.

I also managed to get my KMP still faster: using take and drop instead of splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was what slowed it down; I reintroduced searchr''' and both are equal). I'm not quite sure whether that indeed helps, but I also chose to use listArray for the boxed array of borders. Now it's

import Data.Array.Unboxed
import qualified Data.Array as A

searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
  where
    len = length src
    pat :: UArray Int Char
    pat = listArray (0,len-1) src
    bord = A.listArray (0,len) $
             (-1):0:[getBord (pat!i) i + 1 | i <- [1 .. len-1]]
    getBord s n
      | m < 0      = m
      | s == pat!m = m
      | otherwise  = getBord s m
      where m = bord A.! n
    bor :: UArray Int Int
    bor = listArray (0,len) $ A.elems bord
    getBor s n
      | m < 0 || s == pat!m = m
      | otherwise           = getBor s m
      where m = bor!n
    process n str _ | n >= len = dst ++ process 0 str ""
    process _ "" mat = reverse mat
    process 0 (s:st) _
      | s == c    = process 1 st [s]
      | otherwise = s : process 0 st ""
    process n str@(s:st) mat
      | s == pat!n = process (n+1) st (s:mat)
      | otherwise  = case getBor s n of
          -1 -> reverse mat ++ process 0 str ""
          0  -> reverse mat ++ process 1 st [s]
          j  -> reverse (drop j mat) ++ process (j+1) st (s:take j mat)

which gives a speedup of roughly 10% on my box versus yesterday's version.
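For the curious, a tiny driver along these lines exercises it. This is purely illustrative -- the input string and sizes below are made up and are not the benchmark we've been timing; it assumes the searchReplace definition and its array imports above are in scope:

main :: IO ()
main = do
    -- made-up input with lots of near-misses around the pattern "search"
    let input = concat (replicate 100000 "seaseasearch baker seasearch charlie ")
    -- forcing the length ensures the whole replaced string is actually built
    print (length (searchReplace "search" "replace" input))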
Greetings, Bane.
On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% faster than my KMP on your test -- btw, I unboxed the pat array, which gave a bit of extra speed, but not much.
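In case anyone wonders what unboxing buys here, a minimal sketch of the boxed-versus-unboxed distinction (illustrative declarations only, not the exact arrays from the program above):

import qualified Data.Array as A
import qualified Data.Array.Unboxed as U

-- Boxed: each element sits behind a pointer (a thunk until evaluated),
-- which is what allows the self-referential border array to be built lazily.
boxedPat :: A.Array Int Char
boxedPat = A.listArray (0, 5) "search"

-- Unboxed: the Chars are stored flat in one block of memory,
-- which is friendlier to the CPU cache once the array is fully known.
unboxedPat :: U.UArray Int Char
unboxedPat = U.listArray (0, 5) "search"

main :: IO ()
main = print (boxedPat A.! 3, unboxedPat U.! 3)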
I think that's because on your machine Bulat's version has better performance with the CPU cache. I don't know, but now your version is 25% faster with my test on a hyperthreaded P4.
Errrr, what's 'hyperthreaded'? Unfortunately, I'm completely useless with computers.
your new version:

$ time srchrep.exe
Working:seasearch replace able seaseasearch baker seasearch charlie True Done

real    0m8.734s
user    0m0.015s
sys     0m0.000s
Bulat's version:
bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace able seaseasearch baker seasearch charlie True Done

real    0m11.734s
user    0m0.015s
sys     0m0.015s
3 secs difference now.
And apologies to Sebastian Sylvan: I also included an unboxed version of bord, built from the boxed version, and that sped things up further -- not much, again, but there it is.
On my machine you got another 10-15% boost with unboxed arrays.
I wonder about this difference, -10% on one system and +20% on another system; is that normal?
Different caching schemes on the CPUs, perhaps? Different memory latencies? Does hyperthreading help your version? More code and data, and perhaps because of that it pays the price on your machine?
Greetings, Bane.
Well, whatever. Up to now, on my box, Bulat's is still the fastest for your test -- though I've narrowed the gap quite a bit. Cheers, Daniel