
Donald Bruce Stewart wrote:
d:
Regarding the Fannkuch Shootout Entry:
If we are willing to specialize flop in the way shown on the wiki, another 8% can be gained by similarly specializing rotate:
rotate 2 (x1:x2:xs) = x2:x1:xs rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs ...
Cheers, I've updated the proposed entry on the wiki. It now runs 40% faster than Bertram's original entry, and its within 25% of an imperative version I wrote yesterday translating from the currently fastest C version.
The flop machinery can still be made faster, stealing an idea from the icc entry (namely, treating the first entry separately):
cut here >>> flop :: Int -> [Int] -> (Int, [Int]) flop 2 (x2:xs) = (x2, 2:xs) flop 3 (x2:x3:xs) = (x3, x2:3:xs) flop 4 (x2:x3:x4:xs) = (x4, x3:x2:4:xs) flop 5 (x2:x3:x4:x5:xs) = (x5, x4:x3:x2:5:xs) flop 6 (x2:x3:x4:x5:x6:xs) = (x6, x5:x4:x3:x2:6:xs) flop 7 (x2:x3:x4:x5:x6:x7:xs) = (x7, x6:x5:x4:x3:x2:7:xs) flop 8 (x2:x3:x4:x5:x6:x7:x8:xs) = (x8, x7:x6:x5:x4:x3:x2:8:xs) flop 9 (x2:x3:x4:x5:x6:x7:x8:x9:xs) = (x9, x8:x7:x6:x5:x4:x3:x2:9:xs) flop 10 (x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = (x10, x9:x8:x7:x6:x5:x4:x3:x2:10:xs) flop n xs = rs where (rs, ys) = fl n xs ys fl 2 (x:xs) ys = ((x, ys), n:xs) fl n (x:xs) ys = fl (n-1) xs (x:ys)
steps :: Int -> [Int] -> Int steps n (a:as) = steps' n (a,as) steps' n (1,_) = n steps' n (t,ts) = steps' (n+1) (flop t ts) <<< This cuts the running time down from 3.32s to 2.52s on my machine, for n=10. I've tried generating permutations as pairs in that form but that hurt performance rather than help. with kind regards, Bertram