
Hi All, Regarding the Fannkuch Shootout Entry: If we are willing to specialize flop in the way shown on the wiki, another 8% can be gained by similarly specializing rotate: rotate 2 (x1:x2:xs) = x2:x1:xs rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs rotate 4 (x1:x2:x3:x4:xs) = x2:x3:x4:x1:xs rotate 5 (x1:x2:x3:x4:x5:xs) = x2:x3:x4:x5:x1:xs rotate 6 (x1:x2:x3:x4:x5:x:6:xs) = x2:x3:x4:x5:x:6:x1:xs rotate 7 (x1:x2:x3:x4:x5:x:6:x7:xs) = x2:x3:x4:x5:x:6:x7:x1:xs rotate 8 (x1:x2:x3:x4:x5:x:6:x7:x8:xs) = x2:x3:x4:x5:x:6:x7:x8:x1:xs rotate 9 (x1:x2:x3:x4:x5:x:6:x7:x8:x9:xs) = x2:x3:x4:x5:x: 6:x7:x8:x9:x1:xs rotate 10 (x1:x2:x3:x4:x5:x:6:x7:x8:x9:x10:xs) = x2:x3:x4:x5:x: 6:x7:x8:x9:x10:x1:xs rotate n (x:xs) = rot' n xs where rot' 1 xs = x:xs rot' n (x:xs) = x:rot' (n-1) xs Cheers, David -------------------------------- David F. Place mailto:d@vidplace.com

d:
Regarding the Fannkuch Shootout Entry:
If we are willing to specialize flop in the way shown on the wiki, another 8% can be gained by similarly specializing rotate:
rotate 2 (x1:x2:xs) = x2:x1:xs rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs ...
Cheers, I've updated the proposed entry on the wiki. It now runs 40% faster than Bertram's original entry, and its within 25% of an imperative version I wrote yesterday translating from the currently fastest C version. Note that the imperative version is unoptimised so far though, so there could be room to move here, and it may be worth while translating some of the other C entries, to submit a fast, alongside an elegant entry. Entries that may currently be worth submitting: takfp - http://www.haskell.org/hawiki/TakfpEntry pidigits (currently 2nd!) - http://www.haskell.org/hawiki/PidigitsEntry mandelbrot - http://www.haskell.org/hawiki/MandelbrotEntry harmonic - http://www.haskell.org/hawiki/HarmonicEntry fannkuch (pure and impure) - http://www.haskell.org/hawiki/FannkuchEntry Chris, would you like to submit these? -- Don

On 09.01 12:56, Donald Bruce Stewart wrote:
Entries that may currently be worth submitting: takfp - http://www.haskell.org/hawiki/TakfpEntry
Committed.
pidigits (currently 2nd!) - http://www.haskell.org/hawiki/PidigitsEntry
Committed.
mandelbrot - http://www.haskell.org/hawiki/MandelbrotEntry
Committed.
harmonic - http://www.haskell.org/hawiki/HarmonicEntry
Already present in the CVS.
fannkuch (pure and impure) - http://www.haskell.org/hawiki/FannkuchEntry
I think these could do with some work. Optimizing the impure one at least. I took the liberty of submitting some of these. Please keep in future the comment lines in the entries, because Shootout wants the names of the contributers. - Einar Karttunen

ekarttun:
On 09.01 12:56, Donald Bruce Stewart wrote:
Entries that may currently be worth submitting: takfp - http://www.haskell.org/hawiki/TakfpEntry
Committed.
pidigits (currently 2nd!) - http://www.haskell.org/hawiki/PidigitsEntry
Committed.
mandelbrot - http://www.haskell.org/hawiki/MandelbrotEntry
Committed.
harmonic - http://www.haskell.org/hawiki/HarmonicEntry
Already present in the CVS.
fannkuch (pure and impure) - http://www.haskell.org/hawiki/FannkuchEntry
I think these could do with some work. Optimizing the impure one at least.
Great. Thanks. I've just posted a newer impure fannkuch on the wiki that takes another 35% off. It's down to 1.4s for n=10 on my (slowish) laptop :) Cheers, Don

Donald Bruce Stewart wrote:
Entries that may currently be worth submitting: takfp - http://www.haskell.org/hawiki/TakfpEntry pidigits (currently 2nd!) - http://www.haskell.org/hawiki/PidigitsEntry mandelbrot - http://www.haskell.org/hawiki/MandelbrotEntry harmonic - http://www.haskell.org/hawiki/HarmonicEntry fannkuch (pure and impure) - http://www.haskell.org/hawiki/FannkuchEntry
Chris, would you like to submit these?
Well, I can't promise to; I have less time this week. I have two strong suggestions: * whoever does submit them should "diff" the output with a previously accepted version. * ensure that the file comments and submission description explain the exact command line to compile and the exact command line to run. And two weak suggestion: * For speed comparisons, use an x86 architecture, ghc-6.4.1. (I made sum-file faster for G4 but not faster in general). * The -funbox-strict-fields can make a difference without needing explicitly unboxed Int# everywhere. And one observation: The wiki has been linking to the debian version http://shootout.alioth.debian.org/debian/ but there is also nearly the same set of programs on a gentoo version http://shootout.alioth.debian.org/gp4/ So statements about ranking are often only good for one version. Cheers, Chris

--- Chris Kuklewicz
I have two strong suggestions: * whoever does submit them should "diff" the output with a previously accepted version. -snip-
Simply diff program output with the example output file (there's now an output file link in each problem description). Of course, that won't guarantee the program is correct but it'll catch simple formating problems. There's a simple formating problem with the regex-dna program http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=ghc&id=2#log __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

Donald Bruce Stewart wrote:
d:
Regarding the Fannkuch Shootout Entry:
If we are willing to specialize flop in the way shown on the wiki, another 8% can be gained by similarly specializing rotate:
rotate 2 (x1:x2:xs) = x2:x1:xs rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs ...
Cheers, I've updated the proposed entry on the wiki. It now runs 40% faster than Bertram's original entry, and its within 25% of an imperative version I wrote yesterday translating from the currently fastest C version.
The flop machinery can still be made faster, stealing an idea from the icc entry (namely, treating the first entry separately):
cut here >>> flop :: Int -> [Int] -> (Int, [Int]) flop 2 (x2:xs) = (x2, 2:xs) flop 3 (x2:x3:xs) = (x3, x2:3:xs) flop 4 (x2:x3:x4:xs) = (x4, x3:x2:4:xs) flop 5 (x2:x3:x4:x5:xs) = (x5, x4:x3:x2:5:xs) flop 6 (x2:x3:x4:x5:x6:xs) = (x6, x5:x4:x3:x2:6:xs) flop 7 (x2:x3:x4:x5:x6:x7:xs) = (x7, x6:x5:x4:x3:x2:7:xs) flop 8 (x2:x3:x4:x5:x6:x7:x8:xs) = (x8, x7:x6:x5:x4:x3:x2:8:xs) flop 9 (x2:x3:x4:x5:x6:x7:x8:x9:xs) = (x9, x8:x7:x6:x5:x4:x3:x2:9:xs) flop 10 (x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = (x10, x9:x8:x7:x6:x5:x4:x3:x2:10:xs) flop n xs = rs where (rs, ys) = fl n xs ys fl 2 (x:xs) ys = ((x, ys), n:xs) fl n (x:xs) ys = fl (n-1) xs (x:ys)
steps :: Int -> [Int] -> Int steps n (a:as) = steps' n (a,as) steps' n (1,_) = n steps' n (t,ts) = steps' (n+1) (flop t ts) <<< This cuts the running time down from 3.32s to 2.52s on my machine, for n=10. I've tried generating permutations as pairs in that form but that hurt performance rather than help. with kind regards, Bertram

bertram.felgenhauer:
The flop machinery can still be made faster, stealing an idea from the icc entry (namely, treating the first entry separately):
Great. This pushes the pure version up a notch. I've updated the wiki, showing how the code has progressed: Author Time in seconds (N=10) Original entry: > 2 minutes Sebastian Sylvan: 15.4 Kimberley Burchett: 7.2 Cale Gibbard: 5.8 Bertram Felgenhauer: 4.7 Original impure 2.1 Fastest pure 1.8 Fastest impure 1.4 And comparison with the C version: C gcc 1.35 C gcc -O2 0.5 -- Don
participants (6)
-
Bertram Felgenhauer
-
Chris Kuklewicz
-
David F. Place
-
dons@cse.unsw.edu.au
-
Einar Karttunen
-
Isaac Gouy