Progress on shootout entries

Hello, Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry. A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake). It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one? Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster... Cheers, Chris Kuklewicz

On 1/3/06, Chris Kuklewicz
Hello,
Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry
Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry.
A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake).
It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one?
Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc
And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
While the implementation is far from "nice" it still finishes with N=9 (which, AFAICT, is what the benchmark is run with) in a fraction of a second on my machine (and not anywhere near 51s as in the benchmark)... I have a 2.6 Ghz P4... I was going to rewrite it using mutable STArrays for a pure version that's still fast but i sorta feel like I lost the motivation now that it turns out the existing implementation, though ugly, performs somewhat okay... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

On 1/3/06, Sebastian Sylvan
On 1/3/06, Chris Kuklewicz
wrote: Hello,
Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry
Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry.
A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake).
It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one?
Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc
And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
While the implementation is far from "nice" it still finishes with N=9 (which, AFAICT, is what the benchmark is run with) in a fraction of a second on my machine (and not anywhere near 51s as in the benchmark)... I have a 2.6 Ghz P4...
I was going to rewrite it using mutable STArrays for a pure version that's still fast but i sorta feel like I lost the motivation now that it turns out the existing implementation, though ugly, performs somewhat okay...
Hmm.. This may be due to laziness. Since it's only supposed to print out the first 30 lines it won't compute the full n! values... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Discussing the fannkuch entry Sebastian Sylvan wrote:
On 1/3/06, Sebastian Sylvan
wrote: On 1/3/06, Chris Kuklewicz
wrote: Hello,
And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
While the implementation is far from "nice" it still finishes with N=9 (which, AFAICT, is what the benchmark is run with) in a fraction of a second on my machine (and not anywhere near 51s as in the benchmark)... I have a 2.6 Ghz P4...
I was going to rewrite it using mutable STArrays for a pure version that's still fast but i sorta feel like I lost the motivation now that it turns out the existing implementation, though ugly, performs somewhat okay...
Hmm.. This may be due to laziness. Since it's only supposed to print out the first 30 lines it won't compute the full n! values...
/S
If you look at the code, then you may see that
findmax :: Int8 -> [[Int8]] -> Int8 findmax soFar [] = soFar findmax soFar (x:xs) = max (flop 0 x) (findmax soFar xs)
is broken. The soFar parameter (which is originally 0) does absolutely nothing. I think this would be more appropriate: findmax' xs = foldl1' max $ map (flop 0) xs They use (!!) on lists instead of, as you said, STArrays. For truly optimal performance mallocArray of Word8 would actually model the c code much better than the lists. They have [a] types and fromIntegral when it is all Int8, as far as I can see. And for sanity's sake, I wish one of the entries would have documentated a clear way to understand the permutation generator. The PHP and Lua versions are almost legible. -- Chris

And for sanity's sake, I wish one of the entries would have documentated a clear way to understand the permutation generator. The PHP and Lua versions are almost legible.
Here's a neat Haskell version: -- rotate initial n elements of the list left by one place rotate n (x:xs) = rot' n xs where rot' 1 xs = x:xs rot' n (x:xs) = x:rot' (n-1) xs permutations l = foldr perm' [l] [2..length l] where perm' n l = l >>= take n . iterate (rotate n) Combined with Jan-Willem Maessen's ideas (i.e. the single-pass flop) this runs about 85 times faster than the current shootout entry. Bertram

Could you post your code to this mailing list or to the wiki at http://haskell.org/hawiki/FannkuchEntry ? Bertram Felgenhauer wrote:
And for sanity's sake, I wish one of the entries would have documentated a clear way to understand the permutation generator. The PHP and Lua versions are almost legible.
Here's a neat Haskell version:
-- rotate initial n elements of the list left by one place rotate n (x:xs) = rot' n xs where rot' 1 xs = x:xs rot' n (x:xs) = x:rot' (n-1) xs
permutations l = foldr perm' [l] [2..length l] where perm' n l = l >>= take n . iterate (rotate n)
Combined with Jan-Willem Maessen's ideas (i.e. the single-pass flop) this runs about 85 times faster than the current shootout entry.
Bertram _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Chris Kuklewicz wrote:
Could you post your code to this mailing list or to the wiki at http://haskell.org/hawiki/FannkuchEntry ?
I added it to the wiki. (I added it at the top - if anyone feels put down by this, I apologize. Feel free to move it.) Enjoy, Bertram

On 03/01/06, Sebastian Sylvan
On 1/3/06, Sebastian Sylvan
wrote: On 1/3/06, Chris Kuklewicz
wrote: Hello,
Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry
Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry.
A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake).
It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one?
Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc
And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
While the implementation is far from "nice" it still finishes with N=9 (which, AFAICT, is what the benchmark is run with) in a fraction of a second on my machine (and not anywhere near 51s as in the benchmark)... I have a 2.6 Ghz P4...
I was going to rewrite it using mutable STArrays for a pure version that's still fast but i sorta feel like I lost the motivation now that it turns out the existing implementation, though ugly, performs somewhat okay...
Hmm.. This may be due to laziness. Since it's only supposed to print out the first 30 lines it won't compute the full n! values...
/S
You might not have been waiting for the final result. The first 30 perms print quickly, but it takes longer to get the solution to the problem. I managed to do better with the following program which gets the following time report on my machine real 0m8.175s user 0m7.742s sys 0m0.186s as opposed to real 0m23.232s user 0m21.115s sys 0m0.077s for the shootout code. I didn't try too hard to optimise it heavily, but it does use a tree-based permutation generator I stole from some code which was in an n-queens solution I had laying around (pretty sure it's not mine), and an obvious memoisation hack when handling the flips. - Cale

On 03/01/06, Cale Gibbard
I managed to do better with the following program which gets the following time report on my machine real 0m8.175s user 0m7.742s sys 0m0.186s as opposed to real 0m23.232s user 0m21.115s sys 0m0.077s for the shootout code.
I didn't try too hard to optimise it heavily, but it does use a tree-based permutation generator I stole from some code which was in an n-queens solution I had laying around (pretty sure it's not mine), and an obvious memoisation hack when handling the flips.
Hmm, do the permutations have to be in their specific order? This permutation generator seems to go through them in a somewhat different order. It seems irrelevant to the problem, but since they want the permutations as part of the output, it's a good question. :) In that case, I wonder if it would be best to use some other generator to print the first 30, then switch to some faster generator for the actual computation. :) - Cale

Hello, Here is a short (16 lines) and readable Haskell'98 solution. I haven't optimized it or tested it much. When compiled with ghc(6.4.1) -O2, it takes about 10s to compute the answer for 9, on my P3 366MHz machine. It seems to use about 16K of memory. -Iavor import System(getArgs) flop xs@(x:_) = reverse (take x xs) ++ drop x xs flops xs = takeWhile ((1 /=) . head) (iterate flop xs) perms xs = foldr (concatMap . ins) [[]] xs ins x [] = [[x]] ins x (y:ys) = (x:y:ys) : map (y:) (ins x ys) pfannkuchen x = maximum (map (length . flops) (perms [1..x])) main = do a:_ <- getArgs let n = read a :: Int putStrLn (unlines (map show (take 30 (perms [1..n])))) print (pfannkuchen n)

On 1/3/06, Chris Kuklewicz
Hello,
Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry
Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry.
A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake).
It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one?
Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc
I took a stab at the rev-comp one due to boredom. It's not a space leak, believe it or not, it's *by design*... My god, I think someone is consciously trying to sabotage Haskell's reputation! Instead of reading input line-by-line and doing the computation, it reads a whole bunch of lines (hundreds of megs worth, apparently) and only does away with them when a new header appears. Anyway, I uploaded a dead simple "first-naive-implementation" which is significantly faster (and more elegant): complement i = complArr ! i' where i' = toUpper i complArr = array ('A','Z') (self ++ complAssoc) where self = az `zip` az az = ['A'..'Z'] complAssoc = [ ('A','T'),('C','G'),('G','C'),('T','A'),('U','A'),('M','K'),('R','Y'),('W','W'), ('S','S'),('Y','R'),('K','M'),('V','B'),('D','H'),('D','H'),('B','V'),('N','N') ] process header@('>':xs) = putStrLn header process x = putStrLn (map complement x) main = do xs <- getContents mapM process (lines xs) /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

On Wed, Jan 04, 2006 at 03:02:29AM +0100, Sebastian Sylvan wrote:
I took a stab at the rev-comp one due to boredom. It's not a space leak, believe it or not, it's *by design*...
My god, I think someone is consciously trying to sabotage Haskell's reputation!
Instead of reading input line-by-line and doing the computation, it reads a whole bunch of lines (hundreds of megs worth, apparently) and only does away with them when a new header appears.
Anyway, I uploaded a dead simple "first-naive-implementation" which is significantly faster (and more elegant): ...
The program is supposed to do "reverse and complement". The code you posted just does "complement". Peace, Dylan

On 1/4/06, Sebastian Sylvan
On 1/3/06, Chris Kuklewicz
wrote: Hello,
Where there were no entries to the http://shootout.alioth.debian.org/benchmark.php?test=chameneos&lang=all benchmark, there are now two. The one by Josh Goldfoot is already posted, the one Einar Karttunen and I optimized has been submitted and will run faster/smaller. Our code is at http://haskell.org/hawiki/ChameneosEntry
Now for improving the fasta benchmark, http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all , which currently has a space leak in the Haskell entry.
A non-leaking version which has been optimized to run 3.5 times faster is now up at http://haskell.org/hawiki/FastaEntra (ooops..my spelling mistake).
It could still be made to run about 3 times faster, if the other languages are any guide. Anyone want to help polish this one?
Also, two other existing entries have space leaks, as can be seen at http://shootout.alioth.debian.org/benchmark.php?test=all&lang=ghc&lang2=ghc
I took a stab at the rev-comp one due to boredom. It's not a space leak, believe it or not, it's *by design*...
My god, I think someone is consciously trying to sabotage Haskell's reputation!
Instead of reading input line-by-line and doing the computation, it reads a whole bunch of lines (hundreds of megs worth, apparently) and only does away with them when a new header appears.
Anyway, I uploaded a dead simple "first-naive-implementation" which is significantly faster (and more elegant):
complement i = complArr ! i' where i' = toUpper i
complArr = array ('A','Z') (self ++ complAssoc) where self = az `zip` az az = ['A'..'Z'] complAssoc = [ ('A','T'),('C','G'),('G','C'),('T','A'),('U','A'),('M','K'),('R','Y'),('W','W'), ('S','S'),('Y','R'),('K','M'),('V','B'),('D','H'),('D','H'),('B','V'),('N','N') ]
process header@('>':xs) = putStrLn header process x = putStrLn (map complement x)
main = do xs <- getContents mapM process (lines xs)
Oops! Apologies to whoever wrote the orignal version! Apparently I didn't read the spec carefully enough, the sequences are supposed to be reversed, which is why simply writing one line at a time doesn't work. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Hello Sebastian, Wednesday, January 04, 2006, 5:13:57 AM, you wrote:
complArr = array ('A','Z') (self ++ complAssoc)
UArray, indexed starting from '\0', will work faster SS> Oops! Apologies to whoever wrote the orignal version! Apparently I SS> didn't read the spec carefully enough, the sequences are supposed to SS> be reversed, which is why simply writing one line at a time doesn't SS> work. if you need to hold much strings in memory, use FastString package from jhc, darcs, or my own attached to this letter -- Best regards, Bulat mailto:bulatz@HotPOP.com

Summarizing: I collected all of the code snippets posted to this thread into the wiki under http://haskell.org/hawiki/ShootoutEntry including old haskell code already on the shootout. The Fannkuch benchmark drew a lot of interest, but a new entry that creates the correct permutation order (for the 30 printed ones, at least) has not been assembled. But I think the rest of the pieces are there on the wiki, http://haskell.org/hawiki/FannkuchEntry The Reverse-Complement benchmark had a "Complement" code snippet collected from the mailing list to http://haskell.org/hawiki/ReverseComplementEntry The http://haskell.org/hawiki/KnucleotideEntry has only the two old entries from the shootout, which are also the two slowest entries and seem to use 10x too much space (possible leak). The http://haskell.org/hawiki/FastaEntra has a proposed entry which will be submitted soon, but more tweaking is welcome. The http://haskell.org/hawiki/ChameneosEntry has been submitted, and will be on the shootout soon. -- Chris

2006/1/3, Chris Kuklewicz
And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
In this particular case the flop function is very slow. flop :: Int8 -> [Int8] -> Int8 flop acc (1:xs) = acc flop acc list@(x:xs) = flop (acc+1) mangle where mangle = (reverse front) ++ back (front,back) = splitAt (fromIntegral x) list It can be optimized using a new mangle function: mangle :: Int -> [a] -> [a] mangle m xs = xs' where (rs,xs') = splitAt m xs rs splitAt :: Int -> [a] -> [a] -> ([a], [a]) splitAt 0 xs ys = (xs,ys) splitAt _ [] ys = ([],ys) splitAt m (x:xs) ys = splitAt (m - 1) xs (x:ys) The mangle function transforms the list in one step while the original implementation is using reverse, (++) and splitAt. With this function the new flop is: flop :: Int8 -> [Int8] -> Int8 flop acc (1:xs) = acc flop acc list@(x:xs) = flop (acc+1) (mangle (fromIntegral x) list)

Krasimir Angelov wrote:
2006/1/3, Chris Kuklewicz
: And finially, the haskel entry for http://shootout.alioth.debian.org/benchmark.php?test=fannkuch&lang=all is currently the *slowest* entry out of 28 languages. It is 813x slower than the c-code, 500x slower than OCaml. Should be easy to make it faster...
In this particular case the flop function is very slow.
flop :: Int8 -> [Int8] -> Int8 flop acc (1:xs) = acc flop acc list@(x:xs) = flop (acc+1) mangle where mangle = (reverse front) ++ back (front,back) = splitAt (fromIntegral x) list
It can be optimized using a new mangle function:
mangle :: Int -> [a] -> [a] mangle m xs = xs' where (rs,xs') = splitAt m xs rs
splitAt :: Int -> [a] -> [a] -> ([a], [a]) splitAt 0 xs ys = (xs,ys) splitAt _ [] ys = ([],ys) splitAt m (x:xs) ys = splitAt (m - 1) xs (x:ys)
The mangle function transforms the list in one step while the original implementation is using reverse, (++) and splitAt. With this function the new flop is:
flop :: Int8 -> [Int8] -> Int8 flop acc (1:xs) = acc flop acc list@(x:xs) = flop (acc+1) (mangle (fromIntegral x) list)
You seem to have also discovered the fast way to flop. This benchmarks exactly as fast as the similar entry assembled by Bertram Felgenhauer using Jan-Willem Maessen's flop code:
import System (getArgs) import Data.List (foldl', tails)
rotate n (x:xs) = rot' n xs where rot' 1 xs = x:xs rot' n (x:xs) = x:rot' (n-1) xs
permutations l = foldr perm' [l] [2..length l] where perm' n l = l >>= take n . iterate (rotate n)
flop :: Int -> [Int] -> [Int] flop n xs = rs where (rs, ys) = fl n xs ys fl 0 xs ys = (ys, xs) fl n (x:xs) ys = fl (n-1) xs (x:ys)
steps :: Int -> [Int] -> Int steps n (1:_) = n steps n ts@(t:_) = (steps $! (n+1)) (flop t ts)
main = do args <- getArgs let arg = if null args then 7 else read $ head args mapM_ (putStrLn . concatMap show) $ take 30 $ permutations [1..arg] putStr $ "Pfannkuchen(" ++ show arg ++ ") = " putStrLn $ show $ foldl' (flip (max . steps 0)) 0 $ permutations [1..arg]
[ This is on the wiki, and is 80-90 times faster than the old entry ] I have not been able to make this run any faster by tweaking it. It is easily one of the nicest lazy Haskell-idiom entries on the whole shootout. It does not have to use IO or ST or unboxed anything or even arrays to perform well in small space. * Replacing the foldl' with the more legible foldl' max 0 $ map (steps 0) is a very very tiny speed loss * Going to Word8 instead of Int does not improve speed or save space * Using Control.Monad.fix explicitly is speed neutral:
flopF :: Int -> [Int] -> [Int] flopF n xs = fst $ fix (flop' n xs) where -- flop' :: Int -> [Int] -> ([Int],[Int]) -> ([Int],[Int]) flop' 0 xs ~(_,ys) = (ys,xs) flop' n (x:xs) ~(rs,ys) = flop' (n-1) xs (rs,(x:ys))
-- Chris

On Jan 4, 2006, at 8:11 AM, Chris Kuklewicz wrote:
Krasimir Angelov wrote:
... In this particular case the flop function is very slow. ... It can be optimized using a new mangle function:
mangle :: Int -> [a] -> [a] mangle m xs = xs' where (rs,xs') = splitAt m xs rs
splitAt :: Int -> [a] -> [a] -> ([a], [a]) splitAt 0 xs ys = (xs,ys) splitAt _ [] ys = ([],ys) splitAt m (x:xs) ys = splitAt (m - 1) xs (x:ys)
The mangle function transforms the list in one step while the original implementation is using reverse, (++) and splitAt. With this function the new flop is:
flop :: Int8 -> [Int8] -> Int8 flop acc (1:xs) = acc flop acc list@(x:xs) = flop (acc+1) (mangle (fromIntegral x) list)
You seem to have also discovered the fast way to flop.
This benchmarks exactly as fast as the similar entry assembled by Bertram Felgenhauer using Jan-Willem Maessen's flop code:
... flop :: Int -> [Int] -> [Int] flop n xs = rs where (rs, ys) = fl n xs ys fl 0 xs ys = (ys, xs) fl n (x:xs) ys = fl (n-1) xs (x:ys)
Indeed, I believe these are isomorphic. My "fl" function is the "splitAt" function above, perhaps more descriptively named "splitAtAndReverseAppend"... -Jan-Willem Maessen
participants (9)
-
Bertram Felgenhauer
-
Bulat Ziganshin
-
Cale Gibbard
-
Chris Kuklewicz
-
Dylan Thurston
-
Iavor Diatchki
-
Jan-Willem Maessen
-
Krasimir Angelov
-
Sebastian Sylvan