
In some spare time over the holidays I cooked up three shootout entries, for Fasta, the Meteor Contest, and Reverse Complement. I should probably have tossed them to haskell-cafe before submission, for review and ideas, but they're up now. In any case, all three were great learning experiences, but could use some other eyes and ideas to be the best that they can.

First up is the meteor-contest entry.

http://shootout.alioth.debian.org/gp4/benchmark.php?test=meteor&lang=ghc&id=5

This is the clear win of the bunch, with significantly improved time thanks to its translation of the better algorithm from Clean. However, it's still missing something. Nearly all its time is spent in a tight loop in the solveCell function near the end of the code. I tried unboxing this, but failed because it spends the bulk of its time applying a bitwise and between the recursively passed value and a piece of data retrieved from the masksAtCell data structure, which is of type Array (Row,Col) (Array Color [Mask]). (Note that Mask, Color, Row and Col are all type synonyms for Int that I added for readability.) As each Mask is stored in a list, the masks can't easily be unboxed -- some sort of custom data structure built using the FFI seems in order here. If anyone wants to tackle this, I think it could be a big win for performance.

Next is reverse-complement.

http://shootout.alioth.debian.org/gp4/benchmark.php?test=revcomp&lang=ghc&id=3

This *would* be a big win except I dimly doubled memory usage from the previous entry due to filtering newlines explicitly -- which does, one should note, provide a large performance gain. The solution here seems the most obvious -- roll the newline stripping into the destructive modifications performed in the revcomp function, as the winning C++ entry does. I'll probably get around to this eventually, but if someone else wants to try to implement this or any other performance improvements, please jump right in. Additionally, there might be some other tricks to reducing its memory usage that escape me at the moment (noting, of course, that since we use a strict bytestring, as we should, it's unavoidable that we consume the entire contents of the input at once... I think?)

Finally, there's fasta.

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=2

This one really depresses me. It outperforms the previous version by roughly 20% on my machine (PPC) but underperforms by roughly the same amount on the shootout box. If you compare it to Don's previous version, the "optimizations" I attempted should be pretty obvious. First, I precompute the partial sums for the frequency tables and alter the choose function accordingly. This is a pretty basic measure that all the better entries seem to do. Next, I unboxed the random function, which yielded big speedups. However, given that we use an unfoldrN, as we should, I couldn't very well pass it a function that returned something of kind # (especially as Maybe, which unfoldrN uses, is of kind * -> *). Thus, I hid the unrolled loop in a lazy list of floats that is passed instead of a random seed. I suspect that, for some reason I don't understand relating to differences in processors, GHC's internal handling of floating point math, or who knows what, this somehow is the source of the slowdown. If someone with an Intel Pentium 4 machine comparable to that of the shootout box wants to take a look at this code and see why it underperforms, I'd be much obliged.

It really seems to me that GHC's fasta performance is far below where it should be (4x slower than Java!), and I'd like to get its numbers up somehow.

Thanks,
Sterl

p.s. It looks like they've deprecated chameneos in favor of a new version, chameneos-redux. As this was one of the places Haskell really rocked the competition, it would probably be worth updating the Haskell entry for the new benchmark. Also, the n-bodies benchmark seems like another that could be much improved.
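A minimal sketch of the "lazy list of floats" trick described above (an added example, not the submitted entry), assuming nothing beyond the LCG constants that appear in the code later in this thread (im = 139968, ia = 3877, ic = 29573):

randoms :: Int -> [Float]
randoms seed = map norm (tail (iterate next seed))
  where
    next s = (s * 3877 + 29573) `rem` 139968    -- newseed = (seed * ia + ic) `rem` im
    norm s = fromIntegral s / 139968            -- scale into [0,1), as rand does

main :: IO ()
main = print (take 5 (randoms 42))              -- inspect the first few values

The consumer then pattern-matches values off the head of this list wherever it previously threaded an Int seed through the unfold.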

s.clover:
In some spare time over the holidays I cooked up three shootout entries, for Fasta, the Meteor Contest, and Reverse Complement. I
Yay!
First up is the meteor-contest entry.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=meteor&lang=ghc&id=5
This is the clear win of the bunch, with significantly improved time thanks to its translation of the better algorithm from Clean.
Well done! Though looks like we'll have to follow the C++ implementation to be really competitive.
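One hypothetical shape for the unboxing Sterling suggests above for masksAtCell (an added sketch, not from the entry): flatten the per-cell, per-colour [Mask] lists into a single unboxed array plus offset and length tables, so the inner loop indexes unboxed Ints instead of walking boxed lists. The helper names and the slot encoding are illustrative only.

import Data.Array.Unboxed (UArray, listArray, (!))

type Mask = Int

-- one "slot" per (row, col, colour) combination,
-- e.g. slot = (row * nCols + col) * nColours + colour
flatten :: [[Mask]] -> (UArray Int Mask, UArray Int Int, UArray Int Int)
flatten segs = (allMasks, offs, lens)
  where
    lens'    = map length segs
    n        = length segs
    allMasks = listArray (0, sum lens' - 1) (concat segs)       -- every mask, concatenated
    offs     = listArray (0, n - 1) (init (scanl (+) 0 lens'))  -- start index of each slot
    lens     = listArray (0, n - 1) lens'                       -- number of masks per slot

-- the masks for one slot; a real solveCell loop would index allMasks directly
-- between offs ! i and offs ! i + lens ! i - 1 rather than rebuild a list
masksAt :: (UArray Int Mask, UArray Int Int, UArray Int Int) -> Int -> [Mask]
masksAt (ms, offs, lens) i = [ ms ! j | j <- [o .. o + lens ! i - 1] ]
  where o = offs ! i

main :: IO ()
main = print (masksAt (flatten [[1,2],[3]]) 1)   -- prints [3]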
Next is reverse-complement.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=revcomp&lang=ghc&id=3
Very good. I'm glad someone looked at that, since the old code was moderately naive (first bytestring effort).
Finally, there's fasta.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=2
Yeah, we should do something better here. Hmm.
p.s. It looks like they've deprecated chameneos in favor of a new version, chameneos-redux. As this was one of the places Haskell really rocked the competition, it would probably be worth updating
Definitely. I note also we're beating Erlang on the new thread-ring benchmark too, http://shootout.alioth.debian.org/gp4/benchmark.php?test=threadring&lang=all
the Haskell entry for the new benchmark. Also, the n-bodies benchmark seems like another that could be much improved.
Yeah, that's a hard one. -- Don

Sterling Clover wrote: ...
Finally, there's fasta.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=2
This one really depresses me. It outperforms the previous version by roughly 20% on my machine (PPC) but underperforms by roughly the same amount on the shootout box. ...
Well done. Great, I'll have a play with your new version of Fasta; I've just upgraded to an Intel Core 2 Duo 2.33GHz.

Something I found with Don's version on my machine was that if I removed all the exclamation marks and the -fbang-patterns bit at the top it went about 20% faster as well as being much cleaner code, but with my very rudimentary understanding of Haskell I wasn't entirely sure it would produce the same results if I did this, and I didn't get round to checking.

I suspect the majority of Fasta time is spent in the rand routine. My wild cunning plan for this was that it might be possible to avoid doing the conversion to a float every time the routine is called. My thinking is that it could just return an Int most of the time, because the number is only used in, I think, a less-than comparison outside rand, which could almost always be decided by an Int less-than comparison rather than a float less-than comparison. A clever lazy less-than mechanism could get rand to give a float when the Int comparison is too close to be certain. It would probably be classified by the shootout as cheating though. And it's way beyond what I could currently write in Haskell.

Richard.
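A hypothetical sketch of that idea (an added example; the names and table are illustrative, not an actual entry), assuming the cumulative-frequency form of choose that comes up later in this thread: since the random value is seed / 139968, the test against a cumulative frequency can almost always be decided by comparing the raw Int seed with a precomputed integer cutoff, falling back to the exact Float comparison only when the seed lands next to a cutoff.

import Data.Word (Word8)

-- each entry: the symbol, an integer cutoff (round (cumFreq * 139968)),
-- and the exact cumulative frequency for the rare boundary case
pick :: [(Word8, Int, Float)] -> Int -> Word8
pick [(b, _, _)] _ = b
pick ((b, cutInt, cutF) : rest) seed
  | seed < cutInt - 1  = b                           -- safely below the cutoff
  | seed > cutInt + 1  = pick rest seed              -- safely above the cutoff
  | fromIntegral seed / 139968 < cutF = b            -- too close to call: exact test
  | otherwise          = pick rest seed

-- made-up demonstration table: 'a' below ~0.3, 'c' below ~0.5, 't' for the rest
demo :: [(Word8, Int, Float)]
demo = [(97, 41990, 0.3), (99, 69984, 0.5), (116, 139968, 1.0)]

main :: IO ()
main = print (pick demo 50000)   -- 50000 / 139968 is about 0.357, so this prints 99 ('c')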

| Something I found with Don's version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.

Simon

Simon Peyton-Jones wrote:
| Something I found with Don's version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.
If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.
I wonder if this could be related to what I observed with AVL trees and mentioned a while back (using a strict data type is slower than using explicit seqs to get the same strictness). Regards -- Adrian Hey

| > If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go
| slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more
| work!) then I'd love to see a test case.
|
| I wonder if this could be related to what I observed with AVL trees and
| mentioned a while back (using a strict data type is slower than using
| explicit seqs to get the same strictness).

Could indeed be. That message is still in my performance-tuning pile; it's not forgotten, just buried. But the more evidence, the stronger the incentive to investigate.

Simon

Simon Peyton-Jones wrote:
| Something I found with Don's version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.
If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.
Sorry, I don't understand the code, I've jumped in the deep-end before learning to swim, but I can now tell you it's producing the same results when I remove some of the exclamation marks. I've checked with an MD5 on the output.

The timings in seconds for 10,000,000 iterations averaged over 5 runs. (There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got stuck compiling it under 6.8) The fancy compile options are from the shootout page.

Don's original program       13.26   compiled ghc --make
Don's original program       12.54   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
Removed 3 bangs from rand    11.47   compiled ghc --make
Removed 3 bangs from rand    11.57   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4

The code below is Don's program from

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=0

with a timing function added by me. The rand function is where I removed three exclamation marks to make the program faster. Previously I removed different combinations of bangs. Some bangs seem to make it faster and some seem to make it slower.

Richard.

------------------------------------------------------------------

{-# OPTIONS -O2 -optc-O2 -optc-ffast-math -fbang-patterns -fexcess-precision #-}
--
-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart
-- A lazy bytestring solution.
--
-- Add:
-- -optc-mfpmath=sse -optc-msse2
--

import System
import Data.Word
import Control.Arrow

import Text.Printf       -- RK added.
import System.CPUTime    -- RK added.

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C (pack,unfoldr)
import qualified Data.ByteString as S
import Data.ByteString.Base

-- RK added this time function.
time :: IO t -> IO t
time a = do
    start <- getCPUTime
    v     <- a
    end   <- getCPUTime
    let diff = (fromIntegral (end - start)) / (10^12)
    printf "Calc time %0.3f \n" (diff :: Double)
    return v

main = do           -- RK modified main to time the computation.
    time $ comp     -- RK mod.

comp :: IO Int      -- RK mod.
comp = do           -- RK mod. This was Dons main. I just renamed to comp.
    n <- getArgs >>= readIO . head

    writeFasta  "ONE" "Homo sapiens alu" (n*2) (L.cycle alu)
    g <- unfold "TWO" "IUB ambiguity codes" (n*3) (look iubs) 42
    unfold "THREE" "Homo sapiens frequency" (n*5) (look homs) g

------------------------------------------------------------------------
--
-- lazily unfold the randomised dna sequences
--

unfold l t n f !g = putStrLn (">" ++ l ++ " " ++ t) >> unroll f g n

unroll :: (Int -> (Word8, Int)) -> Int -> Int -> IO Int
unroll f = loop
  where
    loop r 0   = return r
    loop !r !i = case S.unfoldrN m (Just . f) r of
                    (!s, Just r') -> do
                        S.putStrLn s
                        loop r' (i-m)
        where m = min i 60

look ds !k = let (d,j) = rand k in (choose ds d, j)

choose :: [(Word8,Float)] -> Float -> Word8
choose [(b,_)]       _ = b
choose ((!b,!f):xs) !p = if p < f then b else choose xs (p-f)

------------------------------------------------------------------------
--
-- only demand as much of the infinite sequence as we require

writeFasta label title n s = do
     putStrLn $ ">" ++ label ++ " " ++ title
     let (t:ts) = L.toChunks s
     go ts t n
  where
     go ss !s !n
        | l60 && n60 = S.putStrLn l               >> go ss        r (n-60)
        | n60        = S.putStr s >> S.putStrLn a >> go (tail ss) b (n-60)
        | n <= ln    = S.putStrLn (S.take n s)
        | otherwise  = S.putStr s >> S.putStrLn (S.take (n-ln) (head ss))
        where
            !ln   = S.length s
            !l60  = ln >= 60
            !n60  = n  >= 60
            (l,r) = S.splitAt 60 s
            (a,b) = S.splitAt (60-ln) (head ss)

------------------------------------------------------------------------

im = 139968
ia = 3877
ic = 29573

rand :: Int -> (Float, Int)
rand seed = (newran,newseed)                        -- RK modified. Was !seed
    where
        newseed = (seed * ia + ic) `rem` im         -- RK mod. Was !newseed
        newran  = 1.0 * fromIntegral newseed / imd  -- RK. Was !newran
        imd     = fromIntegral im

------------------------------------------------------------------------

alu = C.pack
        "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG\
        \GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA\
        \CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT\
        \ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA\
        \GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG\
        \AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC\
        \AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"

iubs = map (c2w *** id)
        [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
        ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
        ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]

homs = map (c2w *** id)
        [('a',0.3029549426680),('c',0.1979883004921)
        ,('g',0.1975473066391),('t',0.3015094502008)]

r.kelsall:
Simon Peyton-Jones wrote:
| Something I found with Don's version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.
If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.
Sorry, I don't understand the code, I've jumped in the deep-end before learning to swim, but I can now tell you it's producing the same results when I remove some of the exclamation marks. I've checked with an MD5 on the output.
The timings in seconds for 10,000,000 iterations averaged over 5 runs. (There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got stuck compiling it under 6.8) The fancy compile options are from the shootout page.
Don's original program       13.26   compiled ghc --make
Don's original program       12.54   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
Removed 3 bangs from rand    11.47   compiled ghc --make
Removed 3 bangs from rand    11.57   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
The code below is Dons program from
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=0
with a timing function added by me. The rand function is where I removed three exclamation marks to make the program faster. Previously I removed different combinations of bangs. Some bangs seem to make it faster and some seem to make it slower.
There may well have been changes to the strictness analyser that make some of the bangs (or most) unnecessary now. Also, it's very likely I didn't check all combinations of strict and lazy arguments for the optimal evaluation strategy :)

If it seems to be running consistently faster (and producing better Core code), by all means submit. I don't think this is a ghc bug or anything like that though: just overuse of bangs, leading to unnecessary work.

-- Don

Don Stewart wrote: ...
There may well have been changes to the strictness analyser that make some of the bangs (or most) unnecessary now. Also, it's very likely I didn't check all combinations of strict and lazy arguments for the optimal evaluation strategy :)
I suspect the optimum details will change again when we get to GHC 6.8. Yes, I got bored trying different combinations too. A genetic algorithm that knocks out different combinations might be fun. The ones in rand seem to make the most difference and I decided the code was easier to read if I took them all out.
If it seems to be running consistently faster (and producing better Core code), by all means submit. I don't think this is a ghc bug or anything like that though: just overuse of bangs, leading to unnecessary work.
-- Don
It was consistently faster on my machine, but it would be interesting to compare with the run-times on Sterling's PPC machine. I'll have a play with Sterling's new program and report. Richard.

| There may well have been changes to the strictness analyser that make
| some of the bangs (or most) unnecessary now. Also, it's very likely
| I didn't check all combinations of strict and lazy arguments for the
| optimal evaluation strategy :)
|
| If it seems to be running consistently faster (and producing better Core
| code), by all means submit. I don't think this is a ghc bug or anything
| like that though: just overuse of bangs, leading to unnecessary work.

You might think that unnecessary bangs shouldn't lead to unnecessary work -- if GHC knows it's strict *and* you bang the argument, it should still only be evaluated once. But it can happen. Consider

    f !xs = length xs

Even though 'length' will evaluate its argument, f nevertheless evaluates it too. Bangs say "evaluate it now", like seq, because we may be trying to control space usage. In this particular case it's silly, because the *first* thing length does is evaluate its argument, but that's not true of every strict function.

That's why I say it'd be good to have well-characterised examples. It *may* be something like what I describe. Or it may be a silly omission somewhere.

Simon

Simon Peyton-Jones wrote:
You might think that unnecessary bangs shouldn't lead to unnecessary work -- if GHC knows it's strict *and* you bang the argument, it should still only be evaluated once. But it can happen. Consider
f !xs = length xs
Even though 'length' will evaluate its argument, f nevertheless evaluates it too.
I'm replying to a guru here, so I should keep my voice low; but I'd like to point out what might result in a misunderstanding for other readers of haskell-cafe. Contrasting both the bang pattern and the usage of length causing f to be strict, one might suppose that a bang pattern, when used on a list, will cause it to be evaluated in the same way as length does. However,
the *first* thing length does is evaluate its argument,
and it will furthermore evaluate the argument list recursively, as much as is necessary to determine its length. On the other hand, given

    g !xs = ()

evaluating g [0..] will terminate, because g is only strict in the outermost constructor of its argument, which here is (:). The list data type itself, however, is non-strict.

Kalman
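A runnable illustration of the distinction (an added example, not from the thread): a bang forces only to weak head normal form, i.e. the outermost constructor, so g below accepts an infinite list while length cannot.

{-# LANGUAGE BangPatterns #-}

g :: [Int] -> ()
g !xs = ()                -- forces only the outermost constructor of xs

f :: [Int] -> Int
f !xs = length xs         -- the bang adds nothing here: length forces the whole spine anyway

main :: IO ()
main = do
  print (g [0..])         -- terminates: only the first (:) is demanded
  print (f [0..10])       -- prints 11
  -- print (f [0..])      -- would never return: length walks the entire list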

Simon Peyton-Jones wrote:
| There may well have been changes to the strictness analyser that make
| some of the bangs (or most) unnecessary now. Also, it's very likely
| I didn't check all combinations of strict and lazy arguments for the
| optimal evaluation strategy :)
|
| If it seems to be running consistently faster (and producing better Core
| code), by all means submit. I don't think this is a ghc bug or anything
| like that though: just overuse of bangs, leading to unnecessary work.
You might think that unnecessary bangs shouldn't lead to unnecessary work -- if GHC knows it's strict *and* you bang the argument, it should still only be evaluated once. But it can happen. Consider
f !xs = length xs
Even though 'length' will evaluate its argument, f nevertheless evaluates it too. Bangs say "evaluate it now", like seq, because we may be trying to control space usage. In this particular case it's silly, because the *first* thing length does is evaluate its argument, but that's not true of every strict function.
That's why I say it'd be good to have well-characterised examples. It *may* be something like what I describe. Or it may be a silly omission somewhere.
A little addition to what Simon mentioned above: while it is definitely true that adding unnecessary bangs can cause a slowdown, the slowdown should be much less with 6.8.1 because in the common case each evaluation will be an inline test rather than an out-of-line indirect jump and return. So, with 6.8.x, you should feel more free to sprinkle those bangs... Cheers, Simon

I tried the same thing on my box, and indeed the version that isn't strict in the rand function outperforms the original by a fair margin, and seems to do slightly better than my own as well. Killing the bangs in the unroll function also seems to help (especially that in (!s, Just r')). Why this is, is slightly beyond me at the moment. Killing the bang before the b in the choose function also adds a speedup, which makes perfect sense, as there's no reason to force strictness on an argument you're throwing away a good span of the time.

The bang before the k in the look function should stay -- in fact, it seems the appropriate place to force the evaluation that we were forcing too early in some of the other functions. Ditto the bang before the g in unfold. As for the bangs in writeFasta, better to leave them be and not risk messing things up, since, as is, the writeFasta function uses nearly no cycles compared to random generation.

At this point, given that lazier random generation seems to be better, using unboxed types for this seems a losing idea, as they'd force strictness all over again, so that's not worth trying to salvage. I'm still curious if the pre-calculation of partial sums that I did works well across processors, as I don't see why it shouldn't.

My less-strictified version of Don's code is attached, and below are the functions you'll need to insert/replace to make the partial-sums optimization work.

Regards,
Sterl

P.S. If you're running on a unix, I find it much more convenient to use the time program rather than rolling timing into my own code. I tested this program using, for example

> time ./fastaRefUnStrict 250000 | tail

-- Code for partial sums:

choose :: [(Word8,Float)] -> Float -> Word8
choose [(b,_)]    _ = b
choose ((b,f):xs) p = if p < f then b else choose xs p

makeCumul :: [(Word8,Float)] -> [(Word8,Float)]
makeCumul freqMap = tail . reverse . foldl' fm [(undefined,0)] $ freqMap
    where fm acc@((_,ct):rst) (w,f) = (w, ct + f) : acc

iubs :: [(Word8,Float)]
iubs = makeCumul $ map (first c2w)
        [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
        ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
        ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]

homs :: [(Word8,Float)]
homs = makeCumul $ map (first c2w)
        [('a',0.3029549426680),('c',0.1979883004921)
        ,('g',0.1975473066391),('t',0.3015094502008)]

On Nov 27, 2007, at 2:09 PM, Richard Kelsall wrote:
Simon Peyton-Jones wrote:
| Something I found with Don's version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.
Sorry, I don't understand the code, I've jumped in the deep-end before learning to swim, but I can now tell you it's producing the same results when I remove some of the exclamation marks. I've checked with an MD5 on the output.
The timings in seconds for 10,000,000 iterations averaged over 5 runs. (There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got stuck compiling it under 6.8) The fancy compile options are from the shootout page.
Don's original program       13.26   compiled ghc --make
Don's original program       12.54   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
Removed 3 bangs from rand    11.47   compiled ghc --make
Removed 3 bangs from rand    11.57   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
The code below is Dons program from
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=ghc&id=0
with a timing function added by me. The rand function is where I removed three exclamation marks to make the program faster. Previously I removed different combinations of bangs. Some bangs seem to make it faster and some seem to make it slower.
Richard.

Sterling Clover wrote:
I'm still curious if the pre-calculation of partial sums that I did works well across processors, as I don't see why it shouldn't. My less-strictified version of Don's code is attached, and below are the functions you'll need to insert/replace to make the partial-sums optimization work.
Hello Sterling,

I've timed your new Fasta with optimised bangs - it's the fastest so far. But the pre-calculated partial-sums version seems to go a bit slower for some unknown reason.

                             Seconds
Optimised bangs program      11.20   compiled ghc --make
Optimised bangs program      10.73   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
Partial-sums program         11.97   compiled ghc --make
Partial-sums program         11.14   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4

This is on my GHC 6.6.1, W2K, Intel Core 2 Duo 2.33GHz machine - same as for the previous timings I gave in this thread.

Richard.

Was this with tossing the partial sums code into the optimised bangs program? Weird. I wonder if profiling will help explain why? In any case, if nobody comes up with any other tweaks, I'll probably submit the optimised bangs version to the shootout this weekend.
--S
On Nov 30, 2007 1:30 PM, Richard Kelsall wrote:
Sterling Clover wrote:
I'm still curious if the pre-calculation of partial sums that I did works well across processors, as I don't see why it shouldn't. My less-strictified version of Don's code is attached, and below are the functions you'll need to insert/replace to make the partial-sums optimization work.
Hello Sterling, I've timed your new Fasta with optimised bangs - it's the fastest so far. But the pre-calculated partial-sums version seems to go a bit slower for some unknown reason.
                             Seconds
Optimised bangs program      11.20   compiled ghc --make
Optimised bangs program      10.73   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
Partial-sums program         11.97   compiled ghc --make
Partial-sums program         11.14   compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4
This is on my GHC 6.6.1, W2K, Intel Core 2 Duo 2.33GHz machine - same as for the previous timings I gave in this thread.
Richard.

s.clover:
Was this with tossing the partial sums code into the optimised bangs program? Weird. I wonder if profiling will help explain why? In any case, if nobody comes up with any other tweaks, I'll probably submit the optimised bangs version to the shootout this weekend.
Please go ahead and submit. :) and remember to upload also to our wiki, so we have a permanent record of the attempt,

http://haskell.org/haskellwiki/Shootout

Note down any ideas you have.

-- Don

Don Stewart wrote:
Please go ahead and submit. :) and remember to upload also to our wiki, so we have a permanent record of the attempt,
http://haskell.org/haskellwiki/Shootout
Note down any ideas you have.
"Now that GHC 6.6 is available, please you it"? Last time I looked at the shootout website, 6 of the GHC entries were marked simply as "error". Do we know why, or am I missing something obvious? (I find the site to be a little unintuitive at times...) Unfortunately I don't understand what half the benchmarks are supposed to be, which makes it rather hard to follow.

andrewcoppin:
Don Stewart wrote:
Please go ahead and submit. :) and remember to upload also to our wiki, so we have a permanent record of the attempt,
http://haskell.org/haskellwiki/Shootout
Note down any ideas you have.
"Now that GHC 6.6 is available, please you it"?
Looks like something broke in an edit. Feel free to correct it.
Last time I looked at the shootout website, 6 of the GHC entries were marked simply as "error". Do we know why, or am I missing something obvious? (I find the site to be a little unintuitive at times...)
Sounds like you're looking at the wrong thing? http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=ghc

Don Stewart wrote:
andrewcoppin:
"Now that GHC 6.6 is available, please you it"?
Looks like something broke in an edit. Feel free to correct it.
Oh well. ;-) But then, the GHC wiki still says "The 6.8 branch is the current STABLE, and we are in the 6.8.1 release candidate phase. We aim to release 6.8.1 around the beginning of October." I guess most people are busy writing the "real" stuff rather than updating documentation.
Last time I looked at the shootout website, 6 of the GHC entries were marked simply as "error". Do we know why, or am I missing something obvious? (I find the site to be a little unintuitive at times...)
Sounds like you're looking at the wrong thing?
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=ghc
Mmm, interesting. I was looking at http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=ghc&lang2=ghc
participants (8)

- Adrian Hey
- Andrew Coppin
- Don Stewart
- Kalman Noel
- Richard Kelsall
- Simon Marlow
- Simon Peyton-Jones
- Sterling Clover