
aeyakovenko:
i get the same crappy performance with:
$ cat htestdot.hs {-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields -fglasgow-exts -fbang-patterns -lcblas#-} module Main where
import Data.Vector.Dense.IO import Control.Monad
main = do let size = 10 let times = 10*1000*1000 v1::IOVector Int Double <- newListVector size $ replicate size 0.1 v2::IOVector Int Double <- newListVector size $ replicate size 0.1 replicateM_ times $ v1 `getDot` v2
replicateM_ is using a list underneath for control as well, replicateM n x = sequence (replicate n x) Try writing a simple recursive loop, as Dan suggested. No list node forcing overhead, so in a very tight loop you'll just want the index in a register. See here for more examples of tight register loops, http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast In general, if you're chasing C performance for a loop, your best bet is to write a loop first. Then later see if you can get the same kind of code from higher order, lazy, monadic functions. -- Don