
Alberto Ruiz
In my experience, Haskell allocation is very fast and usually negligible in most nontrivial matrix computations.
A good thing about
sum v = constant 1 (dim v) <.> v
is that the constant vector is created efficiently inside the library (not from an intermediate Haskell list), and the inner product is computed by whatever optimized BLAS implementation is available on your machine.
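Written out as a small self-contained module (just a sketch; I call the function vsum to avoid shadowing Prelude.sum, and I'm assuming the old hmatrix interface where constant, dim, <.> and fromList are exported by Numeric.LinearAlgebra):

import Numeric.LinearAlgebra

-- sum of the elements: a dot product with a vector of ones,
-- where the ones are built directly by constant, not from a Haskell list
vsum :: Vector Double -> Double
vsum v = constant 1 (dim v) <.> v

-- e.g. vsum (fromList [1,2,3,4]) == 10.0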
You can also write simple definitions like the following, which computes the average of the rows of a matrix as a vector-matrix product:
mean m = w <> m
  where
    w = constant k r
    r = rows m
    k = 1 / fromIntegral r
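For example, in GHCi (a small sketch; I'm assuming the old hmatrix interface, where (><) builds a matrix from a row-major list and <> also multiplies a Vector by a Matrix):

> let m = (2><3) [1,2,3,4,5,6] :: Matrix Double
> mean m    -- the average of the two rows: the vector [2.5, 3.5, 4.5]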
I prefer high-level "index-free" matrix operations to C-like code. The definitions are clearer, have fewer bugs, and are also more efficient if you use optimized numerical libraries.
In any case, many algorithms can be solved by the recursive approach described by Tomas.
After reading Don's blog, I tried to make a test with both methods. The code is very simple, as follows:

{-# LANGUAGE BangPatterns #-}  -- needed for the strict patterns in vsum2's go
module Main where

import System
import Numeric.LinearAlgebra

vsum1 :: Vector Double -> Double
vsum1 v = constant 1 (dim v) <.> v

vsum2 :: Vector Double -> Double
vsum2 v = go (d - 1) 0
  where
    d = dim v
    go :: Int -> Double -> Double
    go 0  s  = s + (v @> 0)
    go !j !s = go (j - 1) (s + (v @> j))

mean :: Vector Double -> Double
mean v = vsum v / fromIntegral (dim v)
  where vsum = vsum1

main :: IO ()
main = do
    fn:nrow:ncol:_ <- getArgs
    print . mean . flatten =<< fromFile fn (read nrow, read ncol)

Compile it with "-O2 -optc-O2" and run it on a data set of length 5000000.

The results with vsum1:

  80,077,984 bytes allocated in the heap
       2,208 bytes copied during GC (scavenged)
          64 bytes copied during GC (not scavenged)
  40,894,464 bytes maximum residency (2 sample(s))
  %GC time       0.0%  (0.0% elapsed)
  Alloc rate     35,235,448 bytes per MUT second
  ./vsum1 huge 5000000 1  2.25s user 0.09s system 99% cpu 2.348 total

This is reasonable: exactly two copies of a roughly 40 MB vector.

The results with vsum2:

  560,743,120 bytes allocated in the heap
       19,160 bytes copied during GC (scavenged)
       15,920 bytes copied during GC (not scavenged)
   40,919,040 bytes maximum residency (2 sample(s))
  %GC time       0.3%  (0.3% elapsed)
  Alloc rate     222,110,261 bytes per MUT second
  ./mean2 huge 5000000 1  2.53s user 0.06s system 99% cpu 2.598 total

This is strange: a lot of extra heap allocation. Probably because '@>' is not efficient? So it looks like the inner-product approach wins by a fair margin.
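One way to probe whether the per-element '@>' lookups are really the problem (I believe '@>' does a range check and goes through boxed intermediate values on each access, which could account for the extra allocation) would be a third variant that avoids explicit indexing. This is only a sketch, assuming hmatrix's toList and Data.List.foldl'; whether it actually allocates less would have to be measured:

import Data.List (foldl')
import Numeric.LinearAlgebra

-- third variant: strict left fold over the list view of the vector,
-- avoiding per-element (@>) indexing altogether
vsum3 :: Vector Double -> Double
vsum3 = foldl' (+) 0 . toList

Note that the list view itself allocates cons cells, so this only gives a rough comparison; looking at the +RTS -s statistics for all three variants side by side would still be informative.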