
Alberto Ruiz
In my experience, Haskell allocation is very fast and usually negligible in most nontrivial matrix computations.
A good thing about
sum v = constant 1 (dim v) <.> v
is that the constant vector is created efficiently inside the library (not from an intermediate Haskell list), and the inner product is computed by whatever optimized BLAS implementation is available on your machine.
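Written out as a small self-contained module (just a sketch; I call the function vsum to avoid shadowing Prelude.sum, and I'm assuming the old hmatrix interface where constant, dim, <.> and fromList are exported by Numeric.LinearAlgebra):

import Numeric.LinearAlgebra

-- sum of the elements: a dot product with a vector of ones,
-- where the ones are built directly by constant, not from a Haskell list
vsum :: Vector Double -> Double
vsum v = constant 1 (dim v) <.> v

-- e.g. vsum (fromList [1,2,3,4]) == 10.0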
You can also write simple definitions like the following, which computes the average of the rows of a matrix as a vector-matrix product:
mean m = w <> m
  where
    w = constant k r
    r = rows m
    k = 1 / fromIntegral r
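For example, in GHCi (a small sketch; I'm assuming the old hmatrix interface, where (><) builds a matrix from a row-major list and <> also multiplies a Vector by a Matrix):

> let m = (2><3) [1,2,3,4,5,6] :: Matrix Double
> mean m    -- the average of the two rows: the vector [2.5, 3.5, 4.5]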
I prefer high-level "index-free" matrix operations to C-like code. The definitions are clearer, have fewer bugs, and are also more efficient if you use optimized numerical libraries.
In any case, many algorithms can be solved by the recursive approach described by Tomas.
After reading Don's blog, I tried to make a test with both methods. The code is very simple, as follows:

{-# LANGUAGE BangPatterns #-}  -- needed for the strict patterns in vsum2's go
module Main where

import System
import Numeric.LinearAlgebra

vsum1 :: Vector Double -> Double
vsum1 v = constant 1 (dim v) <.> v

vsum2 :: Vector Double -> Double
vsum2 v = go (d - 1) 0
  where
    d = dim v
    go :: Int -> Double -> Double
    go 0  s  = s + (v @> 0)
    go !j !s = go (j - 1) (s + (v @> j))

mean :: Vector Double -> Double
mean v = vsum v / fromIntegral (dim v)
  where vsum = vsum1

main :: IO ()
main = do
    fn:nrow:ncol:_ <- getArgs
    print . mean . flatten =<< fromFile fn (read nrow, read ncol)

Compile it with "-O2 -optc-O2" and run it on a data set of length 5000000.

The results with vsum1:

  80,077,984 bytes allocated in the heap
       2,208 bytes copied during GC (scavenged)
          64 bytes copied during GC (not scavenged)
  40,894,464 bytes maximum residency (2 sample(s))
  %GC time       0.0%  (0.0% elapsed)
  Alloc rate     35,235,448 bytes per MUT second
  ./vsum1 huge 5000000 1  2.25s user 0.09s system 99% cpu 2.348 total

This is reasonable: exactly two copies of a roughly 40 MB vector.

The results with vsum2:

  560,743,120 bytes allocated in the heap
       19,160 bytes copied during GC (scavenged)
       15,920 bytes copied during GC (not scavenged)
   40,919,040 bytes maximum residency (2 sample(s))
  %GC time       0.3%  (0.3% elapsed)
  Alloc rate     222,110,261 bytes per MUT second
  ./mean2 huge 5000000 1  2.53s user 0.06s system 99% cpu 2.598 total

This is strange: a lot of extra heap allocation. Probably because '@>' is not efficient? So it looks like the inner-product approach wins by a fair margin.
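One way to probe whether the per-element '@>' lookups are really the problem (I believe '@>' does a range check and goes through boxed intermediate values on each access, which could account for the extra allocation) would be a third variant that avoids explicit indexing. This is only a sketch, assuming hmatrix's toList and Data.List.foldl'; whether it actually allocates less would have to be measured:

import Data.List (foldl')
import Numeric.LinearAlgebra

-- third variant: strict left fold over the list view of the vector,
-- avoiding per-element (@>) indexing altogether
vsum3 :: Vector Double -> Double
vsum3 = foldl' (+) 0 . toList

Note that the list view itself allocates cons cells, so this only gives a rough comparison; looking at the +RTS -s statistics for all three variants side by side would still be informative.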