
here is the C:

#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}

On Tue, Jun 17, 2008 at 9:00 PM, Anatoly Yakovenko wrote:
here is the C:
#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}
Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)

#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}
Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)
C doesn't work like that :). functions always get called. but i did
find a problem with my C code, i am incorrectly calling the dot
product function:

#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double dd = 0.0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size; ++ii) {
        v1[ii] = 0.1;
        v2[ii] = 0.1;
    }
    for(ii = 0; ii < size*size; ++ii) {
        dd += cblas_ddot(size, v1, 0, v2, 0);
    }
    free(v1);
    free(v2);
    printf("%f\n", dd);
    return 0;
}

time ./testdot
10737418.240187

real    0m2.200s
user    0m2.190s
sys     0m0.010s

So C is about twice as fast. I can live with that.

On Wed, Jun 18, 2008 at 9:16 AM, Anatoly Yakovenko wrote:
C doesn't work like that :)
Yes it can. You would have to check the disassembly to be sure, but C compilers can, and do, perform dead code elimination.

AGL

--
Adam Langley  agl@imperialviolet.org  http://www.imperialviolet.org

Anatoly Yakovenko wrote:
#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}

Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)
C doesn't work like that :).
C compilers can do what they like ;)

GCC in particular is pretty good at removing dead code, including entire loops. However it shouldn't eliminate the call to cblas_ddot unless it thinks cblas_ddot has no side effects at all, which would be surprising unless it's inlined somehow.

Jules

On Wed, Jun 18, 2008 at 06:03:42PM +0100, Jules Bean wrote:
Anatoly Yakovenko wrote:
#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}

Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)
C doesn't work like that :).
C compilers can do what they like ;)
GCC in particular is pretty good at removing dead code, including entire loops. However it shouldn't eliminate the call to cblas_ddot unless it thinks cblas_ddot has no side effects at all, which would be surprising unless it's inlined somehow.
Or unless it's been annotated as pure, which it should be. David

On Wed, Jun 18, 2008 at 09:16:24AM -0700, Anatoly Yakovenko wrote:
#include <cblas.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size*size; ++ii) {
        double _dd = cblas_ddot(0, v1, size, v2, size);
    }
    free(v1);
    free(v2);
}
Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)
C doesn't work like that :). functions always get called. but i did find a problem with my C code, i am incorrectly calling the dot product function:
See a recent article in lwn on pure and const functions to see how gcc is able to perform dead code elimination and CSE, provided it's given annotations on the relevant functions. I'd certainly hope that your blas library is properly annotated!
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    int size = 1024;
    int ii = 0;
    double dd = 0.0;
    double* v1 = malloc(sizeof(double) * (size));
    double* v2 = malloc(sizeof(double) * (size));
    for(ii = 0; ii < size; ++ii) {
        v1[ii] = 0.1;
        v2[ii] = 0.1;
    }
    for(ii = 0; ii < size*size; ++ii) {
        dd += cblas_ddot(size, v1, 0, v2, 0);
    }
    free(v1);
    free(v2);
    printf("%f\n", dd);
    return 0;
}
time ./testdot
10737418.240187

real    0m2.200s
user    0m2.190s
sys     0m0.010s
So C is about twice as fast. I can live with that.
I suspect that it is your initialization that is the difference. For one thing, you've initialized the arrays to different values, and in your C code you've fused what are two separate loops in your Haskell code. So you've not only given the C compiler an easier loop to run (since you're initializing the array to a constant rather than to a sequence of numbers), but you've also manually optimized that initialization. In fact, this fusion could be precisely the factor of two.

Why not see what happens in Haskell if you create just one vector and dot it with itself? (of course, that'll also make the blas call faster, so you'll need to be careful in your interpretation of your results.)

David
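To make that experiment concrete, here is a minimal sketch of the single-vector variant. It reuses the newListVector and getDot calls from the blas package's Data.Vector.Dense.IO interface that appear in the benchmarks later in the thread; the 0.1 fill value and the sizes are placeholders, and the program is untested. As noted above, the single getDot call will also get faster, so the absolute times need careful interpretation.

module Main where

import Data.Vector.Dense.IO
import Control.Monad (foldM)

main :: IO ()
main = do
    let size  = 1024 :: Int
        times = size * size
    -- one vector, dotted with itself, so the Haskell initialization
    -- cost matches the constant-fill initialization in the C version
    v <- newListVector size (replicate size 0.1) :: IO (IOVector Int Double)
    dd <- foldM (\acc _ -> do r <- v `getDot` v
                              -- force the accumulator so the sum is not built lazily
                              return $! acc + r)
                (0.0 :: Double) [1 .. times]
    print dd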

I suspect that it is your initialization that is the difference. For one thing, you've initialized the arrays to different values, and in your C code you've fused what are two separate loops in your Haskell code. So you've not only given the C compiler an easier loop to run (since you're initializing the array to a constant rather than to a sequence of numbers), but you've also manually optimized that initialization. In fact, this fusion could be precisely the factor of two.

Why not see what happens in Haskell if you create just one vector and dot it with itself? (of course, that'll also make the blas call faster, so you'll need to be careful in your interpretation of your results.)
The difference can't be in the initialization. I am calling the dot
product a million times, the malloc and init in both cases are
insignificant. Also, "fusing" the two loops in C probably won't help;
if anything, having each loop run separately is likely to be faster and
result in fewer cache misses.

In this case, i am using vectors of size 10 only, and calling the loop
10 million times; haskell is far, far slower, about 35 times. That's
pretty crappy.
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
-fglasgow-exts -fbang-patterns -lcblas#-}
module Main where
import Data.Vector.Dense.IO
import Control.Monad
main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    sum <- foldM (\ ii zz -> do
                rv <- v1 `getDot` v2
                return $ zz + rv
            ) 0.0 [0..times]
    print $ sum
$ ghc --make htestdot.hs
$ time ./htestdot
1.00000001e7
real 0m17.328s
user 0m17.320s
sys 0m0.010s
$ cat testdot.c
#include

On Friday 27 June 2008, Anatoly Yakovenko wrote:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
    -fglasgow-exts -fbang-patterns -lcblas#-}
module Main where

import Data.Vector.Dense.IO
import Control.Monad

main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    sum <- foldM (\ ii zz -> do
                rv <- v1 `getDot` v2
                return $ zz + rv
            ) 0.0 [0..times]
    print $ sum
Hackage is down for the time being, so I can't install blas and look at the core for your program. However, there are still some reasons why this code would be slow.

For instance, a brief experiment seems to indicate that foldM is not a good consumer in the foldr/build sense, so no deforestation occurs. Your program is iterating over a 10-million element lazy list. That's going to add overhead. I wrote a simple test program which just adds 0.1 in each iteration:

---- snip ----

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Monad

main = do
    let times = 10*1000*1000
    sum <- foldM (\_ zz -> return $ zz + 0.1) 0.0 [0..times]
    -- sum <- foo 0 times 0.0
    print $ sum

foo :: Int -> Int -> Double -> IO Double
foo k m !zz | k <= m    = foo (k+1) m (zz + 0.1)
            | otherwise = return zz

---- snip ----

With foldM, it takes 2.5 seconds on my machine. If you comment that line, and use foo instead, it takes around .1 seconds. So that's a factor of what, 250? That loop allows for a lot more unboxing, which allows much better code to be generated.

When Hackage comes back online, I'll take a look at your code, and see if I can make it run faster, but you might want to try it yourself in the meantime. Strictifying the addition of the accumulator is probably a good idea, for instance.

Cheers,
-- Dan

i get the same crappy performance with:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
-fglasgow-exts -fbang-patterns -lcblas#-}
module Main where
import Data.Vector.Dense.IO
import Control.Monad
main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    replicateM_ times $ v1 `getDot` v2
On Fri, Jun 27, 2008 at 7:41 PM, Dan Doel wrote:
On Friday 27 June 2008, Anatoly Yakovenko wrote:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
    -fglasgow-exts -fbang-patterns -lcblas#-}
module Main where

import Data.Vector.Dense.IO
import Control.Monad

main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    sum <- foldM (\ ii zz -> do
                rv <- v1 `getDot` v2
                return $ zz + rv
            ) 0.0 [0..times]
    print $ sum
Hackage is down for the time being, so I can't install blas and look at the core for your program. However, there are still some reasons why this code would be slow.
For instance, a brief experiment seems to indicate that foldM is not a good consumer in the foldr/build sense, so no deforestation occurs. Your program is iterating over a 10-million element lazy list. That's going to add overhead. I wrote a simple test program which just adds 0.1 in each iteration:
---- snip ----
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import Control.Monad
main = do
    let times = 10*1000*1000
    sum <- foldM (\_ zz -> return $ zz + 0.1) 0.0 [0..times]
    -- sum <- foo 0 times 0.0
    print $ sum

foo :: Int -> Int -> Double -> IO Double
foo k m !zz | k <= m    = foo (k+1) m (zz + 0.1)
            | otherwise = return zz
---- snip ----
With foldM, it takes 2.5 seconds on my machine. If you comment that line, and use foo instead, it takes around .1 seconds. So that's a factor of what, 250? That loop allows for a lot more unboxing, which allows much better code to be generated.
When Hackage comes back online, I'll take a look at your code, and see if I can make it run faster, but you might want to try it yourself in the meantime. Strictifying the addition of the accumulator is probably a good idea, for instance.
Cheers, -- Dan

aeyakovenko:
i get the same crappy performance with:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
    -fglasgow-exts -fbang-patterns -lcblas#-}
module Main where

import Data.Vector.Dense.IO
import Control.Monad

main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    replicateM_ times $ v1 `getDot` v2
replicateM_ is using a list underneath for control as well,

    replicateM n x = sequence (replicate n x)

Try writing a simple recursive loop, as Dan suggested. No list node forcing overhead, so in a very tight loop you'll just want the index in a register. See here for more examples of tight register loops,

    http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast

In general, if you're chasing C performance for a loop, your best bet is to write a loop first. Then later see if you can get the same kind of code from higher order, lazy, monadic functions.

-- Don
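As a concrete illustration of that advice, here is a minimal sketch of the same benchmark with a hand-written tail-recursive loop and a strict accumulator, in the style of Dan's foo. It assumes the same newListVector/getDot interface from the blas package used above, and it is untested:

{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Vector.Dense.IO

main :: IO ()
main = do
    let size  = 10
        times = 10 * 1000 * 1000 :: Int
    v1 <- newListVector size (replicate size 0.1) :: IO (IOVector Int Double)
    v2 <- newListVector size (replicate size 0.1) :: IO (IOVector Int Double)
    -- explicit loop: index and accumulator are strict, no intermediate list
    let go !k !acc
            | k >= times = return acc
            | otherwise  = do
                r <- v1 `getDot` v2
                go (k + 1) (acc + r)
    dd <- go 0 (0.0 :: Double)
    print dd

Whether this matches the C numbers still depends on what getDot itself does per call, which is the library issue discussed in the next message.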

so I had a look at the code. The loops are all fine. replicateM_ isn't a problem, but getDot is decidedly non trivial. Lots of pattern matching on different vector forms, and to top it off ffi calls.

With some inlining in the blas library I was able to cut a few seconds off the running time, but getDot looks to be fundamentally a bit complicated in the current implementation. I wonder if you'll get different results with hmatrix?

Anyway, this is a library issue. Better take it up with Patrick. Pass on to the library author the C code, the Haskell you think should be compiled identically.

-- Don

aeyakovenko:
i get the same crappy performance with:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
    -fglasgow-exts -fbang-patterns -lcblas#-}
module Main where

import Data.Vector.Dense.IO
import Control.Monad

main = do
    let size = 10
    let times = 10*1000*1000
    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
    replicateM_ times $ v1 `getDot` v2
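One way to see how much of the remaining gap is the library's dispatch layer rather than the foreign call itself is to bind cblas_ddot directly. The sketch below assumes the standard CBLAS prototype, double cblas_ddot(int n, const double *x, int incx, const double *y, int incy), has to be linked against a CBLAS implementation, and is untested:

{-# LANGUAGE ForeignFunctionInterface, BangPatterns #-}
module Main where

import Foreign
import Foreign.C.Types

-- assumed standard CBLAS signature; 'unsafe' skips the safe-call
-- overhead, which is fine for a short, non-blocking C function
foreign import ccall unsafe "cblas_ddot"
    c_ddot :: CInt -> Ptr CDouble -> CInt -> Ptr CDouble -> CInt -> IO CDouble

main :: IO ()
main = do
    let size  = 10
        times = 10 * 1000 * 1000 :: Int
    allocaArray size $ \p1 ->
      allocaArray size $ \p2 -> do
        pokeArray p1 (replicate size 0.1)
        pokeArray p2 (replicate size 0.1)
        -- tight loop over the raw foreign call, strict index and accumulator
        let go !k !acc
                | k >= times = return acc
                | otherwise  = do
                    r <- c_ddot (fromIntegral size) p1 1 p2 1
                    go (k + 1) (acc + realToFrac r)
        print =<< go 0 (0.0 :: Double)

If this loop alone gets close to the C time, the difference really is in getDot's dispatch rather than in GHC's loop code.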

On 19 Jun 2008, at 4:16 am, Anatoly Yakovenko wrote:
C doesn't work like that :). functions always get called.
Not true. A C compiler must produce the same *effect* as if the function had been called, but if by some means the compiler knows that the function has no effect, it is entitled to skip the call.

In particular, the C compiler I normally use offers these pragmas, amongst others:

    #pragma does_not_write_global_data (funcname [, funcname])
    #pragma no_side_effect (funcname [, funcname])

So with a declaration like

    extern double cblas_ddot(
        int, double const *, int, double const *, int);
    #pragma no_side_effect (cblas_ddot)
the compiler would be completely within its rights to discard any call to cblas_ddot() whose result was not used. (As it happens, it didn't, but it would have been allowed to.)

If using gcc,

    extern double cblas_ddot( ... as before ...) __attribute__ ((const));

seems to have the same effect, certainly the test case I tried did in fact completely eliminate a call to cblas_ddot() when so declared.

Since the malloc() results pointed to uninitialised memory, the C compiler was entitled to do anything it pleased anyway.
participants (8)

- Adam Langley
- Anatoly Yakovenko
- Bryan O'Sullivan
- Dan Doel
- David Roundy
- Don Stewart
- Jules Bean
- Richard A. O'Keefe