
On Wed, Jun 18, 2008 at 09:16:24AM -0700, Anatoly Yakovenko wrote:
#include
#include int main() { int size = 1024; int ii = 0; double* v1 = malloc(sizeof(double) * (size)); double* v2 = malloc(sizeof(double) * (size)); for(ii = 0; ii < size*size; ++ii) { double _dd = cblas_ddot(0, v1, size, v2, size); } free(v1); free(v2); }
Your C compiler sees that you're not using the result of cblas_ddot, so it doesn't even bother to call it. That loop never gets run. All your program does at runtime is call malloc and free twice, which is very fast :-)
C doesn't work like that :). functions always get called. but i did find a problem with my C code, i am incorrectly calling the dot production function:
See a recent article in lwn on pure and const functions to see how gcc is able to perform dead code elimination and CSE, provided its given annotations on the relevant functions. I'd certainly hope that your blas library is properly annotated!
#include
#include #include #include int main() { int size = 1024; int ii = 0; double dd = 0.0; double* v1 = malloc(sizeof(double) * (size)); double* v2 = malloc(sizeof(double) * (size)); for(ii = 0; ii < size; ++ii) { v1[ii] = 0.1; v2[ii] = 0.1; } for(ii = 0; ii < size*size; ++ii) { dd += cblas_ddot(size, v1, 0, v2, 0); } free(v1); free(v2); printf("%f\n", dd); return 0; }
time ./testdot 10737418.240187
real 0m2.200s user 0m2.190s sys 0m0.010s
So C is about twice as fast. I can live with that.
I suspect that it is your initialization that is the difference. For one thing, you've initialized the arrays to different values, and in your C code you've fused what are two separate loops in your Haskell code. So you've not only given the C compiler an easier loop to run (since you're initializing the array to a constant rather than to a sequence of numbers), but you've also manually optimized that initialization. In fact, this fusion could be precisely the factor of two. Why not see what happens in Haskell if you create just one vector and dot it with itself? (of course, that'll also make the blas call faster, so you'll need to be careful in your interpretation of your results.) David