
Hello Bulat,

Wednesday, January 18, 2006, 8:34:54 PM, you wrote:

BZ> the only reason this code is only 3 times slower is that the C version
BZ> is really limited by memory speed. when tested on 1000-element
BZ> arrays, it is 20 times slower. i have not yet tried SSE optimization for
BZ> gcc ;)

sorry, with "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops" the
C version is 50 times faster than the best Haskell one... here is the
loop from the C version:

L18:
        fldl    (%edx)
        faddl   (%ecx)
        fstpl   (%edx)
        fldl    8(%edx)
        faddl   8(%ecx)
        fstpl   8(%edx)
        fldl    16(%edx)
        faddl   16(%ecx)
        fstpl   16(%edx)
        fldl    24(%edx)
        faddl   24(%ecx)
        addl    $4,%ebx
        addl    $32,%ecx
        fstpl   24(%edx)
        addl    $32,%edx
        cmpl    -4(%ebp),%ebx
        jl      L18

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com
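
P.S. For reference, a loop of roughly this shape would produce the listing above: each fldl/faddl/fstpl triple loads a double from one array, adds the matching element of a second array and stores the sum back, and the body is repeated four times per iteration (counter += 4, pointers += 32 bytes). The C source itself is not included in this message, so the function name add_arrays and its signature are assumptions made only for illustration:

```c
/* Hypothetical reconstruction, not the original source:
   element-wise a[i] += b[i] over n doubles. */
void add_arrays(double *a, const double *b, int n)
{
    int i;

    /* With -funroll-loops gcc may expand this body four times per
       iteration, which would match the four load/add/store groups
       in the assembly listing. */
    for (i = 0; i < n; i++)
        a[i] += b[i];
}
```

Calling add_arrays(a, b, 1000) on two 1000-element arrays would correspond to the small-array case from the quoted text, where the data fits in cache and memory speed is no longer the limit.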