Re[3]: GHC vs. GCC on raw vector addition

18 Jan 2006

      Hello Bulat,

Wednesday, January 18, 2006, 8:34:54 PM, you wrote:

BZ> the only reason this code is just 3 times slower is that the C version
BZ> is really limited by memory speed. when tested on 1000-element
BZ> arrays, it is 20 times slower. i haven't yet tried SSE optimization for
BZ> gcc ;)

Sorry, with "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops"
the C version is 50 times faster than the best Haskell one... Here is
the inner loop gcc generated for the C version:

L18:
        fldl (%edx)
        faddl (%ecx)
        fstpl (%edx)
        fldl 8(%edx)
        faddl 8(%ecx)
        fstpl 8(%edx)
        fldl 16(%edx)
        faddl 16(%ecx)
        fstpl 16(%edx)
        fldl 24(%edx)
        faddl 24(%ecx)
        addl $4,%ebx
        addl $32,%ecx
        fstpl 24(%edx)
        addl $32,%edx
        cmpl -4(%ebp),%ebx
        jl L18
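
For readers who do not read x87 assembly: the listing looks like an in-place addition of two double arrays, unrolled four times by -funroll-loops (four fldl/faddl/fstpl groups, pointers advanced by 32 bytes, counter advanced by 4). The C source is not shown in this message, so the following is only a rough sketch of what such a loop might look like when unrolled by hand. The function name vec_add_unrolled and its parameters are made up for illustration, and the remainder loop is an addition that the listing does not show.

```c
#include <stddef.h>

/* Hypothetical reconstruction, not the original source:
 * computes a[i] += b[i] for i in [0, n), four elements per iteration,
 * mirroring the four load/add/store groups in the assembly listing. */
void vec_add_unrolled(double *a, const double *b, size_t n)
{
    size_t i = 0;

    /* Main loop: four doubles (32 bytes) per pass, like the
     * "addl $32" pointer increments and "addl $4" counter increment. */
    for (; i + 4 <= n; i += 4) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }

    /* Remainder loop for lengths that are not a multiple of four
     * (an assumption; the listing shows only the main loop). */
    for (; i < n; i++)
        a[i] += b[i];
}
```

Calling vec_add_unrolled(a, b, n) on two arrays of n doubles adds b into a. In practice one would write the plain one-line loop and let gcc do the unrolling, which is presumably what produced the listing above.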

-- 
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

Bulat Ziganshin