
This has been hinted to before: gcc gets rid of the loop. bash-2.05b$ gcc -O3 arr.c -S -o arr.s bash-2.05b$ cat arr.s .file "arr.c" .def ___main; .scl 2; .type 32; .endef .text .align 2 .align 16 .globl _main .def _main; .scl 2; .type 32; .endef _main: pushl %ebp movl %esp, %ebp subl $72, %esp xorl %eax, %eax andl $-16, %esp call __alloca call ___main xorl %eax, %eax .align 16 L7: incl %eax cmpl $100000000, %eax jle L7 leave ret the inside of the loop is exactly: L7: incl %eax cmpl $100000000, %eax jle L7 which doesn't read the array! essentially you are being unfare to ghc by making it seq the array element, but not doing the same to GCC. even with -O0, gcc gets rid of the loop body. one solution: add in the assembly to actually read the array (i'm too lazy and busy to do this). - hal -- Hal Daume III | hdaume@isi.edu "Arrest this man, he talks in maths." | www.isi.edu/~hdaume
-----Original Message----- From: haskell-cafe-admin@haskell.org [mailto:haskell-cafe-admin@haskell.org] On Behalf Of Lex Stein Sent: Thursday, June 26, 2003 3:46 PM To: Sven Panne Cc: haskell-cafe@haskell.org Subject: Re: haskell array access
Great, thanks. Those suggestions narrow the gap from GHC -O being 330x slower than GCC -O3 to it being 20x slower. Here are the new results:
gcc -O3 0.54s ocamlopt 1.11s ghc -O 10.76s ocamlc 14.10s
GHC is still pretty slow for native x86 instruction code. Is there any way to further explain the performance gap ? (new code below)
Thanks! Lex
Haskell (GHC) source:
import Array k :: Array Int Int k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000)
Caml test source:
let a = Array.make 16 0;; let rec do1 i z = if (i>100000000) then z else do1 (i+1) (z + a.(i mod 16));; do1 0 0;;
C (gcc) test source:
int main () { int i, k = 0; int a [16]; for (i=0; i<100000000; i++) { k += a [i % 16]; } return (k); }
On Fri, 27 Jun 2003, Sven Panne wrote:
Well, part of the answer is definitely that the Haskell program is the *only* one which really uses the array elements. :-) I guess that the compilers for the other languages simply remove the array access from the generated code (gcc definitely does, only an empty loop remains).
Another reason is that the type signatures are missing, so Integer is used instead of Int, which is a bit unfair. Adding
k :: Array Int Int acc :: Int -> Int
cuts down the time by more than factor 5 on my machine.
Cheers, S.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

You must be using my old example, with gcc -O3 my new example does access the array and is 20x faster than the ghc -O code. Scroll down for "-->" that I inserted beside the instruction to see the array access. .file "arrayloop.c" .version "01.01" gcc2_compiled.: .text .p2align 2,0x90 .globl main .type main,@function main: pushl %ebp movl %esp,%ebp subl $80,%esp pushl %esi pushl %ebx xorl %ebx,%ebx xorl %ecx,%ecx leal -64(%ebp),%esi .p2align 2,0x90 .L6: movl %ecx,%edx testl %ecx,%ecx jge .L7 leal 15(%ecx),%edx .L7: andl $-16,%edx movl %ecx,%eax subl %edx,%eax --> addl (%esi,%eax,4),%ebx incl %ecx cmpl $99999999,%ecx jle .L6 movl %ebx,%eax popl %ebx popl %esi leave ret .Lfe1: .size main,.Lfe1-main .ident "GCC: (GNU) c 2.95.4 20020320 [FreeBSD]" On Thu, 26 Jun 2003, Hal Daume wrote:
This has been hinted to before: gcc gets rid of the loop.
bash-2.05b$ gcc -O3 arr.c -S -o arr.s bash-2.05b$ cat arr.s .file "arr.c" .def ___main; .scl 2; .type 32; .endef .text .align 2 .align 16 .globl _main .def _main; .scl 2; .type 32; .endef _main: pushl %ebp movl %esp, %ebp subl $72, %esp xorl %eax, %eax andl $-16, %esp call __alloca call ___main xorl %eax, %eax .align 16 L7: incl %eax cmpl $100000000, %eax jle L7 leave ret
the inside of the loop is exactly:
L7: incl %eax cmpl $100000000, %eax jle L7
which doesn't read the array! essentially you are being unfare to ghc by making it seq the array element, but not doing the same to GCC. even with -O0, gcc gets rid of the loop body.
one solution: add in the assembly to actually read the array (i'm too lazy and busy to do this).
- hal
-- Hal Daume III | hdaume@isi.edu "Arrest this man, he talks in maths." | www.isi.edu/~hdaume
-----Original Message----- From: haskell-cafe-admin@haskell.org [mailto:haskell-cafe-admin@haskell.org] On Behalf Of Lex Stein Sent: Thursday, June 26, 2003 3:46 PM To: Sven Panne Cc: haskell-cafe@haskell.org Subject: Re: haskell array access
Great, thanks. Those suggestions narrow the gap from GHC -O being 330x slower than GCC -O3 to it being 20x slower. Here are the new results:
gcc -O3 0.54s ocamlopt 1.11s ghc -O 10.76s ocamlc 14.10s
GHC is still pretty slow for native x86 instruction code. Is there any way to further explain the performance gap ? (new code below)
Thanks! Lex
Haskell (GHC) source:
import Array k :: Array Int Int k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000)
Caml test source:
let a = Array.make 16 0;; let rec do1 i z = if (i>100000000) then z else do1 (i+1) (z + a.(i mod 16));; do1 0 0;;
C (gcc) test source:
int main () { int i, k = 0; int a [16]; for (i=0; i<100000000; i++) { k += a [i % 16]; } return (k); }
On Fri, 27 Jun 2003, Sven Panne wrote:
Well, part of the answer is definitely that the Haskell program is the *only* one which really uses the array elements. :-) I guess that the compilers for the other languages simply remove the array access from the generated code (gcc definitely does, only an empty loop remains).
Another reason is that the type signatures are missing, so Integer is used instead of Int, which is a bit unfair. Adding
k :: Array Int Int acc :: Int -> Int
cuts down the time by more than factor 5 on my machine.
Cheers, S.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Actually, to be completely correct, the access is the movl 2 instructions above the "-->". The "-->" points to the accumulation in the local variable k. Any ideas how I can further narrow this 20x gap and improve GHC's observed relative array access performance ? int main () { int i, k = 0; int a [16]; for (i=0; i<100000000; i++) { k += a [i % 16]; } return k; } Thanks! Lex -- Lex Stein http://www.eecs.harvard.edu/~stein/ stein@eecs.harvard.edu TEL: 617-233-0246 On Thu, 26 Jun 2003, Lex Stein wrote:
You must be using my old example, with gcc -O3 my new example does access the array and is 20x faster than the ghc -O code. Scroll down for "-->" that I inserted beside the instruction to see the array access.
.file "arrayloop.c" .version "01.01" gcc2_compiled.: .text .p2align 2,0x90 .globl main .type main,@function main: pushl %ebp movl %esp,%ebp subl $80,%esp pushl %esi pushl %ebx xorl %ebx,%ebx xorl %ecx,%ecx leal -64(%ebp),%esi .p2align 2,0x90 .L6: movl %ecx,%edx testl %ecx,%ecx jge .L7 leal 15(%ecx),%edx .L7: andl $-16,%edx movl %ecx,%eax subl %edx,%eax --> addl (%esi,%eax,4),%ebx incl %ecx cmpl $99999999,%ecx jle .L6 movl %ebx,%eax popl %ebx popl %esi leave ret .Lfe1: .size main,.Lfe1-main .ident "GCC: (GNU) c 2.95.4 20020320 [FreeBSD]"
On Thu, 26 Jun 2003, Hal Daume wrote:
This has been hinted to before: gcc gets rid of the loop.
bash-2.05b$ gcc -O3 arr.c -S -o arr.s bash-2.05b$ cat arr.s .file "arr.c" .def ___main; .scl 2; .type 32; .endef .text .align 2 .align 16 .globl _main .def _main; .scl 2; .type 32; .endef _main: pushl %ebp movl %esp, %ebp subl $72, %esp xorl %eax, %eax andl $-16, %esp call __alloca call ___main xorl %eax, %eax .align 16 L7: incl %eax cmpl $100000000, %eax jle L7 leave ret
the inside of the loop is exactly:
L7: incl %eax cmpl $100000000, %eax jle L7
which doesn't read the array! essentially you are being unfare to ghc by making it seq the array element, but not doing the same to GCC. even with -O0, gcc gets rid of the loop body.
one solution: add in the assembly to actually read the array (i'm too lazy and busy to do this).
- hal
-- Hal Daume III | hdaume@isi.edu "Arrest this man, he talks in maths." | www.isi.edu/~hdaume
-----Original Message----- From: haskell-cafe-admin@haskell.org [mailto:haskell-cafe-admin@haskell.org] On Behalf Of Lex Stein Sent: Thursday, June 26, 2003 3:46 PM To: Sven Panne Cc: haskell-cafe@haskell.org Subject: Re: haskell array access
Great, thanks. Those suggestions narrow the gap from GHC -O being 330x slower than GCC -O3 to it being 20x slower. Here are the new results:
gcc -O3 0.54s ocamlopt 1.11s ghc -O 10.76s ocamlc 14.10s
GHC is still pretty slow for native x86 instruction code. Is there any way to further explain the performance gap ? (new code below)
Thanks! Lex
Haskell (GHC) source:
import Array k :: Array Int Int k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000)
Caml test source:
let a = Array.make 16 0;; let rec do1 i z = if (i>100000000) then z else do1 (i+1) (z + a.(i mod 16));; do1 0 0;;
C (gcc) test source:
int main () { int i, k = 0; int a [16]; for (i=0; i<100000000; i++) { k += a [i % 16]; } return (k); }
On Fri, 27 Jun 2003, Sven Panne wrote:
Well, part of the answer is definitely that the Haskell program is the *only* one which really uses the array elements. :-) I guess that the compilers for the other languages simply remove the array access from the generated code (gcc definitely does, only an empty loop remains).
Another reason is that the type signatures are missing, so Integer is used instead of Int, which is a bit unfair. Adding
k :: Array Int Int acc :: Int -> Int
cuts down the time by more than factor 5 on my machine.
Cheers, S.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (2)
-
Hal Daume
-
Lex Stein