
On Thu, 26 Jun 2003 17:48:17 -0400 (EDT)
Lex Stein
Hi, I'm measuring Haskell, Ocaml, and C, small array access performance and the results with Haskell (GHC 5.02.2) are so bad that I think I must be doing something wrong. Indeed, I am hardly a seasoned Haskell programmer. My results are:
ghc -O 61.60s gcc -O3 0.20s Ocamlopt 1.11s (tail call) Ocamlopt 0.82s (for loop)
Why is the ghc generated code so slow? The source for all the tests is below. Any insight will be greatly appreciated (I'm running on Debian Linux w/ 1GHz x86 CPU and 768MB RAM).
The test cycles through an array of 16 elements, reading 100*10^6 elements.
Thanks! Lex
Haskell (ghc5) test source:
import Array k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000)
C (gcc) test source:
int main () { int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } }
well, to begin with, it's obviously already biased toward C. The C version doesn't initialize the array and doesn't display any output. Using MinGW gcc 3.2 I, like Sven, get a .S file that's just an empty loop(why the loop was even kept...?). Before you make a benchmark like this you may want to look at at least the basic parts of the following, which has Sven's advice among (many) other things. http://haskell.cs.yale.edu/ghc/docs/latest/html/users_guide/faster.html Anyways, int main(void){ int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } printf("%d\n",k); return 0; } compiled with MinGW 3.2 -O/-O2/-O3 runs in 1.8s using time. int main(void){ int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } printf("%d\n",k); return 0; } The following (compiled with -O2 -fglasgow-exts and GHC 6.1 i.e. a CVS version) beats the C version taking only .8s. They don't do the same thing (the Haskell version "does" more, but prints something different). Looking at the core/stg/c/asm (the asm and C was quite messy compared with other times), it certainly doesn't appear to be optimizing this to print 0, and it does appear (though I'm less sure here) to be doing everything as it should, but it's really hard to read the assembly, since we don't use the value we lookup I can easily see gcc dropping it altogether, though the C GHC spits out is pretty unusual for gcc, so it may not. module Main (main) where import Data.Array.Unboxed (UArray, array) import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? import GHC.Exts k :: UArray Int Int k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int# -> Int# acc 0# = 0# acc n = (k `unsafeAt` (I# (n `remInt#` 16#))) `seq` acc (n -# 1#) main :: IO () main = print (I# (acc 100000000#)) However, I'm am somewhat surprised that I seemingly had to unbox by hand to get quite close (in fact, better) than C. I expected the following version to be comparable to the C version (and it is orders of magnitude faster than the original version which was over 5min when I killed it (though I think I compiled it with just -O), this version taking 14s). module Main (main) where import Data.Array.Unboxed (UArray, array) import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? k :: UArray Int Int k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = (k `unsafeAt` (n `mod` 16)) `seq` acc (n - 1) main :: IO () main = print (acc 100000000)