
(a) People could point out to me where I'm still confused, as revealed by my code. Is it needlessly complicated?
looks pretty reasonable to me :) as to why Unique is in the IO monad is probabyl because if it were in any other monad, you could start the monad twice and thus get a repeat of each ID (if that makes sense). The only way around this is to put it in IO.

Hi, I'm measuring Haskell, Ocaml, and C, small array access performance and the results with Haskell (GHC 5.02.2) are so bad that I think I must be doing something wrong. Indeed, I am hardly a seasoned Haskell programmer. My results are: ghc -O 61.60s gcc -O3 0.20s Ocamlopt 1.11s (tail call) Ocamlopt 0.82s (for loop) Why is the ghc generated code so slow? The source for all the tests is below. Any insight will be greatly appreciated (I'm running on Debian Linux w/ 1GHz x86 CPU and 768MB RAM). The test cycles through an array of 16 elements, reading 100*10^6 elements. Thanks! Lex Haskell (ghc5) test source: import Array k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000) C (gcc) test source: int main () { int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } } Ocaml test source (tail call): let a = Array.make 16 0;; let rec do1 i = if (i>100000000) then () else let k = a.(i mod 16) in do1 (i+1);; do1 0;; Ocaml test source (for loop): let a = Array.make 16 0;; for i=0 to 100000000 do let k = a.(i mod 16) in () done

Well, part of the answer is definitely that the Haskell program is the *only* one which really uses the array elements. :-) I guess that the compilers for the other languages simply remove the array access from the generated code (gcc definitely does, only an empty loop remains). Another reason is that the type signatures are missing, so Integer is used instead of Int, which is a bit unfair. Adding k :: Array Int Int acc :: Int -> Int cuts down the time by more than factor 5 on my machine. Cheers, S.

Great, thanks. Those suggestions narrow the gap from GHC -O being 330x slower than GCC -O3 to it being 20x slower. Here are the new results: gcc -O3 0.54s ocamlopt 1.11s ghc -O 10.76s ocamlc 14.10s GHC is still pretty slow for native x86 instruction code. Is there any way to further explain the performance gap ? (new code below) Thanks! Lex Haskell (GHC) source: import Array k :: Array Int Int k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000) Caml test source: let a = Array.make 16 0;; let rec do1 i z = if (i>100000000) then z else do1 (i+1) (z + a.(i mod 16));; do1 0 0;; C (gcc) test source: int main () { int i, k = 0; int a [16]; for (i=0; i<100000000; i++) { k += a [i % 16]; } return (k); } On Fri, 27 Jun 2003, Sven Panne wrote:
Well, part of the answer is definitely that the Haskell program is the *only* one which really uses the array elements. :-) I guess that the compilers for the other languages simply remove the array access from the generated code (gcc definitely does, only an empty loop remains).
Another reason is that the type signatures are missing, so Integer is used instead of Int, which is a bit unfair. Adding
k :: Array Int Int acc :: Int -> Int
cuts down the time by more than factor 5 on my machine.
Cheers, S.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thursday 26 June 2003 05:46 pm, Lex Stein wrote:
Great, thanks. Those suggestions narrow the gap from GHC -O being 330x slower than GCC -O3 to it being 20x slower. Here are the new results:
gcc -O3 0.54s ocamlopt 1.11s ghc -O 10.76s ocamlc 14.10s
GHC is still pretty slow for native x86 instruction code. Is there any way to further explain the performance gap ? (new code below)
I redid both programs to make them equivalent in action. The haskell program
was not actually summing the results, the C program was not checking array
bounds (Haskell does), the C program did not initialize the array and the C
program didn't print the result.
Times on my laptop (Crusoe 933):
62.061s : Haskell (Array -O2) Note: lot's 'o disk grinding!
18.231s : Haskell (UArray -O2)
18.108s : Haskell (UArray -O2 -fvia-c)
17.443s : Haskell (UArray -O2 -funfolding-update-in-place)
0.807s : C (-O3 without bound check)
1.127s : C (-O3 with bound check)
At best case for Haskell, 15.5 times slower. The thing about bounds checking,
in Haskell it's always there. In C, you might have it, you might not there is
no certainty by the language, only by design and implementation. So with C,
one is free to live dangerously.
I changed the "mod" value to 17, so that out of bound access takes place in
the Haskell program. It gives me the following:
Fail: Ix{Int}.index: Index (16) out of range ((0,15))
I did the same to the C program (without the bounds checking), it spits out a
value and give no indication that anything improper was done.
- ---------------------------------------------
Haskell Original (redone to be funtionally equivalent)
- ----------------------------------------------
import Array
a :: Array Int Int
a = array (0,15) [(i, i) | i <- [0 .. 15] ]
acc :: Int -> Int -> Int
acc s 0 = s
acc s n = acc (s + (a ! (n `mod` 16))) (n-1)
main :: IO ()
main = do
print $ acc 0 100000000
- -----------------------------------------------------
C Version with some modifications as well
- ----------------------------------------------------------
#include

On Fri, Jun 27, 2003 at 12:10:55PM -0500, Shawn P. Garbett wrote:
- -------------------------------------- New Haskell version with Unboxed Array Note:it was a simple change of import and type - ----------------------------------------- import Data.Array.Unboxed
a :: UArray Int Int a = array (0,15) [(i, i) | i <- [0 .. 15] ]
acc :: Int -> Int -> Int acc s 0 = s acc s n = acc (s + (a ! (n `mod` 16))) (n-1)
main :: IO () main = do print $ acc 0 100000000
I'd be curious to see timing on the following: import Data.Array.Unboxed import Data.Bits a :: UArray Int Int a = array (0,15) [(i, i) | i <- [0 .. 15] ] acc :: Int -> Int -> Int acc s 0 = s acc s n = acc (s + (a ! (n .&. 15))) (n-1) main :: IO () main = do print $ acc 0 100000000 All I've done eliminated the mod call in favor of a bitwise and. I would hope that the C compiler would do that, at least if it gives any improvement. -- David Roundy http://civet.berkeley.edu/droundy/

"Shawn P. Garbett"
I redid both programs to make them equivalent in action. The haskell program was not actually summing the results, the C program was not checking array bounds (Haskell does), the C program did not initialize the array and the C program didn't print the result.
Times on my laptop (Crusoe 933):
62.061s : Haskell (Array -O2) Note: lot's 'o disk grinding! 18.231s : Haskell (UArray -O2) 18.108s : Haskell (UArray -O2 -fvia-c) 17.443s : Haskell (UArray -O2 -funfolding-update-in-place) 0.807s : C (-O3 without bound check) 1.127s : C (-O3 with bound check)
At best case for Haskell, 15.5 times slower. The thing about bounds checking, in Haskell it's always there. In C, you might have it, you might not there is no certainty by the language, only by design and implementation. So with C, one is free to live dangerously.
The first two sections of http://www.cse.unsw.edu.au/~chak/papers/CK03.html also contain an analysis of some of the overheads in Haskell array codes compiled with GHC. Cheers, Manuel

On Thu, 26 Jun 2003 17:48:17 -0400 (EDT)
Lex Stein
Hi, I'm measuring Haskell, Ocaml, and C, small array access performance and the results with Haskell (GHC 5.02.2) are so bad that I think I must be doing something wrong. Indeed, I am hardly a seasoned Haskell programmer. My results are:
ghc -O 61.60s gcc -O3 0.20s Ocamlopt 1.11s (tail call) Ocamlopt 0.82s (for loop)
Why is the ghc generated code so slow? The source for all the tests is below. Any insight will be greatly appreciated (I'm running on Debian Linux w/ 1GHz x86 CPU and 768MB RAM).
The test cycles through an array of 16 elements, reading 100*10^6 elements.
Thanks! Lex
Haskell (ghc5) test source:
import Array k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc 0 = 0 acc n = seq (k Array.! (n `mod` 16)) (acc (n-1)) main = do print (acc 100000000)
C (gcc) test source:
int main () { int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } }
well, to begin with, it's obviously already biased toward C. The C version doesn't initialize the array and doesn't display any output. Using MinGW gcc 3.2 I, like Sven, get a .S file that's just an empty loop(why the loop was even kept...?). Before you make a benchmark like this you may want to look at at least the basic parts of the following, which has Sven's advice among (many) other things. http://haskell.cs.yale.edu/ghc/docs/latest/html/users_guide/faster.html Anyways, int main(void){ int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } printf("%d\n",k); return 0; } compiled with MinGW 3.2 -O/-O2/-O3 runs in 1.8s using time. int main(void){ int i, k; int a [16]; for (i=0; i<=100000000; i++) { k = a [i % 16]; } printf("%d\n",k); return 0; } The following (compiled with -O2 -fglasgow-exts and GHC 6.1 i.e. a CVS version) beats the C version taking only .8s. They don't do the same thing (the Haskell version "does" more, but prints something different). Looking at the core/stg/c/asm (the asm and C was quite messy compared with other times), it certainly doesn't appear to be optimizing this to print 0, and it does appear (though I'm less sure here) to be doing everything as it should, but it's really hard to read the assembly, since we don't use the value we lookup I can easily see gcc dropping it altogether, though the C GHC spits out is pretty unusual for gcc, so it may not. module Main (main) where import Data.Array.Unboxed (UArray, array) import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? import GHC.Exts k :: UArray Int Int k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int# -> Int# acc 0# = 0# acc n = (k `unsafeAt` (I# (n `remInt#` 16#))) `seq` acc (n -# 1#) main :: IO () main = print (I# (acc 100000000#)) However, I'm am somewhat surprised that I seemingly had to unbox by hand to get quite close (in fact, better) than C. I expected the following version to be comparable to the C version (and it is orders of magnitude faster than the original version which was over 5min when I killed it (though I think I compiled it with just -O), this version taking 14s). module Main (main) where import Data.Array.Unboxed (UArray, array) import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? k :: UArray Int Int k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int acc 0 = 0 acc n = (k `unsafeAt` (n `mod` 16)) `seq` acc (n - 1) main :: IO () main = print (acc 100000000)
participants (7)
-
David Roundy
-
Derek Elkins
-
Hal Daume
-
Lex Stein
-
Manuel M T Chakravarty
-
Shawn P. Garbett
-
Sven Panne