
Several comments 1. In Haskell "mod" is a lot more expensive than "rem", because it involves careful jiggery pokery with the sign of the result. That's why your boxed version was slower (nothing to do with boxing). 2. GHC does indeed optimise away the array access if you aren't careful, and does in Derek's code. Below is a version that avoids doing so, in the same way as the C version, by returning one of the values. 3. Derek correctly points out that gcc does not do array-bound checks. GHC should eliminate them, but it's a non-trivial analysis and GHC does not currently attempt it. You can always use unsafeAt, as Derek does. 4. The UArray stuff builds an array of unboxed Ints, like C. If you use ordinary boxed arrays, the loop has to unbox the Int every time around the loop, which makes it slower (2.6s) With that done, we get ghc -O2 -fliberate-case-threshold20 1.8s gcc -O2 1.4s Seems close enough to me; have not investigated further. The "liberate-case" thing is a practically-undocumented GHC optimisation that switches on with -O2. It specialises a loop that unboxes a free variable (here the array k) so that the unboxing doesn't happen each time round the loop. The threshold at which GHC thinks there is too much code dup is set very low by default, so the flag increases it. Simon ====================== k :: UArray Int Int k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] acc :: Int -> Int -> Int acc r 0 = r acc r n = r `seq` acc (k `unsafeAt` (n `rem` 16)) (n - 1) main :: IO () main = print (acc 0 100000000) ====================== | Anyways, | int main(void){ | int i, k; | int a [16]; | for (i=0; i<=100000000; i++) { | k = a [i % 16]; | } | printf("%d\n",k); | return 0; | } | compiled with MinGW 3.2 -O/-O2/-O3 runs in 1.8s using time. | | int main(void){ | int i, k; | int a [16]; | for (i=0; i<=100000000; i++) { | k = a [i % 16]; | } | printf("%d\n",k); | return 0; | } | | The following (compiled with -O2 -fglasgow-exts and GHC 6.1 i.e. a CVS | version) beats the C version taking only .8s. They don't do the same | thing (the Haskell version "does" more, but prints something different). | Looking at the core/stg/c/asm (the asm and C was quite messy compared | with other times), it certainly doesn't appear to be optimizing this to | print 0, and it does appear (though I'm less sure here) to be doing | everything as it should, but it's really hard to read the assembly, | since we don't use the value we lookup I can easily see gcc dropping it | altogether, though the C GHC spits out is pretty unusual for gcc, so it | may not. | | module Main (main) where | import Data.Array.Unboxed (UArray, array) | import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? | import GHC.Exts | | k :: UArray Int Int | k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] | | acc :: Int# -> Int# | acc 0# = 0# | acc n = (k `unsafeAt` (I# (n `remInt#` 16#))) `seq` acc (n -# 1#) | | main :: IO () | main = print (I# (acc 100000000#)) | | However, I'm am somewhat surprised that I seemingly had to unbox by hand | to get quite close (in fact, better) than C. I expected the following | version to be comparable to the C version (and it is orders of magnitude | faster than the original version which was over 5min when I killed it | (though I think I compiled it with just -O), this version taking 14s). | | module Main (main) where | import Data.Array.Unboxed (UArray, array) | import Data.Array.Base (unsafeAt) -- if C doesn't, why should we? | | k :: UArray Int Int | k = array (0,15) [(i, i+1) | i <- [0 .. 15] ] | | acc :: Int -> Int | acc 0 = 0 | acc n = (k `unsafeAt` (n `mod` 16)) `seq` acc (n - 1) | | main :: IO () | main = print (acc 100000000) | | _______________________________________________ | Haskell-Cafe mailing list | Haskell-Cafe@haskell.org | http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, 27 Jun 2003 09:36:36 +0100
"Simon Peyton-Jones"
Several comments
1. In Haskell "mod" is a lot more expensive than "rem", because it involves careful jiggery pokery with the sign of the result. That's why your boxed version was slower (nothing to do with boxing).
Ah. I guess I should've checked the core closer (or more than a quick glimpse) before making wild assumptions. The (relatively) few other times I've unboxed by hand, I've found GHC had already done it. I did think mod might have been a problem, but I didn't expect it to be -that- much of a difference, hence assuming it was boxing related (I didn't see a modInt# so I figured remInt# was more or less it, it didn't know how much less).
2. GHC does indeed optimise away the array access if you aren't careful, and does in Derek's code. Below is a version that avoids doing so, in the same way as the C version, by returning one of the values.
Yeah, I was having a very hard time believing it, as the code was comparable to gcc optimising out the array access (which took ~.6 seconds on my computer).
3. Derek correctly points out that gcc does not do array-bound checks. GHC should eliminate them, but it's a non-trivial analysis and GHC does not currently attempt it. You can always use unsafeAt, as Derek does.
I was thinking that that would be nice, but was not surprised that it wasn't there. Are there any concoctions of flags/pragmas/techniques to coax GHC into unrolling a function (or rather an IO computation in this case) a couple of iterations. There was another (admittedly also noddy) program that I could get to near gcc, but only by unrolling by hand (though Ian's TH stuff likely could have done it).
participants (2)
-
Derek Elkins
-
Simon Peyton-Jones