RE: ghc more aggressive with Ints?

| > sumMinusOnce l = foldr sumMinus (0,0) l | > where sumMinus c (a,b) = (a+c, c-b) | > | > sumMinusTwice l = (foldr (+) 0 l, foldr (-) 0 l) | Here, we see that for everything but INT, the ONCE version is about two | times slower. For the other versions | (say DOUBLE), the ONCE version is consistently two times slower. This | raises the question as to why. The reason turns out to be that Int enumerations [1..100] are carefully expanded to build (enumFromToInt 1# 100#) where enumFromToInt is a hand-written unboxed version of enumFromTo. So we get list deforestation from the fold/build. (Incidentally, this may not happen if there are several calls to sumMinusOnce, or it's exported, because the fold will only see the build if sumMinusOnce is inlined.) The Double version uses the vanilla numericEnumFromTo (as in the Report) which does not expand to a 'build'. So no fold/build happens. By adding more code to the Prelude, one could 'fix' this, by making numericEnumFromTo have a build version. Whether this is worth the bother is not clear to me. If anyone is motivated to do it (needs understanding of how RULES work) by all means do so. Simon | | Looking at the -ddump-simpl output, we can see why. In the sumMinusOnce | version, when given an Int type signature, the tuple gets | unboxed and the primitive GHC.Prim.+# and GHC.Prim.-# functions are | used. In the one where we've given a Double type signature, we don't get | this. The tuple still gets unboxed, but the non-unboxed functions | GHC.Float.plusDouble and GHC.Float.minusDouble functions are used instead. | | Perhaps more importantly, the Double version is left as a recursive call, | looking something like: | | > Main.$wgo = \w -> | > case w of | > (y:ys) -> case Main.$wgo ys of | > (# ww1, ww2 #) -> | > (# plusDouble ww1 y, minusDouble y ww2) #) | > [] -> (# 0, 0 #) | | (converted by hand from simpl-core to a more Haskelly syntax) | | However, the Int version becomes a very tight loop in which the list is | deforested. | | My understanding was that using list comprehensions like [1..100000] | together with foldr you would always get build/fold deforestation, but | this only seems to happen when the list is a list of Ints, not a list of | Doubles. | | If we change the Once function to: | | > i2d :: Int -> Double | > i2d = fromInteger.toInteger | > | > sumMinusOnce :: [Int] -> (Double, Double) | > sumMinusOnce l = foldr sumMinus (0,0) l | > where sumMinus c (a,b) = (a+i2d c, (i2d c)-b) | | then we do get full deforestation as we want. This new version, when run | with [1..250000], yields a time of | | 4.12u 0.23s 0:04.44 | | which is actually faster than the TWICE/DOUBLE version. | | So...why is GHC more aggressive with Ints than with Doubles and what are | the additional conditions under which fold/build deforestation occurs? | | - Hal | | -- | Hal Daume III | | "Computer science is no more about computers | hdaume@isi.edu | than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume | | | | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
participants (1)
-
Simon Peyton-Jones