
Is there a difference between "sum $ map f xs" and "sum $! map f xs"? > > I would think no, since "$!" just evaluates the argument to WHNF, > so it would evaluate just the first cons of the list. > > > This little
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 If you look at the memory usage, the version without the bang actually contains a space leak, so that's the source of the longer runtime. I hope you do not mind some core from ghc ; ) With the bang, we have the following main loop: - -- strict acculumator function over a list, good (worker from sum) Rec { $wgo :: [Int] -> Int# -> Int# $wgo = \ (w :: [Int]) (ww :: Int#) -> case w of _ { [] -> ww; : y ys -> case y of _ { I# y1 -> $wgo ys (+# ww y1) } } end Rec } - -- builds a list, lazy (i.e. no space leak) main3 :: Int -> [Int] -> [Int] main3 = \ (x :: Int) (ys :: [Int]) -> : (bitcount x) ys main2 :: String main2 = case $w$s^ main5 main4 of ww { __DEFAULT -> -- calculate 2^24 case efdtIntUpFB main3 ([]) 0 4 (-# ww 1) of vx { __DEFAULT -> -- loop function, desugared from [a, b..c] case $wgo vx 0 of ww1 { __DEFAULT -> -- sum up the list case $wshowSignedInt 0 ww1 ([]) of _ { (# ww5, ww6 #) -> -- only shows the result, not important : ww5 ww6 } } } } So we build a lazy list with bitcount applied to all arguments. All the stuff is boxed for some reason, so we allocate some stuff, just to throw it away, but it is never held on for long, so our memory usage never gets past 1MB (and thus GC stays cheap). This list is then deconstructed by $wgo again, forcing one element at the time and accumulated into a Int#. Without the bang, the main loops looks something like this: - -- This is interesting. The first argument seems to be the number from the list and the third is our accumulator while the second seems to start as the identity function (from main2 second case). main4 :: Int -> (Int -> Int) -> Int -> Int main4 = \ (x :: Int) (ys :: Int -> Int) (tpl :: Int) -> ys (case tpl of _ { I# x1 -> case x of _ { I# ww1 -> case $wbitcount ww1 of ww2 { __DEFAULT -> I# (+# x1 ww2) } } }) main2 :: String main2 = case $w$s^ main6 main5 of ww { __DEFAULT -> -- calculate 2^24 case efdtIntUpFB main4 (id) 0 4 (-# ww 1) main3 of _ { I# ww3 -> -- loop function, desugared from [a, b..c], but instead of building a list we do something like a difference-list, but with ints (note that we now have 7 arguments instead of 6) case $wshowSignedInt 0 ww3 ([]) of _ { (# ww5, ww6 #) -> -- only shows the result, not important : ww5 ww6 } } } efdtIntUpFB seems to be something like this: efdtIntUpFB :: (Int -> acc -> acc) -> acc -> Int# -> Int# -> Int# -> acc efdtIntUpFB f acc x step max = if x > max then acc else f x (efdtIntUpFB f acc (x + step) step max) (Actual implementation: http://hackage.haskell.org/package/base-4.8.1.0/docs/src/GHC.Enum.html#efdtI...) But acc now resoves to Int -> Int. So if we exand this, we get: main4 0 (main4 4 (main4 8 ... (main4 ... id) ...))) And main4 builds a closure and applies this to ys. Then we enter ys, which is again main4, building a closure and apply this to ys, which is again main4, which builds a closure and finally applies this to id (after 2^22 closures or something like that). So we leak a few hundered MB (423 on my machine) which we have to copy every time we do a (full) GC. On my machine 0.5 sec are spend on full GC, another 0.5 on a few more minor GCs and 1 sec actually computing, which is somewhat similar to your results. So, that much about the space leak. The reason for the difference seems to manifest itself after the first float-out pass, somehow the simplifier rebuilds the expression differently... but right now I do not see what exactly is happening there and why. My first though was a different set of rules from the list fusion stuff fires, but at least the names and order of the rule firings are exactly the same... But the more I think about it, the more I think this is a bug. The following program leaks a similar amount of space: main = print bar bar :: Int bar = sum $ map (\x -> x) [1,2..2^24-1] There is a special rule for map id, so we avoid it, a function doing actual work (like bitcount) would yield the same result. Using [1..2^24-1] yields an efficient program, so something with the combination of sum, map and enumFromThenTo seems to be broken. So yes, I would argue that you indeed did hit a bug. On 12/14/2015 11:30 PM, Johannes Waldmann wrote: program > > main = print $ sum $ map bitcount [0, 4 .. 2^24-1 ] > > bitcount :: Int -> Int > bitcount x = > if x > 0 then let (d,m) = divMod x 2 in bitcount d + m else 0 > > runs in 1.6 seconds when compiled with -O2 > on ghc-7.6.3, ghc-7.8.4, > > but takes 3.2 seconds on ghc-7.10.[1-3]. Why? > > > when I change the main function (note: $ => $!) to > > main = print $ sum $! map bitcount [0, 4 .. 2^24-1 ] > > and compile with 7.10.[1-3], then it also runs in 1.6 seconds. > > > (I can write a bug report but I want to check > whether I am missing something obvious) > > > - J.W. > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBCAAGBQJWb1y3AAoJEM0PYZBmfhoBqNAH/1QNNvZ84We9b0RUtb6w5gNt 6u9yzw5DxMBsxdHgEWlbc7vXOLpeCXaPhLMKYVFnNDHwwx4c+CPWZd44YUCZtLOd Xi9sCAHMJX5vDxfhFgCzwBP39KsMr+Euhde+thIBjlAmM/IKJLC1h1tjtoYUOMle W9AMzIoVetHSWA5MqgX+LjpNiMifTnXNy+6yx8aytCtRKo2hiGs9tWSzCE58a8XM EY3cFSzbqy1IHsq9xwycjfpTvpUt1Ga8jjmhwilhjdREIYwWmc6qT2vAiAGf4nPn gx9SoMIlAmgbcV/xMSUha2YZ5QzG9deYTIZ7SYLfcaTev1/QMKmONgHX4uWTRFU= =rHVH -----END PGP SIGNATURE-----