
#11795: Performance issues with replicateM_ -------------------------------------+------------------------------------- Reporter: snoyberg | Owner: snoyberg Type: bug | Status: new Priority: normal | Milestone: Component: libraries/base | Version: 7.10.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2086 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): On Phab:D2086 Simon posed the very reasonable question of //why// these two differ so significantly in performance. In my original reading of the patch I had assumed that the reason was that the "fast" version would enable unboxing of the `Int` accumulator while the "slow" version would not. Looking at the Core, however, this does not appear to be the difference: this accumulator is unboxed in both cases. The actual difference is that the self-recursive nature of the slow version inhibits inlining, which means we never get to specialise to the action. Instead we end up with, {{{#!hs $w$sreplicateM_noWW :: forall a. Int# -> IO a -> State# RealWorld -> (# State# RealWorld, () #) $w$sreplicateM_noWW = \ (@ a) (ww :: Int#) (w :: IO a) (w1 :: State# RealWorld) -> case ww of ds { __DEFAULT -> case (w `cast` ...) w1 of _ { (# ipv, ipv1 #) -> $w$sreplicateM_noWW @ a (-# ds 1#) w ipv }; 0# -> (# w1, () #) } lvl2 :: Ptr CChar -> State# RealWorld -> (# State# RealWorld, () #) lvl2 = \ (x :: Ptr CChar) (eta :: State# RealWorld) -> $w$sreplicateM_noWW @ CChar 100000000# (($fStorableInt21 (x `cast` ...)) `cast` ...) eta }}} Which gives us O(n) slow calls. This stands in stark contrast to the fast variant, which compiles down to this beautiful allocation-free loop, {{{#!hs $wgo12 :: Int# -> State# RealWorld -> (# State# RealWorld, () #) $wgo12 = \ (ww :: Int#) (w :: State# RealWorld) -> case tagToEnum# @ Bool (<=# ww 0#) of _ { False -> case getForeignEncoding1 of _ { (getForeignEncoding5, setForeignEncoding1) -> case (getForeignEncoding5 `cast` ...) w of _ { (# ipv, ipv1 #) -> case charIsRepresentable3 @ () ipv1 lvl1 (lvl2 `cast` ...) ipv of _ { (# ipv2, ipv3 #) -> case seq# @ () @ RealWorld ipv3 ipv2 of _ { (# ipv4, ipv5 #) -> $wgo12 (-# ww 1#) ipv4 } } } }; True -> (# w, () #) } }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11795#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler