
#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Old description:
Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed):
{{{#!hs module Main where
import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity
repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-}
incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-}
test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}}
Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`).
The problem is that `test1` runs **3 times faster** than `test2`.
As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied.
New description: Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed): {{{#!hs {-# LANGUAGE FlexibleContexts #-} module Main where import Prelude import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-} incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-} test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}} Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`). The problem is that `test1` runs **3 times faster** than `test2`. As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied. -- Comment (by bgamari): I also can't reproduce this result with 8.2.1, {{{ benchmarking monad transformers overhead/test1 time 412.5 ms (378.3 ms .. 443.0 ms) 0.999 R² (0.997 R² .. 1.000 R²) mean 425.7 ms (418.6 ms .. 430.2 ms) std dev 6.749 ms (0.0 s .. 7.751 ms) variance introduced by outliers: 19% (moderately inflated) benchmarking monad transformers overhead/test2 time 373.2 ms (341.6 ms .. 532.9 ms) 0.970 R² (NaN R² .. 1.000 R²) mean 372.6 ms (351.4 ms .. 392.8 ms) std dev 34.07 ms (0.0 s .. 34.92 ms) variance introduced by outliers: 22% (moderately inflated) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler