
#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed):
{{{#!hs module Main where
import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity
repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-}
incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-}
test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> S.get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}}
Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`).
The problem is that `test1` runs **3 times faster** than `test2`.
As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied.
New description: Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed): {{{#!hs module Main where import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-} incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-} test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}} Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`). The problem is that `test1` runs **3 times faster** than `test2`. As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler