[GHC] #14062: Pure syntax transformation affects performance.

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed): {{{#!hs module Main where import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-} incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-} test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
S.get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}} Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`). The problem is that `test1` runs **3 times faster** than `test2`. As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed):
{{{#!hs module Main where
import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity
repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-}
incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-}
test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> S.get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}}
Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`).
The problem is that `test1` runs **3 times faster** than `test2`.
As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied.
New description: Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed): {{{#!hs module Main where import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-} incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-} test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}} Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`). The problem is that `test1` runs **3 times faster** than `test2`. As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I can't reproduce this. I didn't use Criterion; I just ran the programs. Both produced the same code with ghc-8.0 and HEAD, and both ran equally fast. Interestingly, HEAD ran them 25% faster than 8.0. The Core looks the same, so I don't know where that improvement comes from, but I'm sure it'll make you happy. Can anyone else reproduce this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Old description:
Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed):
{{{#!hs module Main where
import Prelude as import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity
repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-}
incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-}
test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}}
Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`).
The problem is that `test1` runs **3 times faster** than `test2`.
As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied.
New description: Hi! Let's consider the following code (compiled with `-O2`, `mtl` and `criterion` needed): {{{#!hs {-# LANGUAGE FlexibleContexts #-} module Main where import Prelude import Criterion.Main import Control.Monad.State.Strict import Control.Monad.Identity repeatM :: Monad m => m a -> Int -> m () repeatM f = go where go 0 = pure () go i = f >> go (i - 1) {-# INLINE repeatM #-} incState :: MonadState Int m => m () incState = modify' (1+) ; {-# INLINE incState #-} test1, test2 :: Int -> Int test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
get)) n ; {-# INLINE test1 #-} test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}
main :: IO () main = do defaultMain [ bgroup "monad transformers overhead" [ bench "test1" $ nf test1 100000000 , bench "test2" $ nf test2 100000000 ] ] }}} Functions `test1` and `test2` differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for `test1` and `test2` is practically identical (there is an additional alias created for `test1`: `test1 = lvl1_rhor 'cast' ...`). The problem is that `test1` runs **3 times faster** than `test2`. As a side note - if we add more state transformers to `test1`, it optimizes them all away, while `test2` runs slower with each new transformer applied. -- Comment (by bgamari): I also can't reproduce this result with 8.2.1, {{{ benchmarking monad transformers overhead/test1 time 412.5 ms (378.3 ms .. 443.0 ms) 0.999 R² (0.997 R² .. 1.000 R²) mean 425.7 ms (418.6 ms .. 430.2 ms) std dev 6.749 ms (0.0 s .. 7.751 ms) variance introduced by outliers: 19% (moderately inflated) benchmarking monad transformers overhead/test2 time 373.2 ms (341.6 ms .. 532.9 ms) 0.970 R² (NaN R² .. 1.000 R²) mean 372.6 ms (351.4 ms .. 392.8 ms) std dev 34.07 ms (0.0 s .. 34.92 ms) variance introduced by outliers: 22% (moderately inflated) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): danilo2, can you still reproduce this with 8.4? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): I can't reproduce with 8.6.1 either: {{{ benchmarking monad transformers overhead/test1 time 94.84 ms (91.89 ms .. 97.81 ms) 0.998 R² (0.993 R² .. 1.000 R²) mean 97.52 ms (95.91 ms .. 100.6 ms) std dev 3.344 ms (1.380 ms .. 4.705 ms) benchmarking monad transformers overhead/test2 time 95.69 ms (94.49 ms .. 98.26 ms) 0.999 R² (0.996 R² .. 1.000 R²) mean 95.81 ms (95.24 ms .. 97.28 ms) std dev 1.375 ms (323.2 μs .. 2.151 ms) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14062: Pure syntax transformation affects performance. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by George): * status: new => infoneeded * failure: None/Unknown => Runtime performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC