Re: [GHC] #14062: Pure syntax transformation affects performance.

30 Jul 2017

      #14062: Pure syntax transformation affects performance.
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by danilo2:

Old description:
...
Hi! Let's consider the following code (compiled with `-O2`, `mtl` and
`criterion` needed):
{{{#!hs
module Main where
import Prelude as
import Criterion.Main
import Control.Monad.State.Strict
import Control.Monad.Identity
repeatM :: Monad m => m a -> Int -> m ()
repeatM f = go where
    go 0 = pure ()
    go i = f >> go (i - 1)
{-# INLINE repeatM #-}
incState :: MonadState Int m => m ()
incState = modify' (1+) ; {-# INLINE incState #-}
test1, test2 :: Int -> Int
test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState
a >> S.get)) n ; {-# INLINE test1 #-}
test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get
; {-# INLINE test2 #-}
main :: IO ()
main = do
    defaultMain
        [ bgroup "monad transformers overhead"
            [ bench "test1"     $ nf test1 100000000
            , bench "test2"     $ nf test2 100000000
            ]
        ]
}}}
Functions `test1` and `test2` differ only syntactically and this
difference should not affect GHC's inliner, because their implementations
use fully saturated calls. The generated core for `test1` and `test2` is
practically identical (there is an additional alias created for `test1`:
`test1 = lvl1_rhor 'cast' ...`).
The problem is that `test1` runs **3 times faster** than `test2`.
As a side note - if we add more state transformers to `test1`, it
optimizes them all away, while `test2` runs slower with each new
transformer applied.
New description:

 Hi! Let's consider the following code (compiled with `-O2`, `mtl` and
 `criterion` needed):

 {{{#!hs
 module Main where

 import Prelude as
 import Criterion.Main
 import Control.Monad.State.Strict
 import Control.Monad.Identity

 repeatM :: Monad m => m a -> Int -> m ()
 repeatM f = go where
     go 0 = pure ()
     go i = f >> go (i - 1)
 {-# INLINE repeatM #-}

 incState :: MonadState Int m => m ()
 incState = modify' (1+) ; {-# INLINE incState #-}

 test1, test2 :: Int -> Int
 test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
...
...
get)) n ; {-# INLINE test1 #-}
 test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get
 ; {-# INLINE test2 #-}
main :: IO ()
 main = do
     defaultMain
         [ bgroup "monad transformers overhead"
             [ bench "test1"     $ nf test1 100000000
             , bench "test2"     $ nf test2 100000000
             ]
         ]
 }}}

 Functions `test1` and `test2` differ only syntactically and this
 difference should not affect GHC's inliner, because their implementations
 use fully saturated calls. The generated core for `test1` and `test2` is
 practically identical (there is an additional alias created for `test1`:
 `test1 = lvl1_rhor 'cast' ...`).

 The problem is that `test1` runs **3 times faster** than `test2`.

 As a side note - if we add more state transformers to `test1`, it
 optimizes them all away, while `test2` runs slower with each new
 transformer applied.

--

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14062#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler