[GHC] #14013: Bad monads performance

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled, not all extensions are needed): {{{ {-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns, BinaryLiterals, ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse, DuplicateRecordFields, EmptyDataDecls, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GeneralizedNewtypeDeriving, InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf, NamedWildCards, NegativeLiterals, NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports, QuasiQuotes, RankNTypes, RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators, ViewPatterns, LiberalTypeSynonyms, RelaxedPolyRec #-} module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled, not all extensions are needed):
{{{
{-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns, BinaryLiterals, ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse, DuplicateRecordFields, EmptyDataDecls, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GeneralizedNewtypeDeriving, InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf, NamedWildCards, NegativeLiterals, NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports, QuasiQuotes, RankNTypes, RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators, ViewPatterns, LiberalTypeSynonyms, RelaxedPolyRec #-}
module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled, not all extensions are needed): {{{ {-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns, BinaryLiterals, ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse, DuplicateRecordFields, EmptyDataDecls, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GeneralizedNewtypeDeriving, InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf, NamedWildCards, NegativeLiterals, NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports, QuasiQuotes, RankNTypes, RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators, ViewPatterns, LiberalTypeSynonyms, RelaxedPolyRec #-} module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled, not all extensions are needed):
{{{
{-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns, BinaryLiterals, ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse, DuplicateRecordFields, EmptyDataDecls, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GeneralizedNewtypeDeriving, InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf, NamedWildCards, NegativeLiterals, NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports, QuasiQuotes, RankNTypes, RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators, ViewPatterns, LiberalTypeSynonyms, RelaxedPolyRec #-}
module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{ module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
{{{ module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{ module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. 7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
{{{ module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{ module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. 7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.) 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
{{{ module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.)
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{ module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. 7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463). The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.) 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
{{{ module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463).
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.)
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{#!hs module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{#!hs pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{#!hs mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. 7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463). The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.) 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- Comment (by bgamari): Very interesting; this will be an interesting thing to mull over tomorrow morning. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
{{{#!hs module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{#!hs
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{#!hs mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463).
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.)
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{#!hs module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{#!hs pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{#!hs mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. 7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463). 8. If you do NOT use `INLINE` pragma on functions `mtlStateListParser_a` and `mtlStateListParser_a_typed` their performance is good (so `INLINE` pragma makes it bad until we provide explicit specialization). Moreover, if we use `INLINABLE` pragma instead of `INLINE` on these functions (which logically makes more sense, because they are recursive), performance of the polymorphic one `mtlStateListParser_a` is good, while performance of the explicitly typed `mtlStateListParser_a_typed` is bad until we provide explicite specialization. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? (GHC even warns: `SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’` but it affects performance.) 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): I added new observation - point 8. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I have not worked all of this out, but I have learned something. You have this INLINE pragma: {{{ {-# INLINE mtlStateListParser_a #-} }}} It does nothing, because the function is recursive. But alas, it harms things a lot. Simply removing that INLINE pragma makes `mtlStateListParser_a` behave well in all settings, I think. Can you confirm that? I believe that the reason things go bad is this. GHC does this transformation (always): {{{ f = e |> co ===> f' = e f = f' |> co }}} Reason: `f` can now be inlined at all use sites, and `co` may cancel. But if the original `f` has an INLINE pragma we get {{{ f = e |> co { INLINE = <inline rhs> } ===> f' = e f = f' |> co { INLINE = <inline rhs> } }}} where the `{ INLINE = <inline rhs> }` is the (stable, user-written) inlining for `f`. Now the point of the transformation is lost, becuase `f` won't be replaced at its use sites by `f' |> co`; the INLINE pragma is what gets inlined. Moreover, if `<inline rhs>` and `e` both mention `f`, then `f` becomes a loop breaker and we get mutual recusion between `f` and `f'`. This what ultimately leads to the alternation between `(,)` and `(##)` you observed. Solution (I think) don't do this transformation if `f` has an INLINE pragma. I'll try that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @simon, you are right. Removing `INLINE` makes these two functions run with fine performance. It is already described in point 8 in the ticket description. Please note, that point 8 tells about probably related problem with INLINABLE pragma. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Sigh. This has turned out to be much nastier than I expected. I worked solely on {{{ import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} foo :: [Char] -> Bool foo = Strict.evalState mtlStateListParser_a }}} I'll refer to `mtlStateListParser_a` as `msp`. * Yes, comment:9 is right; the right path is to make `doFloatFromRhs` return `False` for bindings with a stable unfolding. * Even when that is done, the occurrence analyser does a bad job. We get {{{ Rec { msp = ... lvl ... {-# INLINE = ..msp.. #-} -- The stable unfolding lvl = ...msp... } }}} The occurrence analyser treats the occurrence of `lvl` as a "weak" reference, and so sorts into SCCs thus: `Rec{ msp }, NonRec { lvl }`. So then it stupidly marks `msp` as a loop breaker, and `lvl` as a weak loop breaker. In this case they'd be better in one SCC, in which case we'd pick `msp` (but not `lvl`) as a loop breaker. The relevant change is in `OccurAnal`, around line 1280. {{{ -- Find the "nd_inl" free vars; for the loop-breaker phase inl_fvs = udFreeVars bndr_set rhs_usage1 `unionVarSet` case mb_unf_uds of Nothing -> emptyVarSet -- udFreeVars bndr_set rhs_usage1 -- No INLINE, use RHS Just unf_uds -> udFreeVars bndr_set unf_uds }}} But I'm not fully confident of this change. * Even if we fix that, then the strictness analyser fails. We end up with {{{ msp = (\ (s::[Char]). case s of p1 -> (False, x) p2 -> (msp |> sym co) s' ) |> co }}} Those casts are enough to kill demand analysis. It was relying on the coercion-floating that we nuked in comment:9! The function looks to the demand analyser as if it has arity zero, and so we get no useful strictness. Yes, we could teach the demand analyser more tricks, but the tail is beginning to wag the dog. * This is all stupid. An INILNE pragma on a recursive function is doing no good at all. Maybe we should just discard it. And indeed that makes things work. * Until you use an INLINABLE pragma! We don't want to discard the INLINEABLE pragama on a recursive function -- it is super-useful. But if we don't the same ills happen as with INLINE. Actually, the specialiser propagates an INLINE pragma to the specialised function, but does '''not''' propagate an INLINEABLE pragam. Result: if you give an overloaded signature for `msp`, the specialiser will create a pragma-free specialised version, which will optimise nicely. But if you give a non-overloaded signature `msp :: Strict.State [Char] Bool`, the function fails to optimise for the reasons above. Mind you, in the latter case the INLINEABLE pragma is just as useless as the INLINE pragma was. This is ridiculously terrible. The pragmas(which are there to optimise the program) are getting in the way of optimising the function itself. What to do? Here's a simple idea; * Discard INLINE pragmas for recursive, or mutually recursive, functions. (You can do this manually too!) * Peel off a top-level function for INLINEABLE pragmas, thus: {{{ Rec { f = e[f] {-# INLINEABLE = e[f] #-} } ===> Rec { f' = e[f'] } Rec { f = f' {-# INLINEABLE = e[f] #-} } }}} The first `Rec` is a pragma-free group. The second has all its pragmas (for later clients), but just indirect to the first group if you actually call it. Alas, you can't do this manually right now. But somehow none of this really feels right. I'm not sure what to do, so I'm just brain-dumping this. Maybe someone else will have better ideas -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * Attachment "simpl-INLINE-patch" added. WIP on floating from stable unfoldings -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * Attachment "occ-anal-rules-patch" added. WIP on occurrence analysis and rules -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): The attached patches are not finished; they were just WIP related to the comments above. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: simonpj Type: bug | Status: new Priority: high | Milestone: 8.4.1 Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: (none) => simonpj * milestone: => 8.4.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: simonpj Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.4.1 => 8.6.1 Comment: What is the status of this, Simon? Regardless, presumably nothing will change for 8.4. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC