
#14013: Bad monads performance -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled, not all extensions are needed):
{{{
{-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns, BinaryLiterals, ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse, DuplicateRecordFields, EmptyDataDecls, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GeneralizedNewtypeDeriving, InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf, NamedWildCards, NegativeLiterals, NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports, QuasiQuotes, RankNTypes, RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators, ViewPatterns, LiberalTypeSynonyms, RelaxedPolyRec #-}
module Main where
import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid
----------------------------- -- === Criterion utils === -- -----------------------------
eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
------------------------------- -- === (a*) list parsing === -- -------------------------------
genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ]
}}}
The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}}
we again get the good performance without the need to use `SPECIALIZE` pragmas.
6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.
New description: Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled): {{{ module Main where import Prelude import Criterion.Main import qualified Control.Monad.State.Strict as Strict import qualified Control.Monad.State.Class as State import Control.DeepSeq (NFData, rnf, force) import GHC.IO (evaluate) import Data.Monoid ----------------------------- -- === Criterion utils === -- ----------------------------- eval :: NFData a => a -> IO a eval = evaluate . force ; {-# INLINE eval #-} liftExp :: (Int -> a) -> (Int -> a) liftExp f = f . (10^) ; {-# INLINE liftExp #-} expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a) expCodeGen f i = do putStrLn $ "generating input code (10e" <> show i <> " chars)" out <- eval $ liftExp f i putStrLn "code generated sucessfully" return out {-# INLINE expCodeGen #-} expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-} ------------------------------- -- === (a*) list parsing === -- ------------------------------- genList_a :: Int -> [Char] genList_a i = replicate i 'a' ; {-# INLINE genList_a #-} pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} mtlStateListParser_a_typed :: Strict.State [Char] Bool mtlStateListParser_a_typed = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a_typed [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_typed #-} mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-} {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} main = do defaultMain [ bgroup "a*" $ [ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6] , bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6] , bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6] , bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6] ] ] }}} The code was compiled with following options (and many other variations): `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal` Everything in this code has `INLINE` pragma. The important part we should focus on are these two functions: {{{ pureListParser_a :: [Char] -> Bool pureListParser_a = \case 'a':s -> pureListParser_a s [] -> True _ -> False {-# INLINE pureListParser_a #-} mtlStateListParser_a :: State.MonadState [Char] m => m Bool mtlStateListParser_a = State.get >>= \case 'a':s -> State.put s >> mtlStateListParser_a [] -> return True _ -> return False {-# INLINE mtlStateListParser_a #-} }}} Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses `State` to keep the remaining input. The following list contains performance related observations: 0. For the rest of the points, let's call the performance of `pureListParser_a` a "good" one and everything worse a "bad" one. 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times slower than `pureListParser_a`. Inspecting CORE we can observe that GHC jumps between `(# a,b #)` and `(a,b)` representations all the time. 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a` is good (exactly the same as `pureListParser_a`). 3. If we do NOT use specialize pragma, but we use explicite, non- polymorphic type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool`, the performance is bad (!), identical to the polymorphic version without specialization. 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char] Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}` we get the good performance. 5. If we transform `pureListParser_a` to {{{ mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool mtlStateListParser_a_let = go where go = Strict.get >>= \case 'a':s -> Strict.put s >> go [] -> return True _ -> return False {-# INLINE mtlStateListParser_a_let #-} }}} we again get the good performance without the need to use `SPECIALIZE` pragmas. 6. The performance of all the functions that are not optimized as good as `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. The above points raise the following questions: 1. Why GHC does not optimize `mtlStateListParser_a` the same way as `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)` comes from? 2. Is there any way to tell GHC to automatically insert `SPECIALIZE` pragmas, especially in performance critical code? 3. Why providing very-explicite type signature `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve the problem unless we use `SPECIALIZE` pragma that tells the same as the signature? 4. Why the trick to alias the body of recursive function to a local variable `go` affects the performance in any way, especially when it does NOT bring any variable to the local let scope? We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler