[GHC] #14013: Bad monads performance

22 Jul 2017

      #14013: Bad monads performance
-------------------------------------+-------------------------------------
           Reporter:  danilo2        |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  high           |         Milestone:
          Component:  Compiler       |           Version:  8.2.1-rc3
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 Hi! We've been struggling with a very strange GHC behavior on IRC today.
 Let's consider the following code (needs mtl and criterion to be compiled,
 not all extensions are needed):

 {{{

 {-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns,
 BinaryLiterals,
   ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable,
 DeriveFoldable,
   DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse,
 DuplicateRecordFields,
   EmptyDataDecls, FlexibleContexts, FlexibleInstances,
 FunctionalDependencies, GeneralizedNewtypeDeriving,
   InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf,
 NamedWildCards, NegativeLiterals,
   NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports,
 QuasiQuotes, RankNTypes,
   RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell,
 TupleSections,
   TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators,
 ViewPatterns,
   LiberalTypeSynonyms, RelaxedPolyRec #-}

 module Main where

 import Prelude
 import Criterion.Main
 import qualified Control.Monad.State.Strict as Strict
 import qualified Control.Monad.State.Class  as State
 import Control.DeepSeq (NFData, rnf, force)
 import GHC.IO          (evaluate)
 import Data.Monoid

 -----------------------------
 -- === Criterion utils === --
 -----------------------------

 eval :: NFData a => a -> IO a
 eval = evaluate . force ; {-# INLINE eval #-}

 liftExp :: (Int -> a) -> (Int -> a)
 liftExp f = f . (10^) ; {-# INLINE liftExp #-}

 expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
 expCodeGen f i = do
     putStrLn $ "generating input code (10e" <> show i <> " chars)"
     out <- eval $ liftExp f i
     putStrLn "code generated sucessfully"
     return out
 {-# INLINE expCodeGen #-}

 expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int
 -> Benchmark
 expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) .
 nf p ; {-# INLINE expCodeGenBench #-}

 -------------------------------
 -- === (a*) list parsing === --
 -------------------------------

 genList_a :: Int -> [Char]
 genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}

 pureListParser_a :: [Char] -> Bool
 pureListParser_a = \case
     'a':s -> pureListParser_a s
     []    -> True
     _     -> False
 {-# INLINE pureListParser_a #-}

 mtlStateListParser_a :: State.MonadState [Char] m => m Bool
 mtlStateListParser_a = State.get >>= \case
     'a':s -> State.put s >> mtlStateListParser_a
     []    -> return True
     _     -> return False
 {-# INLINE mtlStateListParser_a #-}

 mtlStateListParser_a_typed :: Strict.State [Char] Bool
 mtlStateListParser_a_typed = State.get >>= \case
     'a':s -> State.put s >> mtlStateListParser_a_typed
     []    -> return True
     _     -> return False
 {-# INLINE mtlStateListParser_a_typed #-}

 mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
 mtlStateListParser_a_let = go where
     go = Strict.get >>= \case
         'a':s -> Strict.put s >> go
         []    -> return True
         _     -> return False
 {-# INLINE mtlStateListParser_a_let #-}

 {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
 {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}

 main = do
     defaultMain
         [ bgroup "a*" $
             [ bgroup "pure"                    $ expCodeGenBench genList_a
 pureListParser_a                              <$> [6..6]
             , bgroup "mtl.State.Strict"        $ expCodeGenBench genList_a
 (Strict.evalState mtlStateListParser_a)       <$> [6..6]
             , bgroup "mtl.State.Strict typed"  $ expCodeGenBench genList_a
 (Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
             , bgroup "mtl.State.Strict let"    $ expCodeGenBench genList_a
 (Strict.evalState mtlStateListParser_a_let)   <$> [6..6]
             ]
         ]

 }}}

 The code was compiled with following options (and many other variations):
 `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100
 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-
 factor=1000 -flate-dmd-anal`

 Everything in this code has `INLINE` pragma. The important part we should
 focus on are these two functions:

 {{{

 pureListParser_a :: [Char] -> Bool
 pureListParser_a = \case
     'a':s -> pureListParser_a s
     []    -> True
     _     -> False
 {-# INLINE pureListParser_a #-}

 mtlStateListParser_a :: State.MonadState [Char] m => m Bool
 mtlStateListParser_a = State.get >>= \case
     'a':s -> State.put s >> mtlStateListParser_a
     []    -> return True
     _     -> return False
 {-# INLINE mtlStateListParser_a #-}
 }}}

 Which are just "parsers" accepting strings containing only 'a' characters.
 The former is pure one, while the later uses `State` to keep the remaining
 input. The following list contains performance related observations:

 0. For the rest of the points, let's call the performance of
 `pureListParser_a` a "good" one and everything worse a "bad" one.

 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times
 slower than `pureListParser_a`. Inspecting CORE we can observe that GHC
 jumps between `(# a,b #)` and `(a,b)` representations all the time.

 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a ::
 Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a`
 is good (exactly the same as `pureListParser_a`).

 3. If we do NOT use specialize pragma, but we use explicite, non-
 polymorphic type signature `mtlStateListParser_a_typed :: Strict.State
 [Char] Bool`, the performance is bad (!), identical to the polymorphic
 version without specialization.

 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic
 type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char]
 Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State
 [Char] Bool #-}` we get the good performance.

 5. If we transform `pureListParser_a` to

 {{{
 mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
 mtlStateListParser_a_let = go where
     go = Strict.get >>= \case
         'a':s -> Strict.put s >> go
         []    -> return True
         _     -> return False
 {-# INLINE mtlStateListParser_a_let #-}
 }}}

 we again get the good performance without the need to use `SPECIALIZE`
 pragmas.

 The above points raise the following questions:

 1. Why GHC does not optimize `mtlStateListParser_a` the same way as
 `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)`
 comes from?

 2. Is there any way to tell GHC to automatically insert `SPECIALIZE`
 pragmas, especially in performance critical code?

 3. Why providing very-explicite type signature `mtlStateListParser_a_typed
 :: Strict.State [Char] Bool` does not solve the problem unless we use
 `SPECIALIZE` pragma that tells the same as the signature?

 4. Why the trick to alias the body of recursive function to a local
 variable `go` affects the performance in any way, especially when it does
 NOT bring any variable to the local let scope?

 We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several
 people reported exactly the same observations.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14013
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

tags

participants (1)