[GHC] #14035: Weird performance results.

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. The first "part" of it I've described here: https://ghc.haskell.org/trac/ghc/ticket/14013 (and I will be referring to this issue later), however there are some other problems that do not seem directly connected to 14013 ticket. **The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end. \\ == Helpers Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-} dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}} So whenever we see `.` in code it is strict in all of its arguments. \\ == Strict StateT performance improvement Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans) class MonadState s m | m -> s where get :: m s put :: s -> m () runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-} instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}} There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not. 2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}} 3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}} It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests. \\ == Strict Either & EitherT The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr... \\ == Strict Bool and tuple Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments: {{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor) data XBool = XTrue | XFalse deriving (Show, Generic) (|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}} \\ == Parser implementation All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results. The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor) instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-} instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}} We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`: {{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}} In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations: 1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body. 2. Both `b` and `b'` are strict and fully evaluated. 3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`. 4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance. 5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used. 6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one). == Final notes We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high-performance demanding applications. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * Attachment "Main.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. The first "part" of it I've described here: https://ghc.haskell.org/trac/ghc/ticket/14013 (and I will be referring to this issue later), however there are some other problems that do not seem directly connected to 14013 ticket.
**The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end.
\\
== Helpers
Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-}
dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}}
So whenever we see `.` in code it is strict in all of its arguments.
\\
== Strict StateT performance improvement
Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S
newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans)
class MonadState s m | m -> s where get :: m s put :: s -> m ()
runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-}
instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}}
There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not.
2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}}
3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}}
It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests.
\\
== Strict Either & EitherT
The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr...
\\
== Strict Bool and tuple
Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments:
{{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor)
data XBool = XTrue | XFalse deriving (Show, Generic)
(|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}}
\\
== Parser implementation
All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results.
The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor)
instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined
instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-}
instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}}
We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`:
{{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}}
In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations:
1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body.
2. Both `b` and `b'` are strict and fully evaluated.
3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`.
4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance.
5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used.
6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one).
== Final notes
We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high- performance demanding applications.
New description: Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. The first "part" of it I've described here: https://ghc.haskell.org/trac/ghc/ticket/14013 (and I will be referring to this issue later), however there are some other problems that do not seem directly connected to 14013 ticket. **The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end. Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`. \\ == Helpers Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-} dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}} So whenever we see `.` in code it is strict in all of its arguments. \\ == Strict StateT performance improvement Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans) class MonadState s m | m -> s where get :: m s put :: s -> m () runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-} instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}} There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not. 2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}} 3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}} It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests. \\ == Strict Either & EitherT The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr... \\ == Strict Bool and tuple Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments: {{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor) data XBool = XTrue | XFalse deriving (Show, Generic) (|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}} \\ == Parser implementation All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results. The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor) instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-} instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}} We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`: {{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}} In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations: 1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body. 2. Both `b` and `b'` are strict and fully evaluated. 3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`. 4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance. 5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used. 6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one). == Final notes We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high-performance demanding applications. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * Attachment "Main.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. The first "part" of it I've described here: https://ghc.haskell.org/trac/ghc/ticket/14013 (and I will be referring to this issue later), however there are some other problems that do not seem directly connected to 14013 ticket.
**The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end.
Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`.
\\
== Helpers
Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-}
dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}}
So whenever we see `.` in code it is strict in all of its arguments.
\\
== Strict StateT performance improvement
Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S
newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans)
class MonadState s m | m -> s where get :: m s put :: s -> m ()
runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-}
instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}}
There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not.
2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}}
3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}}
It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests.
\\
== Strict Either & EitherT
The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr...
\\
== Strict Bool and tuple
Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments:
{{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor)
data XBool = XTrue | XFalse deriving (Show, Generic)
(|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}}
\\
== Parser implementation
All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results.
The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor)
instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined
instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-}
instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}}
We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`:
{{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}}
In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations:
1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body.
2. Both `b` and `b'` are strict and fully evaluated.
3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`.
4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance.
5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used.
6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one).
== Final notes
We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high- performance demanding applications.
New description: Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. **The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end. Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`. \\ == Helpers Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-} dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}} So whenever we see `.` in code it is strict in all of its arguments. \\ == Strict StateT performance improvement Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans) class MonadState s m | m -> s where get :: m s put :: s -> m () runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-} instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}} There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not. 2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}} 3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}} It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests. \\ == Strict Either & EitherT The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr... \\ == Strict Bool and tuple Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments: {{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor) data XBool = XTrue | XFalse deriving (Show, Generic) (|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}} \\ == Parser implementation All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results. The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor) instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-} instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}} We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`: {{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}} In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations: 1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body. 2. Both `b` and `b'` are strict and fully evaluated. 3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`. 4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance. 5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used. 6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one). == Final notes We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high-performance demanding applications. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results.
**The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end.
Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`.
\\
== Helpers
Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-}
dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}}
So whenever we see `.` in code it is strict in all of its arguments.
\\
== Strict StateT performance improvement
Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S
newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans)
class MonadState s m | m -> s where get :: m s put :: s -> m ()
runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-}
instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}}
There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not.
2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}}
3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}}
It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests.
\\
== Strict Either & EitherT
The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr...
\\
== Strict Bool and tuple
Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments:
{{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor)
data XBool = XTrue | XFalse deriving (Show, Generic)
(|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}}
\\
== Parser implementation
All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results.
The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor)
instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined
instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-}
instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}}
We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`:
{{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}}
In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations:
1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body.
2. Both `b` and `b'` are strict and fully evaluated.
3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`.
4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance.
5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used.
6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one).
== Final notes
We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all if it is not obvious if a very simple changes do not affect performance so much, like changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. It makes the code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in considerating the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear and be sure we can continue to use Haskell for our high- performance demanding applications.
New description: Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. **The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end. Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`. \\ == Helpers Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-} dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}} So whenever we see `.` in code it is strict in all of its arguments. \\ == Strict StateT performance improvement Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans) class MonadState s m | m -> s where get :: m s put :: s -> m () runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-} instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}} There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not. 2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}} 3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}} It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests. \\ == Strict Either & EitherT The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr... \\ == Strict Bool and tuple Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments: {{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor) data XBool = XTrue | XFalse deriving (Show, Generic) (|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}} \\ == Parser implementation All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results. The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor) instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-} instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}} We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`: {{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}} In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations: 1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body. 2. Both `b` and `b'` are strict and fully evaluated. 3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`. 4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance. 5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used. 6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one). == Final notes We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all, if it is not obvious if a very simple change do not affect performance a lot. A good example is changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. This situation makes the source code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in consideration the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear, and be sure we can continue to use Haskell for our high- performance demanding applications. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results.
**The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached.** The code is a pseudo-parser implementation. It consumes any char in a loop and fails on empty input in the end.
Everything was compiled with following options (and many variations): `"-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick- factor=1000 -flate-dmd-anal -fspecialise-aggressively"`.
\\
== Helpers
Let's define 2 helpers: {{{#!hs (.) :: (b -> c) -> (a -> b) -> a -> c (.) f g = \x -> f (g x) ; {-# INLINE (.) #-}
dotl :: (b -> c) -> (a -> b) -> a -> c dotl ~f ~g = \ ~x -> f (g x) ; {-# INLINE dotl #-} }}}
So whenever we see `.` in code it is strict in all of its arguments.
\\
== Strict StateT performance improvement
Let's consider following code: {{{#!hs import qualified Control.Monad.State.Strict as S
newtype StateT s m a = StateT { fromStateT :: S.StateT s m a } deriving (Applicative, Functor, Monad, MonadTrans)
class MonadState s m | m -> s where get :: m s put :: s -> m ()
runStateT :: forall s m a. StateT s m a -> s -> m (a, s) evalStateT :: forall s m a. Functor m => StateT s m a -> s -> m a runStateT m s = S.runStateT (fromStateT m) s ; {-# INLINE runStateT #-} evalStateT m = fmap fst . runStateT m ; {-# INLINE evalStateT #-}
instance Monad m => MonadState s (StateT s m) where get = StateT S.get ; {-# INLINE get #-} put = StateT . S.put ; {-# INLINE put #-} }}}
There are few non-obvious things to note here: 1. This wrapper performs about **15 TIMES better** than `Control.Monad.State.Strict.StateT` (in the provided examples) and if we create a loop in pure code imitating a parser, this `StateT` gets completely optimized away, while the `mtl`'s version does not.
2. If we replace the following functions with lazy composition, we get the same, high performance: {{{#!hs runStateT = S.runStateT `dotl` fromStateT ; {-# INLINE runStateT #-} evalStateT m = fmap fst `dotl` runStateT m ; {-# INLINE evalStateT #-} }}}
3. However, if we slightly change the `evalStateT`, we've got the bad performance, equals to the `mtl`'s `StateT` version (15 times slower): {{{#!hs evalStateT m a = fmap fst $ runStateT m a ; {-# INLINE evalStateT #-} }}}
It's a very strange result, especially that `evalStateT` is used only once in the code while running the tests.
\\
== Strict Either & EitherT
The code contains a very minimalistic implementation of `Either` and `EitherT` in order to make their definitions and utils strict. These definitions are copy-pasted and simplified (removed unused code) from: https://hackage.haskell.org/package/base-4.10.0.0/docs/src/Data.Either.html https://hackage.haskell.org/package/either-4.4.1.1/docs/src/Control.Monad.Tr...
\\
== Strict Bool and tuple
Moreover we define strict Bool `or` operation and 2-element tuple with strict arguments:
{{{#!hs data T a b = T !a !b deriving (Generic, Show, Functor)
data XBool = XTrue | XFalse deriving (Show, Generic)
(|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} }}}
\\
== Parser implementation
All the above declarations were simple helpers compiled with `-XStrict`, because available libraries do not provide them for us. This code is a "real" use case and shows the weird performance results.
The parser implementation is simple: {{{#!hs newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor)
instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined
instance Monad m => Monad (FailParser m) where return a = FailParser $ pure $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b a <- ma T !b' a' <- fromFailParser $ f a return $! T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-}
instance MonadTrans (FailParser) where lift m = FailParser $! lift $ fmap (T XFalse) m ; {-# INLINE lift #-} }}}
We use `undefined` for non-important functions. The parser is `EitherT` wrapper: Left happens when we failed parsing input, while Right otherwise. The `XBool` denotes if we made any progress (so after consuming a letter it is set to `XTrue`). There are some additional util functions, like `returnProgress` which behaves just like return, but also sets the `XBool` value to `XTrue`:
{{{#!hs instance Monad m => MonadProgressParser (FailParser m) where returnProgress a = FailParser $! pure (T XFalse a) ; {-# INLINE returnProgress #-} -- In correct code it should be XTrue as described below. }}}
In the provided code, there is `XFalse` used instead of `XTrue` because of some interesting observations:
1. The `XBool` value is used ONLY in the expression `return $! T (b ||| b') a'` above, so it does NOT affect the way the program logically executes it's body.
2. Both `b` and `b'` are strict and fully evaluated.
3. If both `b` and `b'` are `XFalse` (as in the provided code, they always are `XFalse`) we get a good performance. In order to test it, the above code contains `XFalse` instead of `XTrue`.
4. If we use the correct version of `returnProgress` as described just before point 1 above, we get 15 times slower performance (the same or very similar to the one when used altered `evalStateT` definition). We could try to explain it: maybe Haskell was able to optimize code if it discovered, there always were only `XFalse` values used and after the change there are both `XTrue` and `XFalse`, so it really has to run the `(|||)` operator. This way of thinking fails as fast as we check that changing `XFalse` to `XTrue` **everywhere** in the code give us again bad performance.
5. If we replace `return $! T (b ||| b') a'` with `return $! T b' a'` we get good performance, while replacing it with `return $! T b a'` give us bad performance. It does not make any sense, because both `b` and `b'` are strict and fully evaluated. Moreover, it is the only place in code where they are used.
6. However replacing `return $! T (b ||| b') a'` with `return $! T (b' ||| b) a'` does NOT change the performance (we are getting the good one).
== Final notes
We've been talking with some people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. In fact we started to be worried a lot about how we can use Haskell for high-performance parts at all, if it is not obvious if a very simple change do not affect performance a lot. A good example is changing `evalStateT m = fmap fst . runStateT m` to `evalStateT m a = fmap fst $ runStateT m a`, which gives 15 times slowdown. This situation makes the source code both very fragile to any changes as well as makes it unmaintainable. Tracking performance in a very small program (like the attached one) is possible, while tracking it in bigger one, taking in consideration the described problems, make it almost impossible. I'm writing this because I'm worried about where these problems originate from and I would really like to solve them / know why they appear, and be sure we can continue to use Haskell for our high-performance demanding applications.
New description: Hi! I was recently testing performance of a critical code in a product we are shipping and I'm getting really weird results. **The code is compiled with `-XStrict` enabled globally. The full source code for this ticket is attached, while the exposed code below uses `...` to hide some non-important implementations.** To get desired results, we use following GHC flags: `-O2 -funfolding-use- threshold=10000`. Let's consider the following program. It is just a pseudo-parser implementation. It consumes 'a' chars in a loop and fails on empty input in the end: {{{#!hs -- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -- | WARNING: -XStrict enabled in this file !!! -- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! module Main where imports ... (full source attached to this ticket) ------------------------ -- === Primitives === -- ------------------------ -- === Strict Either === -- data Either e a = Left e | Right a deriving (Eq, Generic, Ord, Read, Show, Functor) newtype EitherT e m a = EitherT { runEitherT :: m (Either e a) } instance Monad m => Functor (EitherT e m) where ... instance Monad m => Applicative (EitherT e m) where ... instance Monad m => Monad (EitherT e m) where ... -- === Strict Bool === -- data XBool = XTrue | XFalse deriving (Show, Generic) (|||) :: XBool -> XBool -> XBool (|||) !a !b = case a of XTrue -> a XFalse -> b {-# INLINE (|||) #-} -- === Strict Tuple === -- data T a b = T !a !b deriving (Generic, Show, Functor) ------------------------ -- === FailParser === -- ------------------------ -- === Definition === -- -- | It is just like EitherT, but also contains progress indicator - a field of type XBool -- which tells us if we've already parsed a char or not yet. In this snippet code however, -- it does not do anything valuable - it just stores the value. newtype FailParser m a = FailParser { fromFailParser :: EitherT () m (T XBool a) } deriving (Functor) instance Monad m => Applicative (FailParser m) where pure = undefined (<*>) = undefined instance Monad m => Monad (FailParser m) where return a = FailParser $ return $ (T XFalse a) ; {-# INLINE return #-} FailParser ma >>= f = FailParser $ do T !b !a <- ma T !b' !a' <- fromFailParser $ f a return $ T (b ||| b') a' {-# INLINE (>>=) #-} _ >> _ = undefined ; {-# INLINE (>>) #-} -- === Running === -- failParser :: m (Either () (T XBool a)) -> FailParser m a failParser a = FailParser $ EitherT a ; {-# INLINE failParser #-} runFailParser :: forall m a. FailParser m a -> m (Either () (T XBool a)) runFailParser f = runEitherT $ fromFailParser f ; {-# INLINE runFailParser #-} -- === MonadFailedParser === -- -- | Behaves just like "left" - lifts until it hits MonadFailedParser class Monad m => MonadFailedParser m where failed :: m a instance {-# OVERLAPPABLE #-} (MonadFailedParser m, MonadTrans t, Monad (t m)) => MonadFailedParser (t m) where failed = lift failed ; {-# INLINE failed #-} instance Monad m => MonadFailedParser (FailParser m) where failed = failParser $ return $ Left () ; {-# INLINE failed #-} ----------------------- -- === Main loop === -- ----------------------- parserLoop :: StateT Text (FailParser Identity) Bool parserLoop = parserStep >> parserLoop parserStep :: StateT Text (FailParser Identity) Char parserStep = get >>= \s -> case Text.uncons s of Just (!t, !s') -> if t == 'a' then put s' >> return t else failed Nothing -> failed {-# INLINE parserStep #-} -- === Criterion === -- instance NFData XBool instance (NFData l, NFData r) => NFData (Either l r) instance (NFData a, NFData b) => NFData (T a b) genText :: Int -> Text genText i = fromString $ replicate i 'a' ; {-# INLINE genText #-} a_parsing_main :: IO () a_parsing_main = do defaultMain [ env (return $ genText $ 10^6) $ bench "a*" . nf (runIdentity . runFailParser . evalStateT parserLoop) ] main = a_parsing_main }}} The most important part is the `bind` implementation of `FailParser`: {{{#!hs FailParser ma >>= f = FailParser $ do T b a <- ma T b' a' <- fromFailParser $ f a return $ T (b ||| b') a' }}} There are several performance related observations and problems: 1. **INFO:** Everything is compiled with `-XStrict` and every field in this code is fully evaluated, in particular `b` and `b'` are fully evaluated, strict values of type `XBool`. 2. **INFO:** Neither `b` nor `b'` are used anywhere else in the code. They are just fields in `FailParser` which should be used to store information if we did consume a letter or we did not. 3. **PROBLEM:** When provided with `10^6` characters this code works in 1ms. If we replace `(b ||| b')` with `(b' ||| b)` or with `(b')` the time do NOT change. However, if we replace it with `(b)`, we've got **15 times** slowdown. Moreover, the resulting core is changed drastically in some places. 4. **PROBLEM:** Another interesting observation is that the value of `XBool` is created only in one place in the code, namely in: `FailParser`'s `Monad` implementation, in `return` function: `return a = FailParser $ return $ (T XFalse a) ; {-# INLINE return #-}`. We never change the XFalse, so this is the only value that could appear in this code. If we change it to `XTrue` in this implementation however, we again get **15 times** slowdown. 5. **INFO:** The order of `case` expressions in definition of `(|||)` or the order of constructor defintions of any datatype does not affect the above results. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * Attachment "Main.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): I've just shortened the example code and made the performance related questions simpler. I think it would be easier now debug what is going on under the hood. I would be very thankful for any information regarding this issue. We've been talking with many people - both in the company I'm working in as well as on IRC and we do not see any reason why this code behaves in this way and why it is so sensitive to the changes. We started to be worried a lot about how we can use Haskell for high-performance parts at all, if it is not obvious if a very simple change do not affect performance a lot. This situation makes the source code both very fragile to any changes and unmaintainable as a result. I'm writing this because I'm deeply worried about where these problems originate from and I would really like to solve them / know why they appear. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
When provided with 10^6 characters this code works in 1ms
Can you provide test data. How do I run the program? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I have made some progress. (1), I discovered that `-XStrict` was generating some stunningly bad desugarings for very ordinary function bindings. I have a fix in the works. This seems to be responsible for almost all the performance loss. (2), look at your code {{{ FailParser ma >>= f = FailParser $ do T b a <- ma T b' a' <- fromFailParser $ f a return $ T (b ||| b') a' }}} If `b` turns out to be `XFalse`, this amounts to {{{ FailParser ma >>= f = FailParser $ do T b a <- ma T b' a' <- fromFailParser $ f a return $ T b' a' }}} and GHC can re-use the (T b' a') that `fromFailParser` returned. Moreover in the critical inner loop `b` is indeed `XFalse`: {{{ parserLoop = parserStep >> parserLoop }}} because `return` returns `XFalse`. But if you change the `(>==)` to {{{ return $ T b a' }}} now GHC can't re-use anthing, and so allocate a fresh `T` every time round the loop. So an apparently simpler program is actually more complicated! But (2) is not a massive effect. The big thing is the desugaring. Stay tuned. Meanwhile, try without `-XStrict`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): Simon, first of all, thank you very much for your time and help with this topic! I added some important notices to the points mentioned in your response: **(1)** I'm so happy that you've found out that something is wrong and you've got fix for that! In generall, `-XStrict` is awesome, we need it in high performance Haskell code, putting bangs everywhere (and remembering about it) could be cumbersome. **(2)** You're of course right. I just opened the browser to add comment exactly about the same finding. The specification of `(|||)` allows GHC to easily discover that if we always use `XFalse` value, it could shorten the mentioned code to `s@(T b' a') <- fromFailParser $ f a ; return s` (just reuse the value). There are however 3 other non-obvious questions involved: **(2a)** Why GHC is able to optimize the code this way if we use everywhere `-XFalse` but it does not when using everywhere `-XTrue`? Very similar final core could be generated in the later case – if `b` is `XFalse` we can just reuse the output value, if it is `XTrue` we can be sure the output always contains `XTrue` as well. **(2b)** Even if GHC needs to create code like `T b' a' <- fromFailParser $ f a ; return $ T something a'`, why it takes so long? This is a strict, fully evaluated value, so why "updating a field" takes 10x longer than Char comparison? **(2c)** Moreover, what is the reason to "allocate a fresh `T` every time round the loop"? The fields of the tuple `T` do not "interact" with each other, they are just 2 separate outputs from a function. I could of course be very wrong, but I think it should be possible to just optimize `T a b` to `(# a,b #)` and cut the "fresh `T` allocation time" completely out, am I right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): One more important thing to note here is that the provided code was shortened to the limits. It does not use the `XBool` value in any place (it puts `-XFalse` everywhere, even after successful parse). It implies that the problem (2a) is also not very important - it is just an optimization opportunity in a very special and rare use case. We can easily fix the code and make it a real use case by inserting the following code: {{{#!hs class Monad m => ProgressMonad m where returnProgressed :: forall a. a -> m a instance {-# OVERLAPPABLE #-} (ProgressMonad m, Monad (t m), MonadTrans t) => ProgressMonad (t m) where returnProgressed = lift . returnProgressed ; {-# INLINE returnProgressed #-} instance Monad m => ProgressMonad (FailParser m) where returnProgressed a = failParser $ return $ Right $ T XTrue a ; {-# INLINE returnProgressed #-} }}} and replacing the line 125 to: {{{#!hs Just (!t, !s') -> if t == 'a' then put s' >> returnProgressed t else failed }}} The `XBool` value would then be used to implement `Alternative` instance, but we do not need it here. We can observe the same slowdown (`10^6` chars parsed in 15ms with `-XStrict` enabled). Which is expected, based on the results so far, however if we want to base on a real use case, this code help us transform abstract program to real one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results. -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Let's do one thing at a time. My brain is too small to accommodate all these variations. I'll commit my `-XStrict` fix. You can try it out. If you are happy, close the ticket; if not, can you give a new repro case? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14035: Weird performance results.
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones
participants (1)
-
GHC