[GHC] #14936: GHC 8.4 performance regressions when using newtypes

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.4.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Here is some serious performance regression in the following code: {{{ {-# LANGUAGE BangPatterns #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE FlexibleContexts #-} module Main where import Prelude import qualified Foreign.Storable as Storable import qualified Control.Monad.State.Strict as S import Control.Monad.IO.Class import Foreign.Marshal.Alloc (mallocBytes) import Criterion.Main newtype Foo a = Foo a intSize :: Int intSize = Storable.sizeOf (undefined :: Int) slow :: Int -> IO () slow i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do Foo (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) (Foo ((0::Int),(intSize::Int))) fast :: Int -> IO () fast i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) ((0::Int),(intSize::Int)) main :: IO () main = defaultMain [ bgroup "slow" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> slow (10 ^ i)) <$> [7..8] , bgroup "fast" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> fast (10 ^ i)) <$> [7..8] ] }}} Compiled with flags: `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -fexcess-precision -fexpose-all-unfoldings -flate-dmd-anal -fspec-constr- keen -fspecialise-aggressively -fstatic-argument-transformation -fmax- worker-args=200` The `slow` function executes 2 times slower than the `fast` one. The only difference is that the state is wrapped in a newtype. It was working properly in GHC 8.2 (both functions were equally fast - as fast as the current `fast` function). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.4.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * priority: high => highest -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.4.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Here is a slighter smaller example to demonstrate the issue: {{{#!hs {-# LANGUAGE BangPatterns #-} module Bug2 where import Control.Monad.Trans.State.Strict newtype Foo a = Foo a slowGo :: Int -> StateT (Foo (Int, Int)) IO () slowGo 0 = pure () slowGo j = do Foo (!_, !off) <- get slowGo (j - 1) fastGo :: Int -> StateT (Int, Int) IO () fastGo 0 = pure () fastGo j = do (!_, !off) <- get fastGo (j - 1) }}} In GHC 8.2.2, if you compare the Core between these two functions (in the `_$s_$w` functions that perform most of the work): {{{ $ /opt/ghc/8.2.2/bin/ghc Bug2.hs -O2 -fforce-recomp -ddump-simpl -dsuppress-idinfo -dsuppress-uniques -dsuppress-module-prefixes -dsuppress-coercions [1 of 1] Compiling Bug2 ( Bug2.hs, Bug2.o ) ==================== Tidy Core ==================== Result size of Tidy Core = {terms: 190, types: 298, coercions: 60, joins: 0/0} ... Rec { -- RHS size: {terms: 24, types: 23, coercions: 0, joins: 0/0} fastGo_$s$wfastGo :: State# RealWorld -> Int# -> Int# -> Int# -> (# State# RealWorld, ((), (Int, Int)) #) fastGo_$s$wfastGo = \ (sc :: State# RealWorld) (sc1 :: Int#) (sc2 :: Int#) (sc3 :: Int#) -> case sc3 of ds { __DEFAULT -> fastGo_$s$wfastGo sc sc1 sc2 (-# ds 1#); 0# -> (# sc, ((), (I# sc1, I# sc2)) #) } end Rec } ... Rec { -- RHS size: {terms: 25, types: 37, coercions: 6, joins: 0/0} slowGo_$s$wslowGo :: State# RealWorld -> Int# -> Int# -> ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *) => Int# -> (# State# RealWorld, ((), Foo (Int, Int)) #) slowGo_$s$wslowGo = \ (sc :: State# RealWorld) (sc1 :: Int#) (sc2 :: Int#) (sg :: ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *)) (sc3 :: Int#) -> case sc3 of ds { __DEFAULT -> slowGo_$s$wslowGo sc sc1 sc2 @~ Co:5 (-# ds 1#); 0# -> (# sc, ((), (I# sc1, I# sc2) `cast` Co:1) #) } end Rec } }}} Then they are essentially identical (the `slowGo` one has an extra argument of type `((Int, Int) :: *) ~R# (Foo (Int, Int) :: *)`, but that is zero-width, so it shouldn't have any effect at runtime). On the other hand, in GHC 8.4.1: {{{ $ ~/Software/ghc-8.4.1/bin/ghc Bug2.hs -O2 -fforce-recomp -ddump-simpl -dsuppress-idinfo -dsuppress-uniques -dsuppress-module-prefixes -dsuppress-coercions [1 of 1] Compiling Bug2 ( Bug2.hs, Bug2.o ) ==================== Tidy Core ==================== Result size of Tidy Core = {terms: 163, types: 231, coercions: 54, joins: 0/0} ... Rec { -- RHS size: {terms: 24, types: 23, coercions: 0, joins: 0/0} fastGo_$s$wfastGo :: State# RealWorld -> Int# -> Int# -> Int# -> (# State# RealWorld, ((), (Int, Int)) #) fastGo_$s$wfastGo = \ (sc :: State# RealWorld) (sc1 :: Int#) (sc2 :: Int#) (sc3 :: Int#) -> case sc3 of ds { __DEFAULT -> fastGo_$s$wfastGo sc sc1 sc2 (-# ds 1#); 0# -> (# sc, ((), (I# sc1, I# sc2)) #) } end Rec } ... Rec { -- RHS size: {terms: 27, types: 34, coercions: 9, joins: 0/0} $wslowGo :: Int# -> Foo (Int, Int) -> State# RealWorld -> (# State# RealWorld, ((), Foo (Int, Int)) #) $wslowGo = \ (ww :: Int#) (w :: Foo (Int, Int)) (w1 :: State# RealWorld) -> case ww of ds { __DEFAULT -> case w `cast` Co:4 of wild { (ds1, off) -> case ds1 of { I# ipv -> case off of { I# ipv1 -> $wslowGo (-# ds 1#) (wild `cast` Co:5) w1 } } }; 0# -> (# w1, ((), w) #) } end Rec } }}} This time, `slowGo` doesn't have something akin to `slowGo_$s$wslowGo`. Instead, it performs the body of the loop in `$wslowGo`, which uses `Foo (Int, Int)` instead of two unboxed `Int#` arguments. I could imagine that this alone contributes to the slowdown. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.4.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sjakobi): * cc: sjakobi (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * cc: simonpj (added) * milestone: => 8.4.2 Comment: This regression was introduced in commit fb050a330ad202c1eb43038dc18cca2a5be26f4a (`Do not bind coercion variables in SpecConstr rules`). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * keywords: => SpecConstr -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Here is some serious performance regression in the following code:
{{{
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE FlexibleContexts #-}
module Main where
import Prelude import qualified Foreign.Storable as Storable import qualified Control.Monad.State.Strict as S import Control.Monad.IO.Class import Foreign.Marshal.Alloc (mallocBytes) import Criterion.Main
newtype Foo a = Foo a
intSize :: Int intSize = Storable.sizeOf (undefined :: Int)
slow :: Int -> IO () slow i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do Foo (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) (Foo ((0::Int),(intSize::Int)))
fast :: Int -> IO () fast i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) ((0::Int),(intSize::Int))
main :: IO () main = defaultMain [ bgroup "slow" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> slow (10 ^ i)) <$> [7..8]
, bgroup "fast" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> fast (10 ^ i)) <$> [7..8] ] }}}
Compiled with flags: `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -fexcess-precision -fexpose-all-unfoldings -flate-dmd-anal -fspec-constr- keen -fspecialise-aggressively -fstatic-argument-transformation -fmax- worker-args=200`
The `slow` function executes 2 times slower than the `fast` one. The only difference is that the state is wrapped in a newtype. It was working properly in GHC 8.2 (both functions were equally fast - as fast as the current `fast` function).
New description: Here is some serious performance regression in the following code: {{{#!hs {-# LANGUAGE BangPatterns #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE FlexibleContexts #-} module Main where import Prelude import qualified Foreign.Storable as Storable import qualified Control.Monad.State.Strict as S import Control.Monad.IO.Class import Foreign.Marshal.Alloc (mallocBytes) import Criterion.Main newtype Foo a = Foo a intSize :: Int intSize = Storable.sizeOf (undefined :: Int) slow :: Int -> IO () slow i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do Foo (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) (Foo ((0::Int),(intSize::Int))) fast :: Int -> IO () fast i = do ptr <- mallocBytes $ 2 * intSize Storable.pokeByteOff ptr intSize (0 :: Int) let go 0 = pure () go j = do (!_, !off) <- S.get !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off liftIO $ Storable.pokeByteOff ptr off $! (x + 1) go (j - 1) S.evalStateT (go i) ((0::Int),(intSize::Int)) main :: IO () main = defaultMain [ bgroup "slow" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> slow (10 ^ i)) <$> [7..8] , bgroup "fast" $ (\(i :: Int) -> bench ("10e" <> show i) $ perRunEnv (return ()) $ \v -> fast (10 ^ i)) <$> [7..8] ] }}} Compiled with flags: `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -fexcess-precision -fexpose-all-unfoldings -flate-dmd-anal -fspec-constr- keen -fspecialise-aggressively -fstatic-argument-transformation -fmax- worker-args=200` The `slow` function executes 2 times slower than the `fast` one. The only difference is that the state is wrapped in a newtype. It was working properly in GHC 8.2 (both functions were equally fast - as fast as the current `fast` function). -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): RyanGlScott, did you happen to see whether the `WARN` mentioned in that patch fired in the case of this program? I suppose it likely did. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): I wasn't using a build with assertions enabled, so I couldn't tell. I'll check shortly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): bgamari, your suspicions have been confirmed: {{{ $ ghc2/inplace/bin/ghc-stage2 -O2 -fforce-recomp -DDEBUG Bug.hs [1 of 1] Compiling Bug2 ( Bug.hs, Bug.o ) WARNING: file compiler/specialise/SpecConstr.hs, line 2078 SpecConstr: bad covars: [sg_s2Ba] $wslowGo_s2A8 (-# ds_X2vv 1#) (wild_Xi `cast` (Sym (N:Foo[0] <(Int, Int)>_R) :: ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *))) w_s2A3 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.4.2
Component: Compiler | Version: 8.4.1
Resolution: | Keywords: SpecConstr
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: closed Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: fixed | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: | perf/should_run/T14936 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * status: new => closed * testcase: => perf/should_run/T14936 * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: merge Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: fixed | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: | perf/should_run/T14936 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * status: closed => merge Comment: I'll optimistically mark this as "merge", since this seems like an important performance win. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14936: GHC 8.4 performance regressions when using newtypes -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: closed Priority: highest | Milestone: 8.4.2 Component: Compiler | Version: 8.4.1 Resolution: fixed | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: | perf/should_run/T14936 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: merge => closed Comment: I was on the fence about this one but ended up merging it since it looks fairly low-risk. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14936#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC