[GHC] #15176: Superclass `Monad m =>` makes program run 100 times slower

#15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.4.2
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: Compile-time
Unknown/Multiple | performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Hi! I've just encountered a very bizarre error.
=== General description ===
The code:
{{{
class LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
instance Monad m => LayersFoldableBuilder__ t '[] m where
buildLayersFold__ = \_ a -> a
{-# INLINE buildLayersFold__ #-}
instance ( MonadIO m
, Storable.Storable (Layer.Cons l ())
, Layer.StorableLayer l m
, LayerFoldableBuilder__ (EnabledLayer t l) t m l
, LayersFoldableBuilder__ t ls m )
=> LayersFoldableBuilder__ t (l ': ls) m where
buildLayersFold__ = \ptr mr -> do
let fs = buildLayersFold__ @t @ls ptr'
ptr' = Ptr.plusPtr ptr $ Layer.byteSize @l
layerBuild__ @(EnabledLayer t l) @t @m @l ptr $! fs mr
{-# INLINE buildLayersFold__ #-}
}}}
This is a typeclass `LayersFoldableBuilder__` and ALL of its instances.
Please note, that every instance has a `Monad m` or `MonadIO m`
constraint. The program which uses this code heavily runs in 40ms. If we
only add constraint `Monad m =>` to the class definition:
{{{
class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
}}}
The program runs in 3.5s , which is almost 100 times slower.
Unfortunatelly I do not have minimal example, but it is reproducible. It
is a part of the Luna Language codebase: https://github.com/luna/luna-
core/blob/60bf6130691c23e52b97b067b52becb8fdb0c72e/core/src/Data/Graph/Traversal/Scoped.hs#L102
it was introduced in the commit 60bf6130691c23e52b97b067b52becb8fdb0c72e
on branch static-layers . However, building this is simple: stack bench
luna-core . After invoking it we see the described results.
=== Why its important and what should we do to fix it ===
1. I am writing this because I care about Haskell community. I want GHC
and Haskell to be widely used. Right now, the only thing I hear from all
companies around our company is that impredicative performance, even when
following rules "how to write efficient code" is the biggest pain people
have. Haskell is gathering attention - pure functional programming,
immutability etc - are great. But it will not become a popular choice
unless we care about predictive performance.
2. Such performance changes are unacceptable when thinking about Haskell
and GHC as production ready systems. We need a clear way how to write high
performance Haskell without the need to benchmark every part of our
programs even when refactoring things. GHC has enough information to
discover that we want high performance here and there (even by looking at
INLINE pragmas) and should warn us about lack of optimization. We should
also have a way to force GHC to apply optimizations in particular places -
for example by explicit marking code to be always specialized during
compilation, so GHC would never fall back to dict-passing in such places.
Such possibility would solve MANY related problems and user fears.
3. The point 2 also applies to strictness. In my opinion, having more
clear strictness resolution rules / tools is important. Right now the only
way to know if strictness analysis did a good job and we are not
constantly boxing / unboxing things is to read core, which is tedious and
99% of Haskell users do not even know how to do it (We've got 10 really,
really good Haskellers here, 2 of them are capable of reading core, but
not very fluently). I would love to chat more about these topics, because
they are crucial for growing Haskell community and making Haskell more
popular choice, which is waht we want, right? We don't want Haskell to be
just a research project with "all its users being its authors" at the same
time, am I
=== What happens in core ===
I inspected core and have found that indeed, after adding the constraint,
GHC does not apply all optimizations to the definitions. To be honest, I
completely don't understand it, because the code uses everywhere explicit
`INLINE` pragma to be sure everything is optimized away in the compilation
stage:
{{{
--------------------------------------------------------------------------------
SLOW, without Monad m =>
--------------------------------------------------------------------------------
-- RHS size: {terms: 5, types: 12, coercions: 4, joins: 0/0}
buildLayersFold__ [InlPrag=INLINE]
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=True)
Tmpl= \ (@ t_ao0O)
(@ (layers_ao0P :: [*]))
(@ (m_ao0Q :: * -> *))
(v_B1 [Occ=Once]
:: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q)
->
v_B1
`cast`
(Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
,
RULES: Built in rule for
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__: "Class op
$p1LayersFoldableBuilder__"]
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B2
}
-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
buildLayersFold__
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=,
RULES: Built in rule for buildLayersFold__: "Class op buildLayersFold__"]
buildLayersFold__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B3
}
}}}
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by danilo2: Old description:
Hi! I've just encountered a very bizarre error.
=== General description ===
The code:
{{{ class LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
instance Monad m => LayersFoldableBuilder__ t '[] m where buildLayersFold__ = \_ a -> a {-# INLINE buildLayersFold__ #-}
instance ( MonadIO m , Storable.Storable (Layer.Cons l ()) , Layer.StorableLayer l m , LayerFoldableBuilder__ (EnabledLayer t l) t m l , LayersFoldableBuilder__ t ls m ) => LayersFoldableBuilder__ t (l ': ls) m where buildLayersFold__ = \ptr mr -> do let fs = buildLayersFold__ @t @ls ptr' ptr' = Ptr.plusPtr ptr $ Layer.byteSize @l layerBuild__ @(EnabledLayer t l) @t @m @l ptr $! fs mr {-# INLINE buildLayersFold__ #-} }}}
This is a typeclass `LayersFoldableBuilder__` and ALL of its instances. Please note, that every instance has a `Monad m` or `MonadIO m` constraint. The program which uses this code heavily runs in 40ms. If we only add constraint `Monad m =>` to the class definition:
{{{ class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) }}}
The program runs in 3.5s , which is almost 100 times slower.
Unfortunatelly I do not have minimal example, but it is reproducible. It is a part of the Luna Language codebase: https://github.com/luna/luna- core/blob/60bf6130691c23e52b97b067b52becb8fdb0c72e/core/src/Data/Graph/Traversal/Scoped.hs#L102
it was introduced in the commit 60bf6130691c23e52b97b067b52becb8fdb0c72e on branch static-layers . However, building this is simple: stack bench luna-core . After invoking it we see the described results.
=== Why its important and what should we do to fix it ===
1. I am writing this because I care about Haskell community. I want GHC and Haskell to be widely used. Right now, the only thing I hear from all companies around our company is that impredicative performance, even when following rules "how to write efficient code" is the biggest pain people have. Haskell is gathering attention - pure functional programming, immutability etc - are great. But it will not become a popular choice unless we care about predictive performance.
2. Such performance changes are unacceptable when thinking about Haskell and GHC as production ready systems. We need a clear way how to write high performance Haskell without the need to benchmark every part of our programs even when refactoring things. GHC has enough information to discover that we want high performance here and there (even by looking at INLINE pragmas) and should warn us about lack of optimization. We should also have a way to force GHC to apply optimizations in particular places - for example by explicit marking code to be always specialized during compilation, so GHC would never fall back to dict-passing in such places. Such possibility would solve MANY related problems and user fears.
3. The point 2 also applies to strictness. In my opinion, having more clear strictness resolution rules / tools is important. Right now the only way to know if strictness analysis did a good job and we are not constantly boxing / unboxing things is to read core, which is tedious and 99% of Haskell users do not even know how to do it (We've got 10 really, really good Haskellers here, 2 of them are capable of reading core, but not very fluently). I would love to chat more about these topics, because they are crucial for growing Haskell community and making Haskell more popular choice, which is waht we want, right? We don't want Haskell to be just a research project with "all its users being its authors" at the same time, am I
=== What happens in core ===
I inspected core and have found that indeed, after adding the constraint, GHC does not apply all optimizations to the definitions. To be honest, I completely don't understand it, because the code uses everywhere explicit `INLINE` pragma to be sure everything is optimized away in the compilation stage:
{{{ -------------------------------------------------------------------------------- SLOW, without Monad m => --------------------------------------------------------------------------------
-- RHS size: {terms: 5, types: 12, coercions: 4, joins: 0/0} buildLayersFold__ [InlPrag=INLINE] :: forall t (layers :: [*]) (m :: * -> *). LayersFoldableBuilder__ t layers m => SomePtr -> m (Fold.Result t) -> m (Fold.Result t) [GblId[ClassOp], Arity=1, Caf=NoCafRefs, Str=
, Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=True) Tmpl= \ (@ t_ao0O) (@ (layers_ao0P :: [*])) (@ (m_ao0Q :: * -> *)) (v_B1 [Occ=Once] :: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q) -> v_B1 `cast` (Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]_N _N _N :: (LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q :: Constraint) ~R# (SomePtr -> m_ao0Q (Fold.Result t_ao0O) -> m_ao0Q (Fold.Result t_ao0O) :: *))}] buildLayersFold__ = \ (@ t_ao0O) (@ (layers_ao0P :: [*])) (@ (m_ao0Q :: * -> *)) (v_B1 :: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q) -> v_B1 `cast` (Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0] _N _N _N :: (LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q :: Constraint) ~R# (SomePtr -> m_ao0Q (Fold.Result t_ao0O) -> m_ao0Q (Fold.Result t_ao0O) :: *))
-------------------------------------------------------------------------------- FAST, without Monad m => --------------------------------------------------------------------------------
-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0} Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__ :: forall t (layers :: [*]) (m :: * -> *). LayersFoldableBuilder__ t layers m => Monad m [GblId[ClassOp], Arity=1, Caf=NoCafRefs, Str=
, RULES: Built in rule for Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__: "Class op $p1LayersFoldableBuilder__"] Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__ = \ (@ t_ao0P) (@ (layers_ao0Q :: [*])) (@ (m_ao0R :: * -> *)) (v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) -> case v_B1 of v_B1 { Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2 v_B3 -> v_B2 }-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0} buildLayersFold__ :: forall t (layers :: [*]) (m :: * -> *). LayersFoldableBuilder__ t layers m => SomePtr -> m (Fold.Result t) -> m (Fold.Result t) [GblId[ClassOp], Arity=1, Caf=NoCafRefs, Str=
, RULES: Built in rule for buildLayersFold__: "Class op buildLayersFold__"] buildLayersFold__ = \ (@ t_ao0P) (@ (layers_ao0Q :: [*])) (@ (m_ao0R :: * -> *)) (v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) -> case v_B1 of v_B1 { Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2 v_B3 -> v_B3 } }}}
New description:
Hi! I've just encountered a very bizarre error.
=== General description ===
The code:
{{{
class LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
instance Monad m => LayersFoldableBuilder__ t '[] m where
buildLayersFold__ = \_ a -> a
{-# INLINE buildLayersFold__ #-}
instance ( MonadIO m
, Storable.Storable (Layer.Cons l ())
, Layer.StorableLayer l m
, LayerFoldableBuilder__ (EnabledLayer t l) t m l
, LayersFoldableBuilder__ t ls m )
=> LayersFoldableBuilder__ t (l ': ls) m where
buildLayersFold__ = \ptr mr -> do
let fs = buildLayersFold__ @t @ls ptr'
ptr' = Ptr.plusPtr ptr $ Layer.byteSize @l
layerBuild__ @(EnabledLayer t l) @t @m @l ptr $! fs mr
{-# INLINE buildLayersFold__ #-}
}}}
This is a typeclass `LayersFoldableBuilder__` and ALL of its instances.
Please note, that every instance has a `Monad m` or `MonadIO m`
constraint. The program which uses this code heavily runs in 40ms. If we
only add constraint `Monad m =>` to the class definition:
{{{
class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
}}}
The program runs in 3.5s , which is almost 100 times slower.
Unfortunatelly I do not have minimal example, but it is reproducible. It
is a part of the Luna Language codebase: https://github.com/luna/luna-
core/blob/60bf6130691c23e52b97b067b52becb8fdb0c72e/core/src/Data/Graph/Traversal/Scoped.hs#L102
it was introduced in the commit 60bf6130691c23e52b97b067b52becb8fdb0c72e
on branch static-layers . However, building this is simple: stack bench
luna-core . After invoking it we see the described results.
=== Why its important and what should we do to fix it ===
1. I am writing this because I care about Haskell community. I want GHC
and Haskell to be widely used. Right now, the only thing I hear from all
companies around our company is that impredicative performance, even when
following rules "how to write efficient code" is the biggest pain people
have. Haskell is gathering attention - pure functional programming,
immutability etc - are great. But it will not become a popular choice
unless we care about predictive performance.
2. Such performance changes are unacceptable when thinking about Haskell
and GHC as production ready systems. We need a clear way how to write high
performance Haskell without the need to benchmark every part of our
programs even when refactoring things. GHC has enough information to
discover that we want high performance here and there (even by looking at
INLINE pragmas) and should warn us about lack of optimization. We should
also have a way to force GHC to apply optimizations in particular places -
for example by explicit marking code to be always specialized during
compilation, so GHC would never fall back to dict-passing in such places.
Such possibility would solve MANY related problems and user fears.
3. The point 2 also applies to strictness. In my opinion, having more
clear strictness resolution rules / tools is important. Right now the only
way to know if strictness analysis did a good job and we are not
constantly boxing / unboxing things is to read core, which is tedious and
99% of Haskell users do not even know how to do it (We've got 10 really,
really good Haskellers here, 2 of them are capable of reading core, but
not very fluently). I would love to chat more about these topics, because
they are crucial for growing Haskell community and making Haskell more
popular choice, which is waht we want, right? We don't want Haskell to be
just a research project with "all its users being its authors" at the same
time, am I
=== What happens in core ===
I inspected core and have found that indeed, after adding the constraint,
GHC does not apply all optimizations to the definitions. To be honest, I
completely don't understand it, because the code uses everywhere explicit
`INLINE` pragma to be sure everything is optimized away in the compilation
stage:
{{{
--------------------------------------------------------------------------------
SLOW, with Monad m =>
--------------------------------------------------------------------------------
-- RHS size: {terms: 5, types: 12, coercions: 4, joins: 0/0}
buildLayersFold__ [InlPrag=INLINE]
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=True)
Tmpl= \ (@ t_ao0O)
(@ (layers_ao0P :: [*]))
(@ (m_ao0Q :: * -> *))
(v_B1 [Occ=Once]
:: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q)
->
v_B1
`cast`
(Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
,
RULES: Built in rule for
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__: "Class op
$p1LayersFoldableBuilder__"]
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B2
}
-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
buildLayersFold__
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=,
RULES: Built in rule for buildLayersFold__: "Class op buildLayersFold__"]
buildLayersFold__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B3
}
}}}
--
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree that predictable performance is super-important. So we give a high priority to unexpected performance lossage. I'd love to know what is going on in your application, but I wonder if you or someone else can help characterise what is going on, and perhaps distil into a smaller test case? What I can say is that the difference between your two classes is the difference between {{{ newtype LayersFoldableBuilder__ t (layers :: [Type]) m = MkD (SomePtr -> m (Fold.Result t) -> m (Fold.Result t)) }}} and {{{ data LayersFoldableBuilder__ t (layers :: [Type]) m = MkD (Monad m) (SomePtr -> m (Fold.Result t) -> m (Fold.Result t)) }}} That is, without the superclass a `LayersFoldableBuilder__` dictionary is represented just by the `buildLayersFold__` function itself; when you add the superclass, it turns into a heap-allocated pair of that function and a `Monad m` dictionary. This might change perf slightly, but a factor of 100 is ridiculous. I have literally no idea why that is happening. The change is so egregious that profiling ought to lead you right to it. (Or use `-ticky` which is faster and less invasive.) You are using 8.4, right? Does it happen with 8.2? (Or is that hard to find out?) Also does it also happen if, instead of adding `Monad m =>` you add a dummy method to the class, like this {{{ class LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) dummy :: t -> t }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: (none) => bgamari Comment: I will try to do a bit of characterisation of this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by thomie): * failure: Compile-time performance bug => Runtime performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.6.1 => 8.8.1 Comment: This is still waiting on a suitable block of my time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * owner: bgamari => osa1 Comment: I'll look into this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I managed to reproduce this. The original instructions no longer work (the git repo disappeared, the branch doesn't exist in the new repo etc.) so here is what I did to reproduce: - Clone https://github.com/luna/luna.git - Run benchmark: `stack bench luna-core` - Apply this patch: {{{ diff --git a/core/src/Data/Graph/Fold/Layer.hs b/core/src/Data/Graph/Fold/Layer.hs index 28d6c6cd..45150ff1 100644 --- a/core/src/Data/Graph/Fold/Layer.hs +++ b/core/src/Data/Graph/Fold/Layer.hs @@ -141,7 +141,7 @@ instance Monad m => Fold.Builder (Scoped s) m (SmallVectorA t alloc n a) -- === FoldableLayers === -- -class LayersFoldableBuilder__ t (layers :: [Type]) m where +class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) instance Monad m => LayersFoldableBuilder__ t '[] m where diff --git a/core/src/Data/Graph/Fold/LayerMap.hs b/core/src/Data/Graph/Fold/LayerMap.hs index 4b12bbf6..e5e54e45 100644 --- a/core/src/Data/Graph/Fold/LayerMap.hs +++ b/core/src/Data/Graph/Fold/LayerMap.hs @@ -117,7 +117,7 @@ instance Monad m => Fold.Builder (Scoped s) m (SmallVectorA t alloc n a) -- === FoldableLayers === -- -class LayersFoldableBuilder__ t (layers :: [Type]) m where +class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) instance Monad m => LayersFoldableBuilder__ t '[] m where diff --git a/core/src/Data/Graph/Fold/Scoped.hs b/core/src/Data/Graph/Fold/Scoped.hs index 2fade0f3..2e6b51df 100644 --- a/core/src/Data/Graph/Fold/Scoped.hs +++ b/core/src/Data/Graph/Fold/Scoped.hs @@ -131,7 +131,7 @@ instance Monad m => Fold.Builder (Scoped t) m (SmallVectorA s alloc n a) -- === FoldableLayers === -- -class LayersFoldableBuilder__ t (layers :: [Type]) m where +class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) instance Monad m => LayersFoldableBuilder__ t '[] m where diff --git a/core/src/Data/Graph/Fold/ScopedMap.hs b/core/src/Data/Graph/Fold/ScopedMap.hs index 217c55a6..4a3d34c8 100644 --- a/core/src/Data/Graph/Fold/ScopedMap.hs +++ b/core/src/Data/Graph/Fold/ScopedMap.hs @@ -129,7 +129,7 @@ instance Monad m => Fold.Builder (ScopedMap s) m (SmallVectorA t alloc n a) -- === FoldableLayers === -- -class LayersFoldableBuilder__ t (layers :: [Type]) m where +class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t) instance Monad m => LayersFoldableBuilder__ t '[] m where }}} - Run benchmarks again Most of the benchmarks are not effected, but there are three benchmarks which are effected quite significantly by this change: Before the patch: {{{ benchmarking ir/discovery/generic/10e6 time 61.47 ms (61.14 ms .. 61.80 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 61.02 ms (60.79 ms .. 61.20 ms) std dev 367.5 μs (224.7 μs .. 582.3 μs) benchmarking ir/discovery/partitions single var/10e6 time 93.94 ms (93.22 ms .. 94.75 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 93.57 ms (92.95 ms .. 93.94 ms) std dev 746.7 μs (377.0 μs .. 1.245 ms) benchmarking ir/discovery/partitions unify/10e6 time 518.7 ms (508.2 ms .. 523.9 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 515.6 ms (512.3 ms .. 516.9 ms) std dev 2.350 ms (717.7 μs .. 3.196 ms) variance introduced by outliers: 19% (moderately inflated) }}} After the patch: {{{ benchmarking ir/discovery/generic/10e6 time 1.309 s (1.283 s .. 1.326 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.320 s (1.312 s .. 1.334 s) std dev 13.24 ms (767.0 μs .. 16.27 ms) variance introduced by outliers: 19% (moderately inflated) benchmarking ir/discovery/partitions single var/10e6 time 1.355 s (1.351 s .. 1.359 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.357 s (1.356 s .. 1.359 s) std dev 1.415 ms (1.209 ms .. 1.452 ms) variance introduced by outliers: 19% (moderately inflated) benchmarking ir/discovery/partitions unify/10e6 time 5.459 s (5.438 s .. 5.501 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 5.444 s (5.435 s .. 5.452 s) std dev 11.24 ms (7.336 ms .. 13.71 ms) variance introduced by outliers: 19% (moderately inflated) }}} Summary: - ir/discovery/generic/10e6: 21x increase - ir/discovery/partitions single var/10e6: 14x increase - ir/discovery/partitions unify/10e6: 10x increase No ideas why yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): So, I've built two versions of the package with `-dump-simpl -dump-to-file -dsuppress-uniques` (which took about an hour to build twice on my i7-8700K desktop). Unfortunately there are 71 modules in this package and none of the files are identical between two versions (there are at least very minor identifier changes) so this is still not easy to debug, but while checking some random files I found some relevant changes between two versions: This expression in the original version: {{{ $dLayersFoldableBuilder__ `cast` (Data.Graph.Fold.Layer.N:LayersFoldableBuilder__[0] <t>_N <layers>_N <m>_N :: (LayersFoldableBuilder__ t layers m :: Constraint) ~R# (SomePtr -> m (Fold.Result t) -> m (Fold.Result t) :: *)) }}} becomes this after `Monad =>`: {{{ buildLayersFold__ @ t @ layers @ m $dLayersFoldableBuilder__ }}} If my understanding is correct, in the original version the typeclass dictionary is represented as its method, but with `Monad m =>` it's not as it not has one more field for the `Monad m` dictionary. This means one layer of indirection in the `Monad m =>` version. There are lots of changes similar to this. Not sure if this by itself explains 20x increase in runtime though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'd use `-ticky` which should rapidly identify the culprit. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I be able to get more useful RTS stats and smaller Core I made a smaller reproducer: {{{#!haskell {-# LANGUAGE TypeApplications #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE BangPatterns #-} {-# LANGUAGE ConstraintKinds #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE Strict #-} module Main where import Prologue import qualified Control.Monad.Exception as Exception import qualified Data.Graph.Data.Graph.Class as Graph import qualified Data.Graph.Fold.Partition as Partition import qualified Luna.IR as IR import qualified Luna.Pass as Pass import qualified Luna.Pass.Scheduler as Scheduler import Luna.Pass (Pass) import Luna.Pass.Basic (Compilation) type OnDemandPass stage pass m = ( MonadIO m , Typeable pass , Pass.Compile stage pass m , Exception.MonadException Scheduler.Error m ) runPass :: forall stage pass m . OnDemandPass stage pass m => Pass stage pass () -> m () runPass !pass = Scheduler.evalT $ do Scheduler.registerPassFromFunction__ pass Scheduler.runPassSameThreadByType @pass {-# INLINE runPass #-} runPass' :: Pass Compilation Pass.BasicPass () -> IO () runPass' p = Graph.encodeAndEval @Compilation (runPass p) {-# INLINE runPass' #-} partitionsUnify :: Int -> IO () partitionsUnify i = runPass' $ do !a <- IR.var "a" !b <- IR.var "b" !u <- IR.unify a b let go !0 = let !o = pure () in o go !j = do !_ <- Partition.partition u go $! j - 1 go i main :: IO () main = partitionsUnify (10^6) }}} Put this in core/test/Main.hs and add this to luna-core.cabal: {{{ executable bench-test main-is: Main.hs hs-source-dirs: test/ build-depends: ansi-terminal -any, base -any, containers -any, convert -any, deepseq -any, ghc -any, layered-state -any, luna-autovector -any, luna-core -any, luna-cpp-containers -any, luna-data-storable -any, luna-data-typemap -any, luna-exception -any, luna-foreign-utils -any, luna-generic-traversable -any, luna-generic-traversable2 -any, luna-memory-manager -any, luna-memory-pool -any, luna-tuple-utils -any, mtl -any, primitive -any, prologue -any, structs -any, unboxed-ref >=0.4.0.0, vector -any ghc-options: -O2 -ticky -rtsopts -Wall }}} Results: (with and without `Monad =>`) {{{ ============= With Monad => ======================================================= luna git:(master) $ time (cabal-run bench-test +RTS -s) 27,544,258,632 bytes allocated in the heap 19,561,928 bytes copied during GC 205,496 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 26366 colls, 0 par 0.270s 0.268s 0.0000s 0.0008s Gen 1 2 colls, 0 par 0.002s 0.002s 0.0008s 0.0011s INIT time 0.000s ( 0.000s elapsed) MUT time 13.449s ( 13.487s elapsed) GC time 0.272s ( 0.269s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 13.721s ( 13.757s elapsed) %GC time 2.0% (2.0% elapsed) Alloc rate 2,048,118,786 bytes per MUT second Productivity 98.0% of total user, 98.0% of total elapsed ( cabal-run bench-test +RTS -s; ) 13,72s user 0,04s system 99% cpu 13,761 total ============= Original ============================================================ luna git:(master) $ time (cabal-run bench-test +RTS -s) 3,952,215,688 bytes allocated in the heap 2,071,824 bytes copied during GC 200,320 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 3790 colls, 0 par 0.042s 0.043s 0.0000s 0.0008s Gen 1 2 colls, 0 par 0.001s 0.002s 0.0009s 0.0010s INIT time 0.000s ( 0.000s elapsed) MUT time 1.595s ( 1.605s elapsed) GC time 0.043s ( 0.044s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 1.638s ( 1.650s elapsed) %GC time 2.6% (2.7% elapsed) Alloc rate 2,478,513,730 bytes per MUT second Productivity 97.4% of total user, 97.3% of total elapsed ( cabal-run bench-test +RTS -s; ) 1,64s user 0,01s system 99% cpu 1,654 total }}} I'll now try with `-ticky`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * Attachment "original.ticky" added. Ticky output of program in comment:10 without Monad m -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * Attachment "patch.ticky" added. Ticky output of program in comment:10 with Monad m -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * Attachment "bench-test-orig.prof" added. Prof output of program in comment:10 without Monad m -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * Attachment "bench-test-patch.prof" added. Prof output of program in comment:10 with Monad m -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I attached four files for ticky and prof outputs of the program in comment:10 with and wihtout `Monad m =>` patch. I can't make sense of the ticky output -- it's really hard to see what's wrong in a hundred lines long Core function but perhaps someone else can figure it out. One other thing I tried was to test the patch with `-O0`, and the numbers are almost identical: {{{ === ORIGINAL =================================================================== luna git:(master) $ time (cabal-run bench-test +RTS -s) 77,264,754,848 bytes allocated in the heap 114,241,080 bytes copied during GC 240,688 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 74218 colls, 0 par 0.256s 0.246s 0.0000s 0.0002s Gen 1 2 colls, 0 par 0.001s 0.001s 0.0003s 0.0006s INIT time 0.000s ( 0.000s elapsed) MUT time 25.042s ( 25.168s elapsed) GC time 0.257s ( 0.247s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 25.298s ( 25.415s elapsed) %GC time 1.0% (1.0% elapsed) Alloc rate 3,085,464,250 bytes per MUT second Productivity 99.0% of total user, 99.0% of total elapsed ( cabal-run bench-test +RTS -s; ) 25,30s user 0,12s system 99% cpu 25,423 total === PATCHED =================================================================== luna git:(master) $ time (cabal-run bench-test +RTS -s) 77,200,755,440 bytes allocated in the heap 114,115,976 bytes copied during GC 241,064 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 74218 colls, 0 par 0.263s 0.254s 0.0000s 0.0002s Gen 1 2 colls, 0 par 0.000s 0.001s 0.0003s 0.0006s INIT time 0.000s ( 0.000s elapsed) MUT time 25.487s ( 25.573s elapsed) GC time 0.263s ( 0.254s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 25.750s ( 25.827s elapsed) %GC time 1.0% (1.0% elapsed) Alloc rate 3,029,012,929 bytes per MUT second Productivity 99.0% of total user, 99.0% of total elapsed ( cabal-run bench-test +RTS -s; ) 25,75s user 0,08s system 100% cpu 25,831 total }}} So it seems to me that with the different dictionary representation we're losing some optimization opportunities. I guess we could try to enable all optimizations again (with -O2) and selectively disable single optimization passes to see which one makes these two versions more similar. That may give an idea about which optimization is not applicable with the `Monad m =>` patch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): When generating .ticky files I do this {{{ ghc -O -ticky -dverbose-core2core -ddump-stg Foo.hs >& Foo.stg ./Foo +RTS -rFoo.ticky }}} The ticky files are utterly useless without the accompanying `-dverbose- core2core -ddump-stg` output, produced in the very same run of GHC. Would you like to do that and upload the results? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I already have Core outputs, they're 140+ files in total (70+ files for each version) and generating them takes about 30 mins for one version (the program is huge). I'll start generating verbose and STG files now but it'll take a few hours probably (just -ddump-simpl takes 30 minutes). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): No no, don't do that!! Here are the big hitters in the `.ticky` files {{{ ==== Without Monad m (original.ticky) ========= 15000000 984000000 0 2 SS vx{v s21YX} (main:Main) (fun) in r1YhB 9000000 520000000 0 4 pSSS $w$dLayersFoldableBuilder__{v r1YhB} (main:Main) (fun) 24000000 320000000 0 3 SpM main:Main.$w$sgo1{v r19} (fun) 16000000 240000000 0 3 SpM main:Main.$w$sgo3{v r1a} (fun) 4000000 232000000 0 2 SS sat_s2229{v} (main:Main) (fun) in r1YhB 3000000 144000000 0 2 SS sat_s21Zw{v} (main:Main) (fun) in s21YX 9000000 72000000 0 46 ++++++++++++++++++++ luna- core-0.0.6-inplace:Luna.IR.Term.$fStorable1UniTerm10{v rihy9} (fun) 1000000 48000000 0 2 SS lvl1355{v r1Yhx} (main:Main) (fun) 2000000 32000000 0 2 SI lvl1206{v r1Yf3} (main:Main) (fun) 2000000 32000000 0 1 M luna- core-0.0.6-inplace:Luna.IR.Term.$WUniTermVar{v rihwq} (fun) 1000000 24000000 0 2 SI lvl1176{v r1Yez} (main:Main) (fun) 1000000 24000000 0 1 M luna- core-0.0.6-inplace:Luna.IR.Term.$WUniTermUnify{v rihwr} (fun) =========== With Monad m (patched.ticky) ============== 60000000 2496000000 0 7 +++++M. luna- core-0.0.6-inplace:Data.Graph.Fold.Deep.$fLayerBuilderDeepmType_$clayerBuild{v rGoW} (fun) 36000000 1584000000 0 3 >S. $s$fFunctorStateT_$cfmap{v rmuaf} (luna- core-0.0.6-inplace:OCI.Pass.Definition.Class) (fun) 24000000 1152000000 0 1 S sat_s2hor{v} (main:Main) (fun) in r2gUO 20000000 960000000 0 1 S sat_s2hqt{v} (main:Main) (fun) in r2gVh 6000000 480000000 0 3 pSS $wlvl{v r2gUO} (main:Main) (fun) 5000000 400000000 0 3 pSS $wlvl3{v r2gVh} (main:Main) (fun) 8000000 384000000 0 1 S sat_s2hoV{v} (main:Main) (fun) in r2gUQ 6000000 384000000 0 3 SSS $s$fReadertlayerm_$cread__3{v r2hmG} (main:Main) (fun) 6000000 384000000 0 3 SSS $s$fReadertlayerm_$cread__2{v r2hmw} (main:Main) (fun) 24000000 320000000 0 3 SpM main:Main.$w$sgo3{v r8e} (fun) 6000000 288000000 0 3 .SS luna- core-0.0.6-inplace:OCI.Pass.Definition.Class.$fApplicativePass5{v rmu8N} (fun) 12000000 288000000 0 2 .S lvl1{v rmual} (luna-core-0.0.6-inplace:OCI.Pass.Definition.Class) (fun) 16000000 240000000 0 3 SpM main:Main.$w$sgo1{v r8b} (fun) 2000000 160000000 0 3 pSS $wlvl1{v r2gUQ} (main:Main) (fun) 8000000 120000000 0 7 ++++++M $w$cbuild1{v riLVV} (luna-core-0.0.6-inplace:Luna.IR.Term) (fun) 9000000 96000000 0 3 SSM sat_sHlw{v} (luna-core-0.0.6-inplace:Data.Graph.Fold.Deep) (fun) in rGoW 1000000 48000000 0 2 SS lvl1702{v r2hn9} (main:Main) (fun) 2000000 32000000 0 2 SI $cpeekByteOff1{v r3Y9N} (luna-core-0.0.6-inplace:Luna.IR.Term.Core) (fun) 2000000 32000000 0 1 M luna- core-0.0.6-inplace:Luna.IR.Term.$WUniTermVar{v rihWA} (fun) 1000000 32000000 0 1 . sat_siM5u{v} (luna-core-0.0.6-inplace:Luna.IR.Term) (fun) in riLVV }}} So we are getting a log of allocation in * `luna- core-0.0.6-inplace:Data.Graph.Fold.Deep.$fLayerBuilderDeepmType_$clayerBuild` * `$s$fFunctorStateT_$cfmap{v rmuaf} (luna- core-0.0.6-inplace:OCI.Pass.Definition.Class)` Also, in the original version, the big allocation is in Main. So I'd generate `Main.stg` and the STG files for `Data.Graph.Fold.Deep, and perhaps OCI.Pass.Definition.Class.` to begin with -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * cc: MikolajKonarski (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by alpmestan): * cc: alpmestan (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: ⊥ Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * milestone: 8.8.1 => ⊥ Comment: Unfortunately we won't be able to get back to this for 8.8.1, and I don't know if this is doable in its current form so I'm removing the milestone. We should focus on finding a smaller reproducer as the current reproducer is taking forever to build and have dozens (if not hundreds) of dependencies (although the change we need to do is not in a dependency, it may still related with code generated for dependencies. In any case less dependencies is always better when debugging). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: ⊥ Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * owner: osa1 => (none) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: ⊥ Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: ⊥ Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by maoe): * cc: maoe (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC