[GHC] #15176: Superclass `Monad m =>` makes program run 100 times slower

22 May 2018

      #15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
           Reporter:  danilo2        |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  highest        |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.4.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  Compile-time
  Unknown/Multiple                   |  performance bug
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 Hi! I've just encountered a very bizarre error.

 === General description ===

 The code:

 {{{
 class LayersFoldableBuilder__ t (layers :: [Type]) m where
     buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)

 instance Monad m => LayersFoldableBuilder__ t '[] m where
     buildLayersFold__ = \_ a -> a
     {-# INLINE buildLayersFold__ #-}

 instance ( MonadIO m
          , Storable.Storable (Layer.Cons l ())
          , Layer.StorableLayer l m
          , LayerFoldableBuilder__ (EnabledLayer t l) t m l
          , LayersFoldableBuilder__ t ls m )
      => LayersFoldableBuilder__ t (l ': ls) m where
     buildLayersFold__ = \ptr mr -> do
         let fs   = buildLayersFold__ @t @ls ptr'
             ptr' = Ptr.plusPtr ptr $ Layer.byteSize @l
         layerBuild__ @(EnabledLayer t l) @t @m @l ptr $! fs mr
     {-# INLINE buildLayersFold__ #-}
 }}}

 This is a typeclass `LayersFoldableBuilder__` and ALL of its instances.
 Please note, that every instance has a `Monad m` or `MonadIO m`
 constraint. The program which uses this code heavily runs in 40ms. If we
 only add constraint `Monad m =>` to the class definition:

 {{{
 class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where
     buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
 }}}

 The program runs in 3.5s , which is almost 100 times slower.

 Unfortunatelly I do not have minimal example, but it is reproducible. It
 is a part of the Luna Language codebase: https://github.com/luna/luna-
 core/blob/60bf6130691c23e52b97b067b52becb8fdb0c72e/core/src/Data/Graph/Traversal/Scoped.hs#L102

 it was introduced in the commit 60bf6130691c23e52b97b067b52becb8fdb0c72e
 on branch static-layers . However, building this is simple: stack bench
 luna-core . After invoking it we see the described results.

 === Why its important and what should we do to fix it ===

 1. I am writing this because I care about Haskell community. I want GHC
 and Haskell to be widely used. Right now, the only thing I hear from all
 companies around our company is that impredicative performance, even when
 following rules "how to write efficient code" is the biggest pain people
 have. Haskell is gathering attention - pure functional programming,
 immutability etc - are great. But it will not become a popular choice
 unless we care about predictive performance.

 2. Such performance changes are unacceptable when thinking about Haskell
 and GHC as production ready systems. We need a clear way how to write high
 performance Haskell without the need to benchmark every part of our
 programs even when refactoring things. GHC has enough information to
 discover that we want high performance here and there (even by looking at
 INLINE pragmas) and should warn us about lack of optimization. We should
 also have a way to force GHC to apply optimizations in particular places -
 for example by explicit marking code to be always specialized during
 compilation, so GHC would never fall back to dict-passing in such places.
 Such possibility would solve MANY related problems and user fears.

 3. The point 2 also applies to strictness. In my opinion, having more
 clear strictness resolution rules / tools is important. Right now the only
 way to know if strictness analysis did a good job and we are not
 constantly boxing / unboxing things is to read core, which is tedious and
 99% of Haskell users do not even know how to do it (We've got 10 really,
 really good Haskellers here, 2 of them are capable of reading core, but
 not very fluently). I would love to chat more about these topics, because
 they are crucial for growing Haskell community and making Haskell more
 popular choice, which is waht we want, right? We don't want Haskell to be
 just a research project with "all its users being its authors" at the same
 time, am I

 === What happens in core ===

 I inspected core and have found that indeed, after adding the constraint,
 GHC does not apply all optimizations to the definitions. To be honest, I
 completely don't understand it, because the code uses everywhere explicit
 `INLINE` pragma to be sure everything is optimized away in the compilation
 stage:

 {{{
 --------------------------------------------------------------------------------
 SLOW, without Monad m =>
 --------------------------------------------------------------------------------

 -- RHS size: {terms: 5, types: 12, coercions: 4, joins: 0/0}
 buildLayersFold__ [InlPrag=INLINE]
   :: forall t (layers :: [*]) (m :: * -> *).
      LayersFoldableBuilder__ t layers m =>
      SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
 [GblId[ClassOp],
  Arity=1,
  Caf=NoCafRefs,
  Str=,
  Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
          WorkFree=True, Expandable=True,
          Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=True)
          Tmpl= \ (@ t_ao0O)
                  (@ (layers_ao0P :: [*]))
                  (@ (m_ao0Q :: * -> *))
                  (v_B1 [Occ=Once]
                     :: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q)
 ->
                  v_B1
                  `cast`
 (Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
                              _N _N _N
                          :: (LayersFoldableBuilder__
                                t_ao0O layers_ao0P m_ao0Q :: Constraint)
                             ~R# (SomePtr
                                  -> m_ao0Q (Fold.Result t_ao0O)
                                  -> m_ao0Q (Fold.Result t_ao0O) :: *))}]
 buildLayersFold__
   = \ (@ t_ao0O)
       (@ (layers_ao0P :: [*]))
       (@ (m_ao0Q :: * -> *))
       (v_B1 :: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q) ->
       v_B1
       `cast` (Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
                   _N _N _N
               :: (LayersFoldableBuilder__
                     t_ao0O layers_ao0P m_ao0Q :: Constraint)
                  ~R# (SomePtr
                       -> m_ao0Q (Fold.Result t_ao0O)
                       -> m_ao0Q (Fold.Result t_ao0O) :: *))

 --------------------------------------------------------------------------------
 FAST, without Monad m =>
 --------------------------------------------------------------------------------

 -- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
 Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
   :: forall t (layers :: [*]) (m :: * -> *).
      LayersFoldableBuilder__ t layers m =>
      Monad m
 [GblId[ClassOp],
  Arity=1,
  Caf=NoCafRefs,
  Str=,
  RULES: Built in rule for
 Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__: "Class op
 $p1LayersFoldableBuilder__"]
 Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
   = \ (@ t_ao0P)
       (@ (layers_ao0Q :: [*]))
       (@ (m_ao0R :: * -> *))
       (v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
       case v_B1 of v_B1
       { Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
                                                               v_B3 ->
       v_B2
       }

 -- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
 buildLayersFold__
   :: forall t (layers :: [*]) (m :: * -> *).
      LayersFoldableBuilder__ t layers m =>
      SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
 [GblId[ClassOp],
  Arity=1,
  Caf=NoCafRefs,
  Str=,
  RULES: Built in rule for buildLayersFold__: "Class op buildLayersFold__"]
 buildLayersFold__
   = \ (@ t_ao0P)
       (@ (layers_ao0Q :: [*]))
       (@ (m_ao0R :: * -> *))
       (v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
       case v_B1 of v_B1
       { Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
                                                               v_B3 ->
       v_B3
       }
 }}}

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

tags

participants (1)