Re: [GHC] #13536: Program which terminates instantly in GHC 8.0.2 runs for minutes with 8.2.1

6 Apr 2017

      #13536: Program which terminates instantly in GHC 8.0.2 runs for minutes with 8.2.1
-------------------------------------+-------------------------------------
        Reporter:  RyanGlScott       |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 OK, I've had a bit of a look.  First, if we'd done this:
 {{{
     case eta_s5Di of wild1 {
       (,) a b -> (,) a b
 ====>
     case eta_s5Di of wild1 {
       (,) a b -> wild1
 }}}
 we'd have been fine.  Because `eta_s5Di` points to a single-entry thunk
 (as comment:9 so accurately points out) the thunk won't be updated.  But
 `wild1` will be bound to the heap-allocated pair returned from evaluating
 `eta_s5Di`, not to the `eta_s5Di` thunk, so all would be well.  In fact
 it's ''better'' even if `eta_s5Di` is updated, because if we use
 `eta_s5Di` in the case alternative we have to save it across the eval,
 whereas if we use `wild1` we just use the returned pair directly.  Better
 all round.

 So why are we using `eta_s5Di`?  Because of this code:
 {{{
     cse_bndr | StgApp trivial_scrut [] <- scrut' = trivial_scrut
                  -- See Note [Trivial case scrutinee]
              | otherwise                         = bndr'
 }}}
 The reason for this is explained in the Note, but means that we use
 `eta_s5Di` instead of `wild1`, with exponentially worse cost!  This is
 very bad.

 '''Short term fix''' (Reid): just say
 {{{
     cse_bndr = bndr'
 }}}
 and it'll all work fine.

 ------------------
 '''One side point'''.  Binders in STG have occurrence info attached, and
 `wild1` is marked as dead. If we use it, it'll suddenly become un-dead;
 it'd make me uneasy to have lying occurrence info.  (Apart from anything
 else, the pretty printer doesn't print a dead binder, which is confusing
 if it is then mentioned.)  Why do we need occurrence info on binders?
 Search for `isDeadBinder` in `codeGen`.  However, I don't think it ever
 matters for ''case binders'', so we could safely drop occurrence info for
 them algoteher.

 ------------------
 Back to the main point. Why do we need that special case in `cse_bndr`?
 Reason: consider
 {{{
 case x of r1
   Just a -> case a of r2
               Just b -> let v = Just b
                         in Just v
 }}}
 We want ultimately to get
 {{{
 case x of r1
   Just a -> case a of r2
               Just b -> r1
 }}}
 What actually happens is this.  Suppose we didn't have the special case,
 and always used `bndr'` (as in "Short term fix" above).  Then

 * In the `Just a ->` alternative, we'd extend `ce_conAppMap` with
 {{{
 ce_conAppMap = Just a :-> r1
 }}}
 * Now in the `Just b ->` alternative, we further extend it thus
 {{{
 ce_conAppMap = Just a :-> r1
                Just b :-> r2
 }}}
 * Now when we see `let v = Just b`, we'll add the substitution `v :-> r2`,
 and drop the let-binding (good).
 * But now when we see the `Just v` we'll substitute to get `Just r2`.  But
 alas!  There is no entry `Just r2 :-> r1` in the `ce_conAppMap`, only
 `Just a :-> r`.  (Of course, `a` and `r2` are synonymous here.)

 So that's the problem that `Note [Trivial case scrutinee]` is supposed to
 fix. With the `cse_bndr` fix, the `ce_conAppMap` looks like
 {{{
 ce_conAppMap = Just a :-> x
                Just b :-> a
 }}}
 And now we'll end up with
 {{{
 case x of r1
   Just a -> case a of r2
               Just b -> x
 }}}
 which does collapse the nested allocation, but at the expense of
 introducing the exponential performance bug.

 But it's so unnecesary!  All we need do is to use `r1` instad of `x` in
 the final result and all will be well.  The crucial point is this '''we
 must only add extra references to variables (like `r1` and `r2`) bound to
 data constructors, not to variables (like `x`, `a`, and `b`) bound to
 thunks'''.

 ----------------------
 How can we get the best of both worlds?  Here's my idea

 * '''Ensure that the range of `ce_conAppMap` mentions only variables bound
 to constructors'''; so do NOT do the `cse_bndr` fix above.

 * Instead, add a `ce_bndrMap` that maps a case-binder to the scrutinee.
 Thus, in our example
 {{{
 ce_bndrMap = r1 :-> x
              r2 :-> a
 }}}

 * Now, just before looking up in the `ce_conAppMap`, apply the
 `ce_bndrMap` to the thing you are looking up.  So just before looking up
 `Just r2`, apply the `ce_bndrMap` to get `Just a` and look ''that'' up.
 Do not do anything else with the result of applying the `ce_bndrMap`...
 it's just used to transform a key before looking it up in `ce_conAppMap`.

 Bingo.

 Now, do we really need THREE maps in `CseEnv`?   No: it is easy to combine
 `ce_renaming` and `ce_subst`, which is what we do in `CSE.hs`.

 Finally, a bug in the comments.  Here:
 {{{
     , ce_subst     :: IdEnv OutId
         -- ^ This substitution contains CSE-specific entries. The domain
 are
         --   OutIds, so ce_renaming has to be applied first.
         --   It has an entry x ↦ y when a let-binding `let x = Con y` is
         --   removed because `let y = Con z` is in scope.
 }}}
 In the second-last line, that `Con y` should be `Con z`.

 '''Joachim''': would you like to work on this?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13536#comment:12
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler