Re: [GHC] #7258: Compiling DynFlags is jolly slow

8 Nov 2017

      #7258: Compiling DynFlags is jolly slow
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  simonpj
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.6.1
      Resolution:                    |             Keywords:  deriving-perf
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 Further evidence, based on the observation that the problem is caused by
 certain kinds of CPS-style code:

 {{{
 module Nested
 where

 -- A data type capturing the relevant structure of the P type from ReadP
 data A a = A (Int -> A a) | N a

 -- This implementation produces deeply nested Core:
 f10 :: A (Int, Int, Int, Int, Int, Int, Int, Int, Int, Int)
 f10 =
   A (\i0 ->
   A (\i1 ->
   A (\i2 ->
   A (\i3 ->
   A (\i4 ->
   A (\i5 ->
   A (\i6 ->
   A (\i7 ->
   A (\i8 ->
   A (\i9 ->
   N (i0, i1, i2, i3, i4, i5, i6, i7, i8, i9)
   ))))))))))

 -- This implementation produces flat Core.
 g10 :: A (Int, Int, Int, Int, Int, Int, Int, Int, Int, Int)
 g10 = a0
   where
     {-#NOINLINE a10#-}
     a10 = \(i0, i1, i2, i3, i4, i5, i6, i7, i8) i9 -> N (i0, i1, i2, i3,
 i4, i5, i6, i7, i8, i9)
     {-#NOINLINE a9#-}
     a9  = \(i0, i1, i2, i3, i4, i5, i6, i7) i8 -> A (a10 (i0, i1, i2, i3,
 i4, i5, i6, i7, i8))
     {-#NOINLINE a8#-}
     a8  = \(i0, i1, i2, i3, i4, i5, i6) i7 -> A (a9 (i0, i1, i2, i3, i4,
 i5, i6, i7))
     {-#NOINLINE a7#-}
     a7  = \(i0, i1, i2, i3, i4, i5) i6 -> A (a8 (i0, i1, i2, i3, i4, i5,
 i6))
     {-#NOINLINE a6#-}
     a6  = \(i0, i1, i2, i3, i4) i5 -> A (a7 (i0, i1, i2, i3, i4, i5))
     {-#NOINLINE a5#-}
     a5  = \(i0, i1, i2, i3) i4 -> A (a6 (i0, i1, i2, i3, i4))
     {-#NOINLINE a4#-}
     a4  = \(i0, i1, i2) i3 -> A (a5 (i0, i1, i2, i3))
     {-#NOINLINE a3#-}
     a3  = \(i0, i1) i2 -> A (a4 (i0, i1, i2))
     {-#NOINLINE a2#-}
     a2  = \i0 i1 -> A (a3 (i0, i1))
     {-#NOINLINE a1#-}
     a1  = \i0 -> A (a2 i0)
     {-#NOINLINE a0#-}
     a0  = A a1
 }}}

 This module tries to provoke "tupling up the free variables" in `g10`, and
 compare it to the naive implementation `f10`. I do not know yet whether
 this would actually produce any performance benefits though.

 The NOINLINE pragmas are needed by the way, otherwise GHC decides to
 inline the `a` functions, and we end up with a mess similar to `f10`.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/7258#comment:87
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler