Re: [GHC] #13001: EnumFromThenTo is is not a good producer

5 Jan 2017

      #13001: EnumFromThenTo is is not a good producer
-------------------------------------+-------------------------------------
        Reporter:  George            |                Owner:
            Type:  bug               |               Status:  new
        Priority:  low               |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  MacOS X           |         Architecture:  x86_64
 Type of failure:  Runtime           |  (amd64)
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 It's true that doing more inlining will generally improve things, but we
 could be a bit more forensic about this.  Just sprinkling more inline
 pragmas, some of which may do no good, is a bit of a blunt instrument.

 I checked the code for `testFromThenTo`, compiled with -O, and got
 {{{
 Foo.$wtestFromThenTo =
   \ (ww_s2vE :: GHC.Prim.Int#) ->
     case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<# ww_s2vE 0#) of {
       False ->
         case ww_s2vE of wild2_a2uk {
           __DEFAULT ->
             case GHC.Real.$wf1 10# wild2_a2uk of ww4_a2uE { __DEFAULT ->
             GHC.Enum.efdtIntUpFB
               @ (Int -> Int)
               (GHC.List.lengthFB @ Int)
               (id @ Int)
               0#
               2#
               ww4_a2uE
               Foo.testFromThenTo2
             };
           0# -> Foo.testFromThenTo1
         };
       True -> GHC.Real.^2
     }
 }}}
 Fusion has happened (there is a use of the `foldr/build` rule), but the
 call to `edftIntUpFB` remains.  That's not necessarily bad.  As you'll
 see, it's not a tiny function:
 {{{
 efdtIntUpFB :: (Int -> r -> r) -> r -> Int# -> Int# -> Int# -> r
 efdtIntUpFB c n x1 x2 y    -- Be careful about overflow!
  | isTrue# (y <# x2) = if isTrue# (y <# x1) then n else I# x1 `c` n
  | otherwise = -- Common case: x1 <= x2 <= y
                let !delta = x2 -# x1 -- >= 0
                    !y' = y -# delta  -- x1 <= y' <= y; hence y' is
 representable

                    -- Invariant: x <= y
                    -- Note that: z <= y' => z + delta won't overflow
                    -- so we are guaranteed not to overflow if/when we
 recurse
                    go_up x | isTrue# (x ># y') = I# x `c` n
                            | otherwise         = I# x `c` go_up (x +#
 delta)
                in I# x1 `c` go_up x2
 }}}
 BUT the bad thing is that it allocated loads of `Int` values!  Why does it
 do that?  Becuase the function passed to it (`c` in the above definition)
 takes an `Int` as its argument.  So if `efdtIntUpFB` isn't inlined, it
 ''must'' allocate an `Int` box for every iteration.  Bad!

 But this is an internal function.  Suppose we gave it this signature:
 {{{
 efdtIntUpFB :: (Int# -> r -> r) -> r -> Int# -> Int# -> Int# -> r
             --  ^^^ NB Int# not Int
 }}}
 Now it won't allocate!  Can we make the rest of the pieces fit together
 now?  Would you like to have a try?

 I also looked at the call to `lengthFB` in the above optimised code.  It's
 defined like this:
 {{{
 lengthFB :: x -> (Int -> Int) -> Int -> Int
 lengthFB _ r = \ !a -> r (a + 1)
 }}}
 Uh-ho!  More `Int` boxes!  So I tried rewriting the `length` moving parts
 (in `GHC.List`) like this:
 {{{
 {-# INLINE xlength #-}
 xlength                  :: [a] -> Int
 xlength xs               = I# (xlenAcc xs 0#)

 xlenAcc          :: [a] -> Int# -> Int#
 xlenAcc []     n = n
 xlenAcc (_:ys) n = xlenAcc ys (n +# 1#)

 {-# RULES
 "xlength" [~1] forall xs . xlenAcc xs = foldr xlengthFB idXlength xs
 "xlengthList" [1] foldr xlengthFB idXlength = xlenAcc
  #-}

 -- The lambda form turns out to be necessary to make this inline
 -- when we need it to and give good performance.
 {-# INLINE [0] xlengthFB #-}
 xlengthFB :: x -> (Int# -> Int#) -> Int# -> Int#
 xlengthFB _ r = \ a -> r (a +# 1#)

 {-# INLINE [0] idXlength #-}
 idXlength :: Int# -> Int#
 idXlength x = x
 }}}
 That compiles fine.  Even if it generates the same code as before, GHC
 will have to do less work to optimise it, so it's a win.

 Would you like to try that change to `length` and see if it is an
 improvement?

 Maybe you can do the same for `efdtIntUpFB`?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:4
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler