[GHC] #15185: Enum instance for IntX / WordX are inefficient

25 May 2018

      #15185: Enum instance for IntX / WordX are inefficient
-------------------------------------+-------------------------------------
           Reporter:  guibou         |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Core           |           Version:  8.4.2
  Libraries                          |
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 The following factorial benchmark show high discrepancy in performance
 between `Int` and `Int64` (and similarly between `Word` and `Word64`) when
 using `product [1..n]` to compute factorial:

 {{{#!haskell
 {-# LANGUAGE TypeApplications #-}
 import Criterion.Main

 import Data.Int
 import Data.Word
 import Numeric.Natural

 fact :: Integral t => t -> t
 fact n = product [1..n]

 fact2 :: Integral t => t -> t
 fact2 n = go 1 n
   where go acc 1 = acc
         go acc n = go (acc * n) (n - 1)

 main :: IO ()
 main = defaultMain [
   bgroup "fact 20" [
      bench "Int"  $ whnf (fact @Int) 20
      , bench "Word"  $ whnf (fact @Word) 20
      , bench "Int64"  $ whnf (fact @Int64) 20
      , bench "Word64"  $ whnf (fact @Word64) 20
      , bench "Integer"  $ whnf (fact @Integer) 20
      , bench "Natural"  $ whnf (fact @Natural) 20
                ]
   ,
   bgroup "fact2 20" [
      bench "Int"  $ whnf (fact @Int) 20
      , bench "Word"  $ whnf (fact @Word) 20
      , bench "Int64"  $ whnf (fact @Int64) 20
      , bench "Word64"  $ whnf (fact @Word64) 20
      , bench "Integer"  $ whnf (fact @Integer) 20
      , bench "Natural"  $ whnf (fact @Natural) 20
                ]
   ]
 }}}

 {{{
 benchmarking fact 20/Int
 time                 17.33 ns   (17.22 ns .. 17.41 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 17.21 ns   (17.16 ns .. 17.29 ns)
 std dev              222.4 ps   (184.5 ps .. 282.3 ps)
 variance introduced by outliers: 15% (moderately inflated)

 benchmarking fact 20/Word
 time                 17.25 ns   (17.01 ns .. 17.63 ns)
                      0.999 R²   (0.997 R² .. 1.000 R²)
 mean                 17.19 ns   (17.11 ns .. 17.36 ns)
 std dev              358.6 ps   (199.4 ps .. 674.3 ps)
 variance introduced by outliers: 32% (moderately inflated)

 benchmarking fact 20/Int64
 time                 166.7 ns   (165.4 ns .. 168.5 ns)
                      0.999 R²   (0.997 R² .. 1.000 R²)
 mean                 166.8 ns   (166.0 ns .. 168.5 ns)
 std dev              4.077 ns   (1.854 ns .. 6.592 ns)
 variance introduced by outliers: 35% (moderately inflated)

 benchmarking fact 20/Word64
 time                 560.5 ns   (550.6 ns .. 575.0 ns)
                      0.995 R²   (0.989 R² .. 1.000 R²)
 mean                 556.5 ns   (549.1 ns .. 575.9 ns)
 std dev              36.08 ns   (9.195 ns .. 60.69 ns)
 variance introduced by outliers: 78% (severely inflated)

 benchmarking fact 20/Integer
 time                 421.9 ns   (419.8 ns .. 425.3 ns)
                      1.000 R²   (0.999 R² .. 1.000 R²)
 mean                 427.7 ns   (425.5 ns .. 431.3 ns)
 std dev              9.238 ns   (6.311 ns .. 16.19 ns)
 variance introduced by outliers: 28% (moderately inflated)

 benchmarking fact 20/Natural
 time                 534.5 ns   (532.6 ns .. 536.3 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 533.7 ns   (532.6 ns .. 535.3 ns)
 std dev              4.535 ns   (3.652 ns .. 6.226 ns)

 benchmarking fact2 20/Int
 time                 17.14 ns   (17.07 ns .. 17.23 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 17.15 ns   (17.10 ns .. 17.24 ns)
 std dev              227.1 ps   (167.4 ps .. 286.9 ps)
 variance introduced by outliers: 16% (moderately inflated)

 benchmarking fact2 20/Word
 time                 16.93 ns   (16.86 ns .. 17.02 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 16.90 ns   (16.86 ns .. 16.95 ns)
 std dev              149.8 ps   (107.4 ps .. 192.5 ps)

 benchmarking fact2 20/Int64
 time                 165.1 ns   (164.6 ns .. 165.7 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 164.8 ns   (164.5 ns .. 165.1 ns)
 std dev              1.033 ns   (778.6 ps .. 1.363 ns)

 benchmarking fact2 20/Word64
 time                 545.3 ns   (542.8 ns .. 547.9 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 543.6 ns   (542.4 ns .. 545.2 ns)
 std dev              4.741 ns   (3.736 ns .. 5.824 ns)

 benchmarking fact2 20/Integer
 time                 419.1 ns   (417.2 ns .. 421.1 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 418.2 ns   (416.8 ns .. 420.2 ns)
 std dev              5.415 ns   (4.094 ns .. 7.147 ns)
 variance introduced by outliers: 12% (moderately inflated)

 benchmarking fact2 20/Natural
 time                 533.8 ns   (532.2 ns .. 536.1 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 533.2 ns   (532.1 ns .. 534.8 ns)
 std dev              4.428 ns   (3.217 ns .. 5.741 ns)
 }}}

 In `fact`, `Int` and `Word` are efficient, `Int64` and `Word64` are
 respectively 10x and 20x slowers, when they should be identical on 64 bit
 platform. `Word64` is even slower than `Integer`!

 Replacing `fact` by a handwritten recursion, in `fact2` gives something
 more coherent where all the 64 bits types have roughly the same
 efficiency.

 I observed in the source code that `Int` and `Int64` are not implemented
 as type synonym.

 Using `-ddump-rule-firings`, I observed that a rule `fold/build` fired and
 removed the `fold` for the `Int` case, and not for the `Int64`, I don't
 understand much ;(

 This is not dramatic because most people will use `Int` instead of
 `Int64`, but it is a surprising behavior. I also observed that smaller
 integral (such as `Int8`, `Int16`, `Int32`) behaves as badly as `Int64`)
 and this is much more important because there is no other default and more
 efficient type synonym.

 Note that I'll be interested to provide a patch myself if mentored a bit.
 Especially, I'm thinking about merging the `Int` and `Int64`
 implementation (with respect to the architecture default word size).

 Tested on GHC 8.4.2

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15185
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler