
#15185: Enum instance for IntX / WordX are inefficient -------------------------------------+------------------------------------- Reporter: guibou | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Core | Version: 8.4.2 Libraries | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The following factorial benchmark show high discrepancy in performance between `Int` and `Int64` (and similarly between `Word` and `Word64`) when using `product [1..n]` to compute factorial: {{{#!haskell {-# LANGUAGE TypeApplications #-} import Criterion.Main import Data.Int import Data.Word import Numeric.Natural fact :: Integral t => t -> t fact n = product [1..n] fact2 :: Integral t => t -> t fact2 n = go 1 n where go acc 1 = acc go acc n = go (acc * n) (n - 1) main :: IO () main = defaultMain [ bgroup "fact 20" [ bench "Int" $ whnf (fact @Int) 20 , bench "Word" $ whnf (fact @Word) 20 , bench "Int64" $ whnf (fact @Int64) 20 , bench "Word64" $ whnf (fact @Word64) 20 , bench "Integer" $ whnf (fact @Integer) 20 , bench "Natural" $ whnf (fact @Natural) 20 ] , bgroup "fact2 20" [ bench "Int" $ whnf (fact @Int) 20 , bench "Word" $ whnf (fact @Word) 20 , bench "Int64" $ whnf (fact @Int64) 20 , bench "Word64" $ whnf (fact @Word64) 20 , bench "Integer" $ whnf (fact @Integer) 20 , bench "Natural" $ whnf (fact @Natural) 20 ] ] }}} {{{ benchmarking fact 20/Int time 17.33 ns (17.22 ns .. 17.41 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 17.21 ns (17.16 ns .. 17.29 ns) std dev 222.4 ps (184.5 ps .. 282.3 ps) variance introduced by outliers: 15% (moderately inflated) benchmarking fact 20/Word time 17.25 ns (17.01 ns .. 17.63 ns) 0.999 R² (0.997 R² .. 1.000 R²) mean 17.19 ns (17.11 ns .. 17.36 ns) std dev 358.6 ps (199.4 ps .. 674.3 ps) variance introduced by outliers: 32% (moderately inflated) benchmarking fact 20/Int64 time 166.7 ns (165.4 ns .. 168.5 ns) 0.999 R² (0.997 R² .. 1.000 R²) mean 166.8 ns (166.0 ns .. 168.5 ns) std dev 4.077 ns (1.854 ns .. 6.592 ns) variance introduced by outliers: 35% (moderately inflated) benchmarking fact 20/Word64 time 560.5 ns (550.6 ns .. 575.0 ns) 0.995 R² (0.989 R² .. 1.000 R²) mean 556.5 ns (549.1 ns .. 575.9 ns) std dev 36.08 ns (9.195 ns .. 60.69 ns) variance introduced by outliers: 78% (severely inflated) benchmarking fact 20/Integer time 421.9 ns (419.8 ns .. 425.3 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 427.7 ns (425.5 ns .. 431.3 ns) std dev 9.238 ns (6.311 ns .. 16.19 ns) variance introduced by outliers: 28% (moderately inflated) benchmarking fact 20/Natural time 534.5 ns (532.6 ns .. 536.3 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 533.7 ns (532.6 ns .. 535.3 ns) std dev 4.535 ns (3.652 ns .. 6.226 ns) benchmarking fact2 20/Int time 17.14 ns (17.07 ns .. 17.23 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 17.15 ns (17.10 ns .. 17.24 ns) std dev 227.1 ps (167.4 ps .. 286.9 ps) variance introduced by outliers: 16% (moderately inflated) benchmarking fact2 20/Word time 16.93 ns (16.86 ns .. 17.02 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 16.90 ns (16.86 ns .. 16.95 ns) std dev 149.8 ps (107.4 ps .. 192.5 ps) benchmarking fact2 20/Int64 time 165.1 ns (164.6 ns .. 165.7 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 164.8 ns (164.5 ns .. 165.1 ns) std dev 1.033 ns (778.6 ps .. 1.363 ns) benchmarking fact2 20/Word64 time 545.3 ns (542.8 ns .. 547.9 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 543.6 ns (542.4 ns .. 545.2 ns) std dev 4.741 ns (3.736 ns .. 5.824 ns) benchmarking fact2 20/Integer time 419.1 ns (417.2 ns .. 421.1 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 418.2 ns (416.8 ns .. 420.2 ns) std dev 5.415 ns (4.094 ns .. 7.147 ns) variance introduced by outliers: 12% (moderately inflated) benchmarking fact2 20/Natural time 533.8 ns (532.2 ns .. 536.1 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 533.2 ns (532.1 ns .. 534.8 ns) std dev 4.428 ns (3.217 ns .. 5.741 ns) }}} In `fact`, `Int` and `Word` are efficient, `Int64` and `Word64` are respectively 10x and 20x slowers, when they should be identical on 64 bit platform. `Word64` is even slower than `Integer`! Replacing `fact` by a handwritten recursion, in `fact2` gives something more coherent where all the 64 bits types have roughly the same efficiency. I observed in the source code that `Int` and `Int64` are not implemented as type synonym. Using `-ddump-rule-firings`, I observed that a rule `fold/build` fired and removed the `fold` for the `Int` case, and not for the `Int64`, I don't understand much ;( This is not dramatic because most people will use `Int` instead of `Int64`, but it is a surprising behavior. I also observed that smaller integral (such as `Int8`, `Int16`, `Int32`) behaves as badly as `Int64`) and this is much more important because there is no other default and more efficient type synonym. Note that I'll be interested to provide a patch myself if mentored a bit. Especially, I'm thinking about merging the `Int` and `Int64` implementation (with respect to the architecture default word size). Tested on GHC 8.4.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15185 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler