[GHC] #10371: GHC fails to inline and specialize a function

#10371: GHC fails to inline and specialize a function -------------------------------------+------------------------------------- Reporter: | Owner: MikeIzbicki | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Compiler | Operating System: Unknown/Multiple Keywords: | Type of failure: None/Unknown Architecture: | Blocked By: Unknown/Multiple | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- I have an alternative Prelude library called [subhask](https://github.com/mikeizbicki/subhask) that redefines the numeric type class hierarchy. I'm trying to update it to work with GHC 7.10, but there is a major inlining bug that is killing performance. The code below demonstrates the issue. It first defines a distance function over 2 vectors, then measures the performance using criterion. (It requires the subhask to compile.) {{{ {-# LANGUAGE BangPatterns #-} import Control.DeepSeq import Criterion.Main import qualified Data.Vector.Unboxed as VU import qualified Data.Vector.Generic as VG import qualified Prelude import SubHask -- distance_standalone :: VU.Vector Float -> VU.Vector Float -> Float distance_standalone v1 v2 = sqrt $ go 0 0 where go !tot !i = if i>VG.length v1-4 then goEach tot i else go tot' (i+4) where tot' = tot +(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i) *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i) +(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1)) *(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1)) +(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2)) *(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2)) +(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3)) *(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3)) goEach !tot !i = if i>= VG.length v1 then tot else goEach tot' (i+1) where tot' = tot+(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i) *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i) main = do let v1 = VU.fromList [1..200] :: VU.Vector Float v2 = VU.fromList [1..200] :: VU.Vector Float deepseq v1 $ deepseq v2 $ return () defaultMain [ bench "distance_standalone" $ nf (distance_standalone v1) v2 ] }}} Here are the results of compiling and running using GHC 7.10 and 7.8: {{{ $ ghc-7.10.1 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl && ./Main [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... benchmarking distance_standalone time 8.135 μs (8.121 μs .. 8.154 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 8.188 μs (8.158 μs .. 8.250 μs) std dev 139.3 ns (66.05 ns .. 250.4 ns) variance introduced by outliers: 15% (moderately inflated) }}} {{{ $ ghc-7.8.2 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl && ./Main [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... benchmarking distance_standalone time 733.2 ns (732.9 ns .. 733.6 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 734.1 ns (733.7 ns .. 734.5 ns) std dev 1.458 ns (1.262 ns .. 1.754 ns) }}} As you can see, GHC 7.10 is 10x slower. Looking through the core output shows that the cause of this is that GHC 7.8 is properly specializing the code whereas GHC 7.10 is not. If you uncomment the type signature before the `distance_standalone` function then both compilers perform at the faster speed. I believe the cause of this may be related to the complicated class numeric class hierarchy in SubHask. If you comment out the lines: {{{ import qualified Prelude import SubHask }}} then GHC uses the Prelude hierarchy instead of SubHask's hierarchy, and both compilers generate the faster program. There's one last wrinkle. If you define the `distance_standalone` function in a different file. Then in GHC 7.10, the `INLINE` and `INLINABLE` pragmas do absolutely nothing. Not only does the resulting code not get inlined, but if I add the specialization: {{{ {-# SPECIALIZE distance_standalone :: VU.Vector Float -> VU.Vector Float -> Float #-} }}} to the Main file, I get an error message saying something like: {{{ bench/Vector.hs:18:1: Warning: You cannot SPECIALISE ‘distance_standalone{v ru1}’ because its definition has no INLINE/INLINABLE pragma (or its defining module ‘subhask-0.1.0.0@subha_LNZiQvSbo8Z0VdLTwuvkrN:SubHask.Algebra’ was compiled without -O) }}} I get this message despite the fact that the defining module was compiled with `-O2` and the function had an `INLINABLE` pragma. If I add the specialization pragma to the defining module, then I don't get the warning, but the code still doesn't specialize properly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10371 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10371: GHC fails to inline and specialize a function -------------------------------------+------------------------------------- Reporter: MikeIzbicki | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: #8668 | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by MikeIzbicki): * related: => #8668 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10371#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10371: GHC fails to inline and specialize a function -------------------------------------+------------------------------------- Reporter: MikeIzbicki | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: #8668 | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by MikeIzbicki): I figured out a solution that works for me. Inside my class hierarchy there were several definitions that are similar to type synonyms: {{{ class (Rg r, Group r) => Rng r instance (Rg r, Group r) => Rng r }}} instead of: {{{ type Rng r = (Rg r, Group r) }}} Replacing these definitions with actual type synonyms fixed the problem. I was using the original version for two reasons. First, GHC 7.8 lacks ConstraintKinds support in TemplateHaskell. This is obviously not relevant after the upgrade to 7.10. Second, There are places where I would prefer to be able to partially apply the constraint, which you can't do as a type synonym. I can live without partial application, but it would be nice to have back. Unfortunately, these class definitions are not the sole cause of the problem. I tried recreating the relevant subset of my class hierarchy in another file. But I couldn't reproduce the problem using only this subset of code. So these classes must somehow be interacting with another part of the library which (seems to be) unrelated. Tracking down this other cause and giving a nice short test case here seems like more work than I can put into this since I have a workaround. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10371#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10371: GHC fails to inline and specialize a function -------------------------------------+------------------------------------- Reporter: MikeIzbicki | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #8668 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * keywords: => Inlining -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10371#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10371: GHC fails to inline and specialize a function -------------------------------------+------------------------------------- Reporter: MikeIzbicki | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #8668 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nfrisby): I haven't been able to reproduce this with GHC 7.10.2. I see ~200 ns averages using: * GHC 7.10.2 * subhask-0.1.0.0 or subhask-0.1.0.0 (from Hackage sdist, with relaxed build-depends) * MonadRandom-0.4, QuickCheck-2.8.1, approximate-0.2.2.2, base-4.8.1.0, bloomfilter-2.0.1.0, bytes-0.15.0.1, bytestring-0.10.6.0, cassava-0.4.4.0, containers-0.5.6.2, deepseq-1.4.1.1, erf-2.0.0.0, gamma-0.9.0.2, ghc- prim-0.4.0.0, hmatrix-0.16.1.5, hyperloglog-0.3.4, lens-4.12.3, monad- primitive-0.1, mtl-2.2.1, parallel-3.2.0.6, pipes-4.1.6, primitive-0.6.1.0, semigroups-0.16.2.2, template-haskell-2.10.0.0, vector-0.10.12.3 However, I don't know if either of those are the right `subhask` to use. MikeIzbicki, do you have the source for the `subhask` that you originally observed the regression with? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10371#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC