[GHC] #10800: vector-0.11

#10800: vector-0.11 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: 7.12.1 Component: Compiler | Version: 7.10.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Compile-time Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- vector-0.11 takes substantially longer to compile with 7.10 than 7.8 Unfortunately the issue appears to be in generated code. Lacking time right now so I'm just going to drop the IRC log here, Compile log: https://travis-ci.org/haskell/vector/builds/56179052 {{{ * hvr thought some compile-time improvements landed in 7.10.2 <dolio> I don't think it uses sufficiently less memory to prevent it from using all memory on travis and failing. <dolio> Which is why it suddenly went from 8 minutes to 35. <dolio> Although it's also significantly worse even when you have enough memory. <dolio> hvr: 7.8.4 takes 4m40s, 7.10.2 takes 16m40s. <dolio> i7 4770, 32GB RAM. <dolio> vector 0.11 branch. <dolio> It's not as bad in the 0.10 branch. <dolio> It uses like 60% more memory, too. <dolio> Maybe more. <dolio> It's really just one or two files in the test suite that take all the time/memory. <hvr> dolio: does vector do some aggressive INLINEing? <dolio> Well, yes. <hvr> more in 0.11 than in 0.10? <dolio> It does more stuff in 0.11, yes. <bgamari> vector 0.11 has this new chunked streaming abstraction IIRC <hvr> there's a generic one though iirc <dolio> 0.11 introduced the generalized stream fusion, so there's like three different things going on that specialization has to select one of. <dolio> Instead of just one thing like in 0.10. <dolio> Anyhow, the problem file(s) generate a bunch of tests using template Haskell, so they're pretty opaque. <dolio> I'm pretty sure it's the generated code that's the problem, though. <dolio> Not the template haskell that generates it. }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: 7.12.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 7.12.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by simonpj): * priority: normal => highest Comment: Let's be sure to look at this; increasing priority. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): The culprit appears to be `tests/Tests/Vector.hs`, which seems to blow up in the simplifier with GHC 7.10.1, = 7.10.1 = {{{ [5 of 6] Compiling Tests.Vector ( tests/Tests/Vector.hs, dist/dist- sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o ) *** Parser: *** Renamer/typechecker: *** Simplify: *** CorePrep: *** ByteCodeGen: ... *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: tests/Tests/Vector.hs:20:1: Warning: The import of ‘Data.Monoid’ is redundant except perhaps to import instances from ‘Data.Monoid’ To import instances alone, use: import Data.Monoid() tests/Tests/Vector.hs:405:5: Warning: Defined but not used: ‘limitUnfolds’ tests/Tests/Vector.hs:408:5: Warning: Defined but not used: ‘prop_unfoldr’ tests/Tests/Vector.hs:615:1: Warning: Defined but not used: ‘testBoolUnboxedVector’ *** Desugar: Result size of Desugar (after optimization) = {terms: 8,929, types: 29,113, coercions: 12,626} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 7,718, types: 22,842, coercions: 12,683} Result size of Simplifier iteration=2 = {terms: 7,645, types: 22,550, coercions: 12,627} Result size of Simplifier = {terms: 7,645, types: 22,550, coercions: 12,627} *** Specialise: Result size of Specialise = {terms: 158,601, types: 473,559, coercions: 266,978} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}): Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 191,838, types: 610,090, coercions: 266,978} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 341,716, types: 752,822, coercions: 950,308} Result size of Simplifier iteration=2 = {terms: 341,082, types: 678,899, coercions: 700,199} Result size of Simplifier iteration=3 = {terms: 324,911, types: 649,679, coercions: 477,703} Result size of Simplifier iteration=4 = {terms: 323,711, types: 646,845, coercions: 470,304} Result size of Simplifier = {terms: 323,711, types: 646,845, coercions: 470,304} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,421,941, types: 4,434,113, coercions: 2,600,098} Result size of Simplifier iteration=2 = {terms: 1,532,846, types: 2,868,287, coercions: 1,167,104} Result size of Simplifier iteration=3 = {terms: 1,333,796, types: 2,347,850, coercions: 910,168} Result size of Simplifier iteration=4 = {terms: 1,326,682, types: 2,333,558, coercions: 893,473} Result size of Simplifier = {terms: 1,326,682, types: 2,333,558, coercions: 893,473} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,517,661, types: 3,700,180, coercions: 1,244,341} Result size of Simplifier iteration=2 = {terms: 1,648,651, types: 2,362,735, coercions: 1,118,836} Result size of Simplifier iteration=3 = {terms: 1,533,366, types: 2,160,536, coercions: 1,018,335} Result size of Simplifier iteration=4 = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} Result size of Simplifier = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Float inwards: Result size of Float inwards = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Called arity analysis: Result size of Called arity analysis = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,525,383, types: 2,148,094, coercions: 1,012,837} Result size of Simplifier iteration=2 = {terms: 1,519,162, types: 2,137,087, coercions: 1,009,451} Result size of Simplifier iteration=3 = {terms: 1,518,511, types: 2,135,070, coercions: 1,008,419} Result size of Simplifier iteration=4 = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} Result size of Simplifier = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} *** Demand analysis: Result size of Demand analysis = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} *** Worker Wrapper binds: Result size of Worker Wrapper binds = {terms: 1,638,656, types: 2,267,189, coercions: 1,016,585} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,581,976, types: 2,235,934, coercions: 1,015,794} Result size of Simplifier iteration=2 = {terms: 1,498,911, types: 2,130,083, coercions: 1,007,028} Result size of Simplifier iteration=3 = {terms: 1,498,641, types: 2,129,656, coercions: 1,007,547} Result size of Simplifier iteration=4 = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931} Result size of Simplifier = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}): Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) = {terms: 1,574,915, types: 2,289,456, coercions: 1,008,931} *** Common sub-expression: Result size of Common sub-expression = {terms: 1,437,681, types: 2,110,200, coercions: 999,668} *** Float inwards: Result size of Float inwards = {terms: 1,437,681, types: 2,110,200, coercions: 999,668} *** Liberate case: Result size of Liberate case = {terms: 1,460,235, types: 2,122,207, coercions: 1,002,808} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,409,789, types: 2,018,248, coercions: 1,006,095} Result size of Simplifier iteration=2 = {terms: 1,409,904, types: 2,020,763, coercions: 1,004,750} Result size of Simplifier iteration=3 = {terms: 1,407,598, types: 2,016,738, coercions: 1,005,425} Result size of Simplifier iteration=4 = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311} Result size of Simplifier = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311} *** SpecConstr: Result size of SpecConstr = {terms: 2,875,513, types: 4,153,092, coercions: 1,868,890} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,811,029, types: 3,774,293, coercions: 1,781,476} Result size of Simplifier iteration=2 = {terms: 1,657,609, types: 2,285,271, coercions: 1,158,260} Result size of Simplifier iteration=3 = {terms: 1,634,476, types: 2,261,260, coercions: 1,111,721} Result size of Simplifier iteration=4 = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616} Result size of Simplifier = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616} *** Tidy Core: Result size of Tidy Core = {terms: 1,374,780, types: 1,868,771, coercions: 949,928} writeBinIface: 12038 Names writeBinIface: 14118 dict entries writeBinIface: 12038 Names writeBinIface: 14118 dict entries *** CorePrep: Result size of CorePrep = {terms: 1,742,444, types: 2,159,089, coercions: 949,848} *** Stg2Stg: *** CodeGen: *** Assembler: /usr/bin/gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -Itests/Tests -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/autogen -Idist/dist-sandbox- 7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -x assembler -c /tmp/ghc32343_0/ghc32343_13.s -o dist/dist-sandbox-7565eaee/build/vector- tests-O2/vector-tests-O2-tmp/Tests/Vector.o *** CorePrep: Result size of CorePrep = {terms: 1,742,444, types: 2,159,089, coercions: 949,848} }}} This is a substantial change from 7.8.3, which behaves as follows, = 7.8.3 = {{{ [5 of 6] Compiling Tests.Vector ( tests/Tests/Vector.hs, dist/dist- sandbox-8a24a56a/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o ) *** Parser: *** Renamer/typechecker: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: tests/Tests/Vector.hs:405:5: Warning: Defined but not used: ‘limitUnfolds’ tests/Tests/Vector.hs:408:5: Warning: Defined but not used: ‘prop_unfoldr’ tests/Tests/Vector.hs:615:1: Warning: Defined but not used: ‘testBoolUnboxedVector’ *** Desugar: Result size of Desugar (after optimization) = {terms: 11,244, types: 57,883, coercions: 7,302} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 9,193, types: 44,105, coercions: 7,337} Result size of Simplifier iteration=2 = {terms: 9,116, types: 43,718, coercions: 7,159} Result size of Simplifier = {terms: 9,116, types: 43,718, coercions: 7,159} *** Specialise: Result size of Specialise = {terms: 82,833, types: 519,765, coercions: 78,989} *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}): Result size of Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}) = {terms: 100,958, types: 568,910, coercions: 78,989} *** Float inwards: Result size of Float inwards = {terms: 100,958, types: 568,910, coercions: 78,989} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 179,170, types: 343,254, coercions: 645,024} Result size of Simplifier iteration=2 = {terms: 123,469, types: 170,930, coercions: 259,665} Result size of Simplifier iteration=3 = {terms: 113,030, types: 155,318, coercions: 230,494} Result size of Simplifier iteration=4 = {terms: 112,745, types: 154,554, coercions: 221,711} Result size of Simplifier = {terms: 112,745, types: 154,554, coercions: 221,711} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 515,401, types: 809,314, coercions: 598,712} Result size of Simplifier iteration=2 = {terms: 345,742, types: 520,083, coercions: 376,036} Result size of Simplifier iteration=3 = {terms: 294,322, types: 416,385, coercions: 312,220} Result size of Simplifier iteration=4 = {terms: 287,445, types: 402,244, coercions: 297,516} Result size of Simplifier = {terms: 287,445, types: 402,244, coercions: 297,516} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 453,076, types: 567,575, coercions: 280,117} Result size of Simplifier iteration=2 = {terms: 321,815, types: 378,160, coercions: 222,556} Result size of Simplifier iteration=3 = {terms: 309,758, types: 365,364, coercions: 219,987} Result size of Simplifier iteration=4 = {terms: 308,628, types: 364,331, coercions: 219,583} Result size of Simplifier = {terms: 308,628, types: 364,331, coercions: 219,583} *** Demand analysis: Result size of Demand analysis = {terms: 308,628, types: 364,331, coercions: 219,583} *** Worker Wrapper binds: Result size of Worker Wrapper binds = {terms: 352,621, types: 419,343, coercions: 223,102} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 334,327, types: 396,352, coercions: 217,632} Result size of Simplifier iteration=2 = {terms: 297,776, types: 344,782, coercions: 211,762} Result size of Simplifier iteration=3 = {terms: 297,134, types: 343,498, coercions: 211,068} Result size of Simplifier iteration=4 = {terms: 296,791, types: 342,860, coercions: 211,033} Result size of Simplifier = {terms: 296,791, types: 342,860, coercions: 211,033} *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}): Result size of Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}) = {terms: 311,717, types: 380,250, coercions: 211,033} *** Common sub-expression: Result size of Common sub-expression = {terms: 281,729, types: 346,991, coercions: 196,315} *** Float inwards: Result size of Float inwards = {terms: 281,729, types: 346,991, coercions: 196,315} *** Liberate case: Result size of Liberate case = {terms: 301,329, types: 362,935, coercions: 200,458} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 281,187, types: 322,484, coercions: 200,278} Result size of Simplifier iteration=2 = {terms: 279,511, types: 320,001, coercions: 199,549} Result size of Simplifier = {terms: 279,511, types: 320,001, coercions: 199,549} *** SpecConstr: Result size of SpecConstr = {terms: 398,168, types: 420,420, coercions: 230,461} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 390,005, types: 411,465, coercions: 230,270} Result size of Simplifier iteration=2 = {terms: 272,588, types: 315,504, coercions: 201,634} Result size of Simplifier iteration=3 = {terms: 269,759, types: 310,458, coercions: 200,366} Result size of Simplifier = {terms: 269,759, types: 310,458, coercions: 200,366} *** Tidy Core: Result size of Tidy Core = {terms: 269,759, types: 310,458, coercions: 200,366} writeBinIface: 518 Names writeBinIface: 694 dict entries writeBinIface: 518 Names writeBinIface: 694 dict entries *** CorePrep: Result size of CorePrep = {terms: 341,307, types: 360,564, coercions: 200,286} }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): This is also reproducible with 7.11. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): I have cut down the `tests/Tests/Vector.hs` file from `vector` commit `e000d6c50e68f539a6f6cfa35fe91350a0691499` to something a bit more minimal that may still exhibit a similar slow-down. The result is the following, {{{#!hs module Tests.Vector (tests) where import Boilerplater import Utilities as Util import qualified Data.Vector.Generic as V import qualified Data.Vector import Test.QuickCheck import Test.Framework import Test.Framework.Providers.QuickCheck2 import Data.List #define COMMON_CONTEXT(a, v) \ VANILLA_CONTEXT(a, v), VECTOR_CONTEXT(a, v) #define VANILLA_CONTEXT(a, v) \ Eq a, Show a, Arbitrary a, CoArbitrary a, TestData a, Model a ~ a, EqTest a ~ Property #define VECTOR_CONTEXT(a, v) \ Eq (v a), Show (v a), Arbitrary (v a), CoArbitrary (v a), TestData (v a), Model (v a) ~ [a], EqTest (v a) ~ Property, V.Vector v a testTuplyFunctions:: forall a v. (COMMON_CONTEXT(a, v), VECTOR_CONTEXT((a, a), v), VECTOR_CONTEXT((a, a, a), v)) => v a -> [Test] testTuplyFunctions _ = $(testProperties ['prop_unzip3]) where prop_unzip3 :: P (v (a, a, a) -> (v a, v a, v a)) = V.unzip3 `eq` unzip3 tests = [ testGroup "Data.Vector.Vector (Bool)" (testTuplyFunctions (undefined :: Data.Vector.Vector Bool)) ] }}} GHC 7.10.1 takes around 30% longer than 7.8.3 with this simple example, with the program size peaking at 10000 terms during specialization, as opposed to 7.8 which never goes above 400 terms. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): Well this potentially is interesting... `vector`'s test suite defines a type class `TestData`, {{{#!hs class (Testable (EqTest a), Conclusion (EqTest a)) => TestData a where type Model a model :: a -> Model a unmodel :: Model a -> a type EqTest a equal :: a -> a -> EqTest a }}} The `-ddump-simpl` output produced by 7.10 contains a variety of bindings referring to types containing `Model`, e.g., {{{#!hs lvl18 :: Model (Vector (Bool, Bool, Bool) -> (Vector Bool, Vector Bool, Vector Bool)) -> Vector (Bool, Bool, Bool) -> (Vector Bool, Vector Bool, Vector Bool) }}} whereas the Core produced by 7.8 contains no such references. I'm not sure whether this is problematic or not The simplified testcase above produces 6500 lines of Core in 7.10 and around 800 with 7.8. Despite the failure to resolve the `Model` type family, 7.10 seems to generally be doing more simplification. I believe much of the difference is due to the fact that 7.10 inlines the `$fTestData(,,)` dictionary whereas 7.8 does not. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): `-ddump-inlinings` reveals that there are indeed 7.10 is doing a great deal more inlining than 7.8. Right off the bat 7.10 inlines a variety of methods from the `Applicative` et al. which never appear to be inlined under 7.8, {{{ Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: GHC.Show.$fShow(,,)_$cshowList Inlining done: GHC.Show.$fShow(,,)_$cshow Inlining done: Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$cshrink Inlining done: Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$carbitrary Inlining done: Utilities.$fTestDataVector2_$cequal }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by thomie): Compiling `vector` master with HEAD using a devel2 build (which means a stage2 compiler built with `-DDEBUG`), results in a low of warnings such as the following: {{{ *** Core Lint warnings : in result of Simplifier *** libraries/vector/Data/Vector/Generic.hs:221:1: warning: [RHS of length :: forall (v_a3hG :: * -> *) a_a3hH. Vector v_a3hG a_a3hH => v_a3hG a_a3hH -> Int] INLINE binder is (non-rule) loop breaker: length libraries/vector/Data/Vector/Generic.hs:1891:1: warning: [RHS of unsafeCopy :: forall (v_a38Q :: * -> *) (m_a38R :: * -> *) a_a38S. (PrimMonad m_a38R, Vector v_a38Q a_a38S) => Mutable v_a38Q (PrimState m_a38R) a_a38S -> v_a38Q a_a38S -> m_a38R ()] INLINE binder is (non-rule) loop breaker: unsafeCopy }}} Maybe this gives a clue to what the problem is. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by thomie): * cc: dolio (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I have noticed that there is a qualitative difference in the verbose- core2core output between 7.8.4 and 7.10.3: namely 7.8 lacks an initial gentle simplifier pass. In the case of 7.10.3 the compiler inlines `Data.Vector.Generic.zip` in the `SimplMode {Phase = 2 [main], inline, rules, eta-expand, case-of-case}` pass, producing a significant amount of unresolved `stream`/`unstream` pairs relatively early in simplification. The last appearance of `unstream` doesn't disappear until `SimplMode {Phase = 0 [main], inline, rules, eta-expand, case-of-case}`. By contrast, 7.8.4 doesn't perform this inlining until much later (right before the second FloatOut; it's a bit hard to tell precisely since `verbose-core2core` doesn't output simplifier passes). As far as I can tell, it manages to collapse all of the stream fusion apparatus in a single simplifier pass. I believe this may be the reason for the difference. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by erikd): I have ghc 7.8.4, 7.10.3 and 8.0.0.20160316 installed on my machine. I cloned the vector git repo, set up a cabal sandbox and then for each ghc, `cabal install --dependencies-only`. Them for each ghc I did `cabal clean && time cabal build`. I've timed this with each compiler three times and this result is pretty representative: {{{ Compiler real user sys ---------------------------------------------------- 7.8.4 3m26.106s 4m42.508s 1m40.868s 7.10.3 4m32.690s 6m18.956s 3m6.416s 8.0.0.20160316 1m19.510s 1m52.796s 0m47.992s }}} It seems this partially fixed between 7.10.1 and 7.10.3 anyway, and 8.0 is far better than either of the old releases. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yay! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by erikd): From IRC: {{{ <bgamari> erikd, IIRC I have a more minimal testcase as well <bgamari> the problem centers around zip <bgamari> when things are inlined has changed <bgamari> which results in a large number of stream fusion primitives that can't be resolved until late in compilation <bgamari> which produces a large amount of code <bgamari> erikd, that should at least be a hint <bgamari> unfortunately I ran out of time before identifying the root cause <bgamari> erikd, https://gist.github.com/bgamari/2ca0f7d2a4e0b6447aee <bgamari> erikd, that's my minimized testcase <erikd> thanks, i'll check it out <bgamari> It helps to look at the -dverbose-core2core output <bgamari> comparing between the relevant compiler versions <bgamari> erikd, this utility may come in handy for turning the verbose-core2core output into something less unwieldy <bgamari> https://github.com/bgamari/ghc-utils/blob/master/split- core2core.py }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by erikd): Replying to [comment:13 simonpj]:
Yay!
Certainly great that the compile times have improved, but @Bgamari points out that the original problem was about compilng the test suite. More investigation still needed (tomorrow). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by erikd): Much like for [comment:12 comment 12], set myself up so I could build the vector test suite with ghc versions 7.8.4, 7.10.3 and 8.0.0.20160316 and then with each compiler ran `cabal clean && time cabal test` a number of times with each. This is the result of a single run, but is representative of this test: {{{ Compiler real user sys 7.8.4 1m38.167s 2m03.336s 0m43.800s 7.10.3 8m59.597s 11m40.756s 5m37.828s 8.0.0 0m45.165s 1m0.148s 0m23.744s }}} The 8.0 compiler has not only fixed the compile regression from 7.8 to 7.10, its actually about twice as fast as 7.8. Just as validation I also timed the runtime of the test suites compiled by each of the three compile and there didn't seem to be any measurable difference in run time performance. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Hmm, very interesting. Indeed it has been quite a while since I've checked this issue with 8.0. Judging by comment:5 it seems this was still reproducible, ||= Compiler =||= Real =||= user =||= sys =||= allocations =||= max residency =|| || 8.0.0.20160311 || 1m7.445s || 1m1.257s || 5.938s || || || 1m2.751s || 55.979s || 6.774s || || || 59.469s || 55.507s || 3.959s || || || 58.90s || 56.98s || 1.41s || 39.77e9 bytes || 248.65e6 bytes || || 7.8.4 || 2m2.411s || 1m58.572s || 3.429s || || || 2m3.170s || 2m0.194s || 3.027s || || || 2m2.185s || 2m0.342s || 1.834s || || || 2m1.44s || 1m59.81s || 1.61s || 106.07e9 bytes || 369.51e6 bytes || || 7.10.3 || 11m40.961s || 11m38.834s || 6.592s || || || 11m29.73s || 11m28.62s || 5.47s || 454.467e9 bytes || 70.714e6 bytes || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * priority: highest => high * milestone: 8.0.1 => 8.0.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 7.10.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: high | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.0.2 => 8.0.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: high | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dolio): Interesting. Do you by chance have 7.6.3 to compare with? 7.8 was already significantly worse than it. bgamari, is your max residency for 7.10 off? The figures for both 7.8 and 7.10 seem low to me, really. When I build the test suite on 7.10 I get like 11 GB max residency, and I'm not sure it would ever complete if I limited it to ~400 MB. Unless your test is a smaller case derived from the real test suite. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: high | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by erikd): I do have 7.6.3 installed, but it seems to have become broken since last time I tried. I think its actually a cabal issue. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10800: vector-0.11 compile time increased substantially with 7.10.1 -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: high | Milestone: 8.0.1 Component: Compiler | Version: 7.10.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #13535 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: => #13535 Comment: While it's not clear whether it's related in cause, #13535 also reports a compile-time regression in `Tests.Vector`, this time in GHC 8.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10800#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC