
#7258: Compiling DynFlags is jolly slow -------------------------------------+------------------------------------- Reporter: simonpj | Owner: simonpj Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.1 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Replying to [comment:57 simonpj]:
read, read-appl and show "explode" while all the other examples, including getline-appl and getline, behave properly.
Very good!
{{{ sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 77.8 23.6 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 9.6 30.0 FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 3.0 10.4 }}}
What does `-dshow-passes` show about the program size? I.e. are we getting very large Core?
For a 100-line read example: {{{ Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC version 8.0.1 Using binary package database: /home/tobias/well- typed/devel/ghc/inplace/lib/package.conf.d/package.cache package flags [] loading package database /home/tobias/well- typed/devel/ghc/inplace/lib/package.conf.d wired-in package ghc-prim mapped to ghc-prim-0.5.2.0 wired-in package integer-gmp mapped to integer-gmp-1.0.1.0 wired-in package base mapped to base-4.11.0.0 wired-in package rts mapped to rts wired-in package template-haskell mapped to template-haskell-2.13.0.0 wired-in package ghc mapped to ghc-8.3 wired-in package dph-seq not found. wired-in package dph-par not found. *** Checking old interface for D (use -ddump-hi-diffs for more details): *** Parser [D]: !!! Parser [D]: finished in 3.22 milliseconds, allocated 3.170 megabytes *** Renamer/typechecker [D]: !!! Renamer/typechecker [D]: finished in 124.33 milliseconds, allocated 72.473 megabytes *** Desugar [D]: Result size of Desugar (after optimization) = {terms: 2,667, types: 11,534, coercions: 0, joins: 0/0} !!! Desugar [D]: finished in 174.52 milliseconds, allocated 59.456 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 7,375, types: 18,990, coercions: 6,881, joins: 0/913} Result size of Simplifier iteration=2 = {terms: 3,722, types: 13,604, coercions: 1,351, joins: 0/198} Result size of Simplifier iteration=3 = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1} Result size of Simplifier = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1} !!! Simplifier [D]: finished in 1306.06 milliseconds, allocated 749.648 megabytes *** Specialise [D]: Result size of Specialise = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1} !!! Specialise [D]: finished in 19.96 milliseconds, allocated 17.006 megabytes *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [D]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 4,923, types: 15,504, coercions: 1,342, joins: 0/0} !!! Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [D]: finished in 122.82 milliseconds, allocated 103.551 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0} Result size of Simplifier = {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 216.40 milliseconds, allocated 122.592 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0} Result size of Simplifier = {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 183.50 milliseconds, allocated 131.494 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} Result size of Simplifier = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 183.89 milliseconds, allocated 126.382 megabytes *** Float inwards [D]: Result size of Float inwards = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} !!! Float inwards [D]: finished in 8.67 milliseconds, allocated 4.238 megabytes *** Called arity analysis [D]: Result size of Called arity analysis = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} !!! Called arity analysis [D]: finished in 5.66 milliseconds, allocated 5.974 megabytes *** Simplifier [D]: Result size of Simplifier = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 67.09 milliseconds, allocated 62.910 megabytes *** Demand analysis [D]: Result size of Demand analysis = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0} !!! Demand analysis [D]: finished in 170.27 milliseconds, allocated 91.193 megabytes *** Worker Wrapper binds [D]: Result size of Worker Wrapper binds = {terms: 4,541, types: 14,225, coercions: 1,342, joins: 0/3} !!! Worker Wrapper binds [D]: finished in 5.09 milliseconds, allocated 3.614 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0} Result size of Simplifier = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 351.93 milliseconds, allocated 219.389 megabytes *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) [D]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0} !!! Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) [D]: finished in 268.13 milliseconds, allocated 280.555 megabytes *** Common sub-expression [D]: Result size of Common sub-expression = {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0} !!! Common sub-expression [D]: finished in 38.50 milliseconds, allocated 48.593 megabytes *** Float inwards [D]: Result size of Float inwards = {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0} !!! Float inwards [D]: finished in 5.37 milliseconds, allocated 3.369 megabytes *** Simplifier [D]: Result size of Simplifier = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0} !!! Simplifier [D]: finished in 140.01 milliseconds, allocated 98.855 megabytes *** Demand analysis [D]: Result size of Demand analysis = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0} !!! Demand analysis [D]: finished in 77.63 milliseconds, allocated 86.076 megabytes *** CoreTidy [D]: Result size of Tidy Core = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0} !!! CoreTidy [D]: finished in 54.47 milliseconds, allocated 69.051 megabytes Created temporary directory: /tmp/ghc27776_0 *** CorePrep [D]: Result size of CorePrep = {terms: 3,569, types: 13,155, coercions: 1,342, joins: 0/204} !!! CorePrep [D]: finished in 25.73 milliseconds, allocated 32.097 megabytes *** Stg2Stg: *** CodeGen [D]: !!! CodeGen [D]: finished in 2109.26 milliseconds, allocated 1630.642 megabytes *** Assembler: *** Deleting temp files: Warning: deleting non-existent /tmp/ghc27776_0/ghc_2.c *** Deleting temp dirs: }}} At 200 lines, we're looking at: {{{ Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC version 8.0.1 Using binary package database: /home/tobias/well- typed/devel/ghc/inplace/lib/package.conf.d/package.cache package flags [] loading package database /home/tobias/well- typed/devel/ghc/inplace/lib/package.conf.d wired-in package ghc-prim mapped to ghc-prim-0.5.2.0 wired-in package integer-gmp mapped to integer-gmp-1.0.1.0 wired-in package base mapped to base-4.11.0.0 wired-in package rts mapped to rts wired-in package template-haskell mapped to template-haskell-2.13.0.0 wired-in package ghc mapped to ghc-8.3 wired-in package dph-seq not found. wired-in package dph-par not found. *** Checking old interface for D (use -ddump-hi-diffs for more details): *** Parser [D]: !!! Parser [D]: finished in 3.80 milliseconds, allocated 6.207 megabytes *** Renamer/typechecker [D]: !!! Renamer/typechecker [D]: finished in 230.12 milliseconds, allocated 138.058 megabytes *** Desugar [D]: Result size of Desugar (after optimization) = {terms: 5,267, types: 43,034, coercions: 0, joins: 0/0} !!! Desugar [D]: finished in 289.66 milliseconds, allocated 201.767 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 14,575, types: 57,790, coercions: 13,681, joins: 0/1,813} Result size of Simplifier iteration=2 = {terms: 7,322, types: 47,104, coercions: 2,651, joins: 0/398} Result size of Simplifier iteration=3 = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1} Result size of Simplifier = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1} !!! Simplifier [D]: finished in 5652.80 milliseconds, allocated 2700.163 megabytes *** Specialise [D]: Result size of Specialise = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1} !!! Specialise [D]: finished in 62.79 milliseconds, allocated 58.261 megabytes *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [D]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 9,723, types: 50,904, coercions: 2,642, joins: 0/0} !!! Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [D]: finished in 401.80 milliseconds, allocated 374.441 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0} Result size of Simplifier = {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 754.47 milliseconds, allocated 393.159 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0} Result size of Simplifier = {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 579.29 milliseconds, allocated 411.892 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} Result size of Simplifier = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 630.25 milliseconds, allocated 382.851 megabytes *** Float inwards [D]: Result size of Float inwards = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} !!! Float inwards [D]: finished in 8.17 milliseconds, allocated 8.300 megabytes *** Called arity analysis [D]: Result size of Called arity analysis = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} !!! Called arity analysis [D]: finished in 21.61 milliseconds, allocated 11.860 megabytes *** Simplifier [D]: Result size of Simplifier = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 362.68 milliseconds, allocated 190.967 megabytes *** Demand analysis [D]: Result size of Demand analysis = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0} !!! Demand analysis [D]: finished in 551.36 milliseconds, allocated 339.529 megabytes *** Worker Wrapper binds [D]: Result size of Worker Wrapper binds = {terms: 8,941, types: 48,325, coercions: 2,642, joins: 0/3} !!! Worker Wrapper binds [D]: finished in 7.70 milliseconds, allocated 11.542 megabytes *** Simplifier [D]: Result size of Simplifier iteration=1 = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0} Result size of Simplifier = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 1307.95 milliseconds, allocated 727.784 megabytes *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) [D]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0} !!! Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) [D]: finished in 1393.21 milliseconds, allocated 1078.391 megabytes *** Common sub-expression [D]: Result size of Common sub-expression = {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0} !!! Common sub-expression [D]: finished in 227.12 milliseconds, allocated 194.184 megabytes *** Float inwards [D]: Result size of Float inwards = {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0} !!! Float inwards [D]: finished in 11.63 milliseconds, allocated 6.545 megabytes *** Simplifier [D]: Result size of Simplifier = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0} !!! Simplifier [D]: finished in 590.60 milliseconds, allocated 349.746 megabytes *** Demand analysis [D]: Result size of Demand analysis = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0} !!! Demand analysis [D]: finished in 485.19 milliseconds, allocated 330.909 megabytes *** CoreTidy [D]: Result size of Tidy Core = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0} !!! CoreTidy [D]: finished in 293.13 milliseconds, allocated 258.898 megabytes Created temporary directory: /tmp/ghc18725_0 *** CorePrep [D]: Result size of CorePrep = {terms: 6,969, types: 46,155, coercions: 2,642, joins: 0/404} !!! CorePrep [D]: finished in 163.49 milliseconds, allocated 113.842 megabytes *** Stg2Stg: *** CodeGen [D]: !!! CodeGen [D]: finished in 23041.19 milliseconds, allocated 12665.144 megabytes *** Assembler: *** Deleting temp files: Warning: deleting non-existent /tmp/ghc18725_0/ghc_2.c *** Deleting temp dirs: }}} Not sure whether this qualifies as "unusually large". `-ddump-cmm` shows that the C-- for a 100-field read instance is more than twice as large as that for the 100-field getline example (39234 lines vs 14895), but size alone doesn't explain the huge difference. However, at 300 fields, the read example produces 237541 lines of C--, while getline grows roughly linearly, to 47312 lines, meaning that there is definitely some sort of nonlinearity going on wrt C-- code size.
So `sink` in C-- seems very slow. Nearly 4 bytes in 5 of all allocation is in tis pass alone. Is this a non-linear algorithmic effect. e.g if you double the size of the input, does the `sink` time go up non-linearly while `simplTopBinds` goes up linearly? That's my guess.
Yes, in fact it does, quite clearly so: 10-field example: {{{ Wed Oct 25 14:57 2017 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well- typed/devel/ghc/inplace/lib -B/home/tobias/well- typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-10-read.hs total time = 0.19 secs (188 ticks @ 1000 us, 1 processor) total alloc = 134,109,112 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 35.6 28.7 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7) 9.0 9.0 OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 4.8 5.5 FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 4.3 5.1 pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 3.7 3.5 tcRnImports TcRnDriver compiler/typecheck/TcRnDriver.hs:240:20-50 3.7 3.6 StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 3.7 2.1 sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 3.2 1.5 setSessionDynFlags GHC compiler/main/GHC.hs:(578,1)-(584,16) 3.2 4.7 NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78 2.1 0.9 NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 2.1 2.9 bin_tycldecls HscTypes compiler/main/HscTypes.hs:1085:52-57 1.6 1.0 initGhcMonad GHC compiler/main/GHC.hs:(493,1)-(503,25) 1.6 4.7 MAIN MAIN <built-in> 1.1 0.3 fixStgRegisters AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:566:17-42 1.1 0.2 genMachCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 1.1 1.3 shortcutBranches AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:694:17-54 1.1 0.2 elimCommonBlocks CmmPipeline compiler/cmm/CmmPipeline.hs:(71,13)-(72,76) 1.1 1.0 CAF PrimOp <entire-module> 1.1 0.5 CAF PrelInfo <entire-module> 1.1 1.0 deSugar HscMain compiler/main/HscMain.hs:511:7-44 1.1 0.9 CorePrep HscMain compiler/main/HscMain.hs:(1313,24)-(1314,57) 1.1 0.6 withCleanupSession GHC compiler/main/GHC.hs:(466,1)-(475,37) 1.1 0.8 regLiveness AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 0.5 2.0 RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 0.5 2.4 layoutStack CmmPipeline compiler/cmm/CmmPipeline.hs:(97,13)-(99,40) 0.5 1.0 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 0.5 1.0 }}} 100 fields: {{{ Wed Oct 25 15:00 2017 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well- typed/devel/ghc/inplace/lib -B/home/tobias/well- typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-100-read.hs total time = 3.10 secs (3103 ticks @ 1000 us, 1 processor) total alloc = 2,397,379,664 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 38.9 36.6 sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 16.0 7.3 FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 7.9 10.8 OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 4.3 5.8 pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 4.1 5.3 RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 3.8 4.6 StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 2.8 2.5 NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 2.8 4.8 regLiveness AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 1.8 1.9 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 1.8 2.9 genMachCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 1.8 1.6 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7) 1.3 1.3 NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78 1.0 1.7 CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56 0.8 1.2 deSugar HscMain compiler/main/HscMain.hs:511:7-44 0.8 1.3 }}} 400 fields: {{{ Thu Oct 26 01:25 2017 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well- typed/devel/ghc/inplace/lib -B/home/tobias/well- typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-read.hs total time = 128.62 secs (128624 ticks @ 1000 us, 1 processor) total alloc = 37,659,351,320 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 77.8 23.6 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 9.6 30.0 FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 3.0 10.4 RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 1.6 5.0 OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 1.1 4.2 pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 1.0 4.7 NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 0.9 4.2 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 0.7 2.9 StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 0.6 1.9 regLiveness AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 0.5 1.4 genMachCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 0.4 1.3 NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78 0.4 1.6 CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56 0.2 1.1 deSugar HscMain compiler/main/HscMain.hs:511:7-44 0.2 1.1 }}}
(Related question: does `sink` show up a a big item when compiling "normal" programs?)
It does, but nowhere near this dominantly. E.g. getline, 400 fields: {{{ Thu Oct 26 02:54 2017 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well- typed/devel/ghc/inplace/lib -B/home/tobias/well- typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-getline.hs total time = 14.13 secs (14126 ticks @ 1000 us, 1 processor) total alloc = 12,479,030,504 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 31.7 3.3 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 24.7 39.0 OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 7.2 9.7 NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 6.5 10.4 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 5.2 7.6 FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 4.1 6.2 StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 2.0 1.6 deSugar HscMain compiler/main/HscMain.hs:511:7-44 2.0 3.3 CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56 1.9 3.2 CorePrep HscMain compiler/main/HscMain.hs:(1313,24)-(1314,57) 1.4 2.5 Specialise SimplCore compiler/simplCore/SimplCore.hs:486:40-50 0.8 1.3 StgCse SimplStg compiler/simplStg/SimplStg.hs:(88,14)-(91,44) 0.5 1.3 }}}
Guessing: the "shape" of the C-- code makes `sink` behave very badly. Can you characterise what that shape is, and where the algorithmic badness comes from?
Not yet, but I have a hunch that it's simply a matter of producing too much of it. Maybe there is quadratic behavior there, where the number of lines of C-- grows proportionally with the square of the number of fields? That seems like a plausible explanation so far. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/7258#comment:58 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler