[GHC] #10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The problem observed on GHC's code base when explored why exactly some object files are so huge. Let's look at the stats of text code section sizes on today's HEAD (skipping GHCi libraries and stage1 binaries): {{{ ~/dev/git/ghc $ size `find -name *.o -type f | grep -v stage1 | grep -v HS` | sort -k1 -n -r | head -n20 text data bss dec hex filename 1791068 145448 0 1936516 1d8c84 ./compiler/stage2/build/DynFlags.o 1558637 77464 0 1636101 18f705 ./compiler/stage2/build/PprC.o 1397966 23272 0 1421238 15afb6 ./compiler/stage2/build/PlatformConstants.o 1332048 373344 0 1705392 1a05b0 ./libraries/Cabal/Cabal/dist- boot/build/Distribution/PackageDescription.o 1331238 373352 0 1704590 1a028e ./bootstrapping/Distribution/PackageDescription.o 1271578 246240 0 1517818 1728fa ./libraries/template-haskell/dist- boot/build/Language/Haskell/TH/Syntax.o 1229696 311616 0 1541312 1784c0 ./libraries/Cabal/Cabal/dist- install/build/Distribution/PackageDescription.o 1215082 224288 0 1439370 15f68a ./libraries/template-haskell/dist- install/build/Language/Haskell/TH/Syntax.o 1071746 242664 0 1314410 140e6a ./compiler/stage2/build/HsExpr.o 1015090 40904 0 1055994 101cfa ./compiler/stage2/build/Llvm/PpLlvm.o 970203 124944 0 1095147 10b5eb ./compiler/stage2/build/Parser.o 849693 79760 0 929453 e2ead ./compiler/stage2/build/HsDecls.o 833327 35440 0 868767 d419f ./compiler/stage2/build/X86/CodeGen.o 819959 127192 0 947151 e73cf ./libraries/Cabal/Cabal/dist- boot/build/Distribution/Simple/Setup.o 819685 125120 0 944805 e6aa5 ./bootstrapping/Distribution/Simple/Setup.o 816927 124520 0 941447 e5d87 ./libraries/Cabal/Cabal/dist- install/build/Distribution/Simple/Setup.o 785398 41536 0 826934 c9e36 ./compiler/stage2/build/CoreLint.o 772550 42040 0 814590 c6dfe ./compiler/stage2/build/DriverPipeline.o 766461 36280 0 802741 c3fb5 ./compiler/stage2/build/HscMain.o 735605 23408 0 759013 b94e5 ./libraries/containers/dist- install/build/Data/Sequence.o }}} '''PlatformConstants.o''' is a very clear example of this problem. Trimmed down example of this file is: {{{#!hs module M (D) where data D = D { a0 , a01 , a02 , a03 , a04 , a05 , a06 , a07 , a08 , a09 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , a30 , a31 , a32 , a33 , a34 , a35 , a36 , a37 , a38 , a39 , a40 , a41 , a42 , a43 , a44 , a45 , a46 , a47 , a48 , a49 , a50 , a51 , a52 , a53 , a54 , a55 , a56 , a57 , a58 , a59 , a60 , a61 , a62 , a63 , a64 , a65 , a66 , a67 , a68 , a69 , a70 , a71 , a72 , a73 , a74 , a75 , a76 , a77 , a78 , a79 , a80 , a81 , a82 , a83 , a84 , a85 , a86 , a87 , a88 , a89 , a90 , a91 , a92 , a93 , a94 , a95 , a96 , a97 , a98 , a99 :: Int } deriving Read }}} Results in 800KB file: {{{ $ ghc-stage2 -c -O1 -fforce-recomp M.hs -o M.o $ size M.o text data bss dec hex filename 847039 16080 0 863119 d2b8f M.o }}} The size growth is quadratic: {{{ # fields code-size 1 6263 21 61767 41 173623 61 347503 81 583367 101 881231 121 1241087 141 1662959 161 2146815 181 2692679 201 3300543 221 3970407 241 4702263 261 5496135 281 6351991 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * Attachment "mk_data.bash" added. script to build temp files and stats -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * Attachment "size-growth.log.png" added. graph for 1 to 281 field in a simple datatype. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by slyfox): Derived instance looks reasonably well (i've picked a data type with 282 fields): {{{#!hs {- $ ghc -O1 -fforce-recomp -ddump-deriv M.hs [1 of 1] Compiling M ( M.hs, M.o ) ==================== Derived instances ==================== Derived instances: -} instance GHC.Read.Read M.D where GHC.Read.readPrec = GHC.Read.parens (Text.ParserCombinators.ReadPrec.prec 11 (do { GHC.Read.expectP (Text.Read.Lex.Ident "D"); GHC.Read.expectP (Text.Read.Lex.Punc "{"); GHC.Read.expectP (Text.Read.Lex.Ident "a0"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a1_a1ns <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; GHC.Read.expectP (Text.Read.Lex.Punc ","); GHC.Read.expectP (Text.Read.Lex.Ident "a1"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a2_a1nt <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; GHC.Read.expectP (Text.Read.Lex.Punc ","); GHC.Read.expectP (Text.Read.Lex.Ident "a2"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a3_a1nu <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; -- ... GHC.Read.expectP (Text.Read.Lex.Punc ","); GHC.Read.expectP (Text.Read.Lex.Ident "a280"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a281_a1rY <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; GHC.Read.expectP (Text.Read.Lex.Punc ","); GHC.Read.expectP (Text.Read.Lex.Ident "a281"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a282_a1rZ <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; GHC.Read.expectP (Text.Read.Lex.Punc "}"); GHC.Base.return (M.D a1_a1ns a2_a1nt a3_a1nu a4_a1nv a5_a1nw -- ... a280_a1rX a281_a1rY a282_a1rZ) })) GHC.Read.readList = GHC.Read.readListDefault GHC.Read.readListPrec = GHC.Read.readListPrecDefault }}} Generated assembly is full of stack rearrangements which seems to be a main source of code inflation and compilation slowdown. Some thoughts: - Could GHC do a CSE for code sequences of '''expetP (Punc ); expetP (Ident ); expetP (Punc );'''? - If all fields were strict could GHC optimise stack layout better? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * cc: simonmar, simonpj, ezyang (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by slyfox): I've tried to manually move out repeated pieces {{{#!hs GHC.Read.expectP (Text.Read.Lex.Punc ","); GHC.Read.expectP (Text.Read.Lex.Ident "a2"); GHC.Read.expectP (Text.Read.Lex.Punc "="); a3_a1nu <- Text.ParserCombinators.ReadPrec.reset GHC.Read.readPrec; }}} to a separate binding parameterised by Ident and it seems to be enough to make code growth linear with added fields. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * cc: mpickering (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Slyfox, can you give a small example of what the code looks like before and after your change? I can see that turning 4 lines into 1 (by creating a new function definition) is good, but at best that makes it 4 times smaller, which is not the difference between linear and quadratic. I'd like to understand where the quadratic-ness is coming from! Thanks for investigating this. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by slyfox): Replying to [comment:5 simonpj]:
Slyfox, can you give a small example of what the code looks like before and after your change? I can see that turning 4 lines into 1 (by creating a new function definition) is good, but at best that makes it 4 times smaller, which is not the difference between linear and quadratic.
I've rechecked the measurement and you are correct! Quadratic behaviour stil presents in manually converted sources. Seems I was looking at the wrong binary files when tweaking things. Apologies for the confusion :( (for completness) attaching source file before and after transformation for 200-entries file: {{{ $ ghc -c -fforce-recomp -c -O1 M_200_auto.hs $ ghc -c -fforce-recomp -c -O1 M_200_manual.hs $ size M_200_auto.o M_200_manual.o text data bss dec hex filename 3268679 41520 0 3310199 328277 M_200_auto.o 674695 16424 0 691119 a8baf M_200_manual.o }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * Attachment "M_200_auto.hs" added. by ghc derived as-is -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * Attachment "M_200_manual.hs" added. factored out repetition -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): (By the way, slyfox's numbers are for 7.10. GHC 7.8.4 ran for 3 times as long and produced 5 times as much code as 7.10.1 on `M_200_manual`. I wonder why?) So there is some interesting code generation problem here, where a program of size O(n) involving nested lambdas/closures incurs a quadratic code size cost, due to repeatedly copying all the free variables of one closure into the next closure. I'm not sure whether this problem is solvable in general (or, perhaps, whether it is solvable without building rather fancy data structures into the code generator). (That's not to say it's hard necessarily, I just haven't thought about it at all.) In this case, however, I think we can avoid the quadratic blowup in code size by restructuring the input code. Basically, since the values bound by `<-` are never used except to build the final result, just use the ApplicativeDo desugaring to avoid writing any lambdas at all. I don't have a GHC with ApplicativeDo handy yet so I manually rewrote slyfox's {{{ a1 <- reset readPrec; a2 <- comma_field_eq_read "a1"; a3 <- comma_field_eq_read "a2"; ... a201 <- comma_field_eq_read "a200"; expectP (Punc "}"); return (D a1 ...) }}} to {{{ res <- D <$> reset readPrec <*&> comma_field_eq_read "a1" <*&> comma_field_eq_read "a2" ... <*&> comma_field_eq_read "a200"; expectP (Punc "}"); return res }}} Here `<*&>` is just a NOINLINE copy of `<*>`. That's important, otherwise `<*>` will be inlined, creating a lambda and defeating the purpose of the rewrite. Now the final code size is {{{ text data bss dec hex filename 101839 24008 0 125847 1eb97 M_200_applicative.o }}} and reading through the assembly there isn't anything obviously quadratic left (just a linear number of small functions, and one linearly-sized function to build the result--which should just be the constructor for D anyways, and probably shouldn't really count against the size of the Read instance). Presumably factoring out the "`<*&> comma_field_eq_read`" and `unpackCString#` pattern would reduce code size a bit more. Unfortunately the compile time is still quadratic, because type-checking `D <$> x1 <*> x2 <*> ... <*> xn` results in a Core program with a quadratic number of types (`D` has type `(Int ->)^n -> D`, and then `D <$> x1` has type `(Int ->)^(n-1) -> f D`, and `D <$> x1 <*> x2` has type `(Int ->)^(n-2) -> f D`, etc.) It's still decently faster than building slyfox's manual version (about 2/3 the compile time). Obviously this may affect runtime performance too, but my position is that `deriving Read` is not intended to be (and is not in practice either) a high-performance parser, so I don't really care if it happens to get a bit slower. As long as `read` doesn't become asymptotically slower, which seems unlikely considering the generated code was originally asymptotically larger in size, I think it's fine. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Reid, interesting! So you think it's all in the code generation and closure construction. Can you give a simple test case, separate from `deriving Read` that demonstrates the phenomenon, with the monadic and applicative versions side by side, but with very different perf? That would help to focus our attention. Thanks Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by rwbarton): * failure: None/Unknown => Compile-time performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by ezyang): * keywords: => deriving-perf -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * cc: RyanGlScott (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Here's a simple example of the quadratic code size I was talking about in comment:7, not specific to Read or Applicative or Monad. {{{#!hs f a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10 }}} At the Core and STG stages the size of `f` is linear in the number of arguments. But the Cmm for `f` builds a thunk for the first argument `a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9` and then calls `+`. In order to build that thunk it has to copy `a1` through `a9` into the thunk, as they are its free variables. Then the code for that thunk is going to build another thunk for `a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8`, which requires copying `a1` through `a8` again, and so on. The total code size, work done and allocations are all quadratic in the number of arguments. In this particular case it would clearly be better to do all the thunk construction up front in `f`, like (pseudo-STG) {{{#!hs f a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 = let t2 = a1 + a2 t3 = t2 + a3 t4 = t3 + a4 ... t9 = t8 + a9 in t9 + a10 }}} which would be linear code size, work and allocations in the number of arguments. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): However, the quadratic behavior exhibited by the Read instance is more similar to this example {{{#!hs f :: (Int -> a) -> a f k = k 0 {-# NOINLINE f #-} v :: (Int,Int,Int,Int,Int,Int,Int,Int,Int,Int) v = f (\a1 -> f (\a2 -> f (\a3 -> f (\a4 -> f (\a5 -> f (\a6 -> f (\a7 -> f (\a8 -> f (\a9 -> f (\a10 -> (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10))))))))))) }}} `v` is supposed to correspond to the desugaring of the long `do` blocks in slyfox's attachments. Here the quadratic blowup at the Cmm stage is caused by the fact that the free variables of the nth lambda are all the preceding `a1`, ..., `an`. We have to copy `a1` through `an` into the (n+1)st lambda that we provide as an argument to the next `f`. This is the blowup that I said I didn't know how to solve in the code generator in comment:7. However, (I claimed that) in this case we can avoid generating such nested lambdas in the first place by changing the desugaring to use Applicative methods rather than Monad ones. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): (In reply to comment:12) Is this a specific `+`, or just the overloaded method from `Num`? I guess the latter, as a strict `+` would yield quite different code. So you are proposing to add a transformation to STG (or Core) that replaces {{{ let t1 = let t2 = e1 in e2 in e3 }}} with {{{ let t2 = e1 t1 = e2 in e3 }}} Can we give an easy criterion when this transformation is beneficial? The problem is that this will increase allocation in the case when `t1` is not called, as it is more space efficient to pack the union of the free variables of `e1` and `e2` into one closure, instead of having one for the free variables of `e1` and one for those of `e2`. We could say “Do this transformation if the free variables of e2 and e1” are disjoint. Then we’d allocate two extra words (one for the info table of `t2`, and one pointer to that in the closure for `t1`), but have some nice gains if `t1` is indeed called. But this rule would not fire in this case, because the `Num a` dictionary is also a free variable, which would now have to be copied into both closures! I guess one could try and see whether real-world programs benefit from this transformation, and how many shared free variables between `e1` and `e2` are ok. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Replying to [comment:14 nomeata]:
(In reply to comment:12) Is this a specific `+`, or just the overloaded method from `Num`? I guess the latter, as a strict `+` would yield quite different code.
Right, I meant the latter. It didn't occur to me that this behavior would only occur when `(+)` is not known by GHC to be strict; I thought it would be enough for GHC to have to perform the call to `(+)`, i.e., not inline it.
So you are proposing to add a transformation to STG (or Core) that replaces {{{ let t1 = let t2 = e1 in e2 in e3 }}} with {{{ let t2 = e1 t1 = e2 in e3 }}}
Can we give an easy criterion when this transformation is beneficial?
The problem is that this will increase allocation in the case when `t1` is not called, as it is more space efficient to pack the union of the free variables of `e1` and `e2` into one closure, instead of having one for the free variables of `e1` and one for those of `e2`.
We could say “Do this transformation if the free variables of e2 and e1” are disjoint. Then we’d allocate two extra words (one for the info table of `t2`, and one pointer to that in the closure for `t1`), but have some nice gains if `t1` is indeed called.
But this rule would not fire in this case, because the `Num a` dictionary is also a free variable, which would now have to be copied into both closures!
I guess one could try and see whether real-world programs benefit from
And you're right that it's only a gain when `(+)` is actually going to evaluate `t1`, and a loss when it is not going to. In this case the gain is asymptotic and the loss is a constant factor, but if there were a lot of subexpressions and few free variables then the reverse could be true. this transformation, and how many shared free variables between `e1` and `e2` are ok. Perhaps this discussion should move to a new ticket, since I remembered it is actually not the issue with the derived Read instance after all? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Perhaps this discussion should move to a new ticket, since I remembered it is actually not the issue with the derived Read instance after all?
Sure, #13213 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nelhage): * cc: nelhage (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10980: Deriving Read instance from datatype with N fields leads to N^2 code size growth -------------------------------------+------------------------------------- Reporter: slyfox | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.2 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #13213 #14364 | Differential Rev(s): #7258 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: => #13213 #14364 #7258 Comment: See also: #14364, which tries to knock a factor off of the code size of derived `Read` instances. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC