
#10980: Deriving Read instance from datatype with N fields leads to N^2 code
size growth
-------------------------------------+-------------------------------------
        Reporter:  slyfox            |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by rwbarton):

(By the way, slyfox's numbers are for 7.10. GHC 7.8.4 ran for 3 times as
long and produced 5 times as much code as 7.10.1 on `M_200_manual`. I
wonder why?)

So there is an interesting code generation problem here, where a program of
size O(n) involving nested lambdas/closures incurs a quadratic code size
cost, due to repeatedly copying all the free variables of one closure into
the next closure. I'm not sure whether this problem is solvable in general
(or, perhaps, whether it is solvable without building rather fancy data
structures into the code generator). (That's not to say it's necessarily
hard; I just haven't thought about it at all.)

In this case, however, I think we can avoid the quadratic blowup in code
size by restructuring the input code. Basically, since the values bound by
`<-` are never used except to build the final result, we can just use the
ApplicativeDo desugaring to avoid writing any lambdas at all. I don't have
a GHC with ApplicativeDo handy yet, so I manually rewrote slyfox's

{{{
a1 <- reset readPrec;
a2 <- comma_field_eq_read "a1";
a3 <- comma_field_eq_read "a2";
...
a201 <- comma_field_eq_read "a200";
expectP (Punc "}");
return (D a1 ...)
}}}

to

{{{
res <- D <$> reset readPrec
         <*&> comma_field_eq_read "a1"
         <*&> comma_field_eq_read "a2"
         ...
         <*&> comma_field_eq_read "a200";
expectP (Punc "}");
return res
}}}

Here `<*&>` is just a NOINLINE copy of `<*>`. That's important: otherwise
`<*>` will be inlined, creating a lambda and defeating the purpose of the
rewrite.

Now the final code size is

{{{
   text    data     bss     dec     hex filename
 101839   24008       0  125847   1eb97 M_200_applicative.o
}}}

and reading through the assembly there isn't anything obviously quadratic
left: just a linear number of small functions, plus one linearly-sized
function to build the result (which should really just be the constructor
for D anyway, and probably shouldn't count against the size of the Read
instance). Presumably factoring out the "`<*&> comma_field_eq_read`" and
`unpackCString#` pattern would reduce code size a bit more.

Unfortunately the compile time is still quadratic, because type-checking
`D <$> x1 <*> x2 <*> ... <*> xn` results in a Core program with a quadratic
number of types (`D` has type `(Int ->)^n -> D`, then `D <$> x1` has type
`(Int ->)^(n-1) -> f D`, `D <$> x1 <*> x2` has type `(Int ->)^(n-2) -> f D`,
and so on). It's still decently faster than compiling slyfox's manual
version (about 2/3 the compile time).

Obviously this may affect runtime performance too, but my position is that
`deriving Read` is not intended to be (and in practice is not) a
high-performance parser, so I don't really care if it happens to get a bit
slower. As long as `read` doesn't become asymptotically slower, which seems
unlikely considering the generated code was originally asymptotically larger
in size, I think it's fine.
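For concreteness, a minimal sketch of what the `<*&>` helper above amounts
to. Only "a NOINLINE copy of `<*>`" is stated in the comment; the module
framing and the explicit fixity declaration are assumptions added here:

{{{
module ReadHelpers where

-- Same fixity as <*> and <$> (infixl 4), so a chain like
--   D <$> x1 <*&> x2 <*&> x3
-- parses left-associatively, just like the rewrite above.
infixl 4 <*&>

-- A copy of <*> that GHC is not allowed to inline.  Each use site
-- stays a plain call to this function instead of an inlined lambda,
-- which is what avoids the quadratic closure-copying described above.
{-# NOINLINE (<*&>) #-}
(<*&>) :: Applicative f => f (a -> b) -> f a -> f b
(<*&>) = (<*>)
}}}

The fixity declaration is only needed if the operator is defined in a
separate module; without it GHC would default `<*&>` to infixl 9, which
still parses the chain but with a different (tighter) grouping than `<$>`.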
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler