
#10980: Deriving Read instance from datatype with N fields leads to N^2 code
size growth
-------------------------------------+-------------------------------------
        Reporter:  slyfox            |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by rwbarton):

(By the way, slyfox's numbers are for 7.10. GHC 7.8.4 ran for 3 times as
long and produced 5 times as much code as 7.10.1 on `M_200_manual`. I
wonder why?)

So there is an interesting code generation problem here, where a program of
size O(n) involving nested lambdas/closures incurs a quadratic code size
cost, due to repeatedly copying all the free variables of one closure into
the next closure. I'm not sure whether this problem is solvable in general
(or, perhaps, whether it is solvable without building rather fancy data
structures into the code generator). (That's not to say it's necessarily
hard; I just haven't thought about it at all.)

In this case, however, I think we can avoid the quadratic blowup in code
size by restructuring the input code. Basically, since the values bound by
`<-` are never used except to build the final result, we can just use the
ApplicativeDo desugaring to avoid writing any lambdas at all. I don't have
a GHC with ApplicativeDo handy yet, so I manually rewrote slyfox's

{{{
a1 <- reset readPrec;
a2 <- comma_field_eq_read "a1";
a3 <- comma_field_eq_read "a2";
...
a201 <- comma_field_eq_read "a200";
expectP (Punc "}");
return (D a1 ...)
}}}

to

{{{
res <- D <$> reset readPrec
         <*&> comma_field_eq_read "a1"
         <*&> comma_field_eq_read "a2"
         ...
         <*&> comma_field_eq_read "a200";
expectP (Punc "}");
return res
}}}

Here `<*&>` is just a NOINLINE copy of `<*>`. That's important: otherwise
`<*>` will be inlined, creating a lambda and defeating the purpose of the
rewrite.

Now the final code size is

{{{
   text    data     bss     dec     hex filename
 101839   24008       0  125847   1eb97 M_200_applicative.o
}}}

and reading through the assembly there isn't anything obviously quadratic
left: just a linear number of small functions, plus one linearly-sized
function to build the result (which should really just be the constructor
for D anyway, and probably shouldn't count against the size of the Read
instance). Presumably factoring out the "`<*&> comma_field_eq_read`" and
`unpackCString#` pattern would reduce code size a bit more.

Unfortunately the compile time is still quadratic, because type-checking
`D <$> x1 <*> x2 <*> ... <*> xn` results in a Core program with a quadratic
number of types (`D` has type `(Int ->)^n -> D`, then `D <$> x1` has type
`(Int ->)^(n-1) -> f D`, `D <$> x1 <*> x2` has type `(Int ->)^(n-2) -> f D`,
and so on). It's still decently faster than compiling slyfox's manual
version (about 2/3 the compile time).

Obviously this may affect runtime performance too, but my position is that
`deriving Read` is not intended to be (and in practice is not) a
high-performance parser, so I don't really care if it happens to get a bit
slower. As long as `read` doesn't become asymptotically slower, which seems
unlikely considering the generated code was originally asymptotically larger
in size, I think it's fine.
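For concreteness, a minimal sketch of what the `<*&>` helper above amounts
to. Only "a NOINLINE copy of `<*>`" is stated in the comment; the module
framing and the explicit fixity declaration are assumptions added here:

{{{
module ReadHelpers where

-- Same fixity as <*> and <$> (infixl 4), so a chain like
--   D <$> x1 <*&> x2 <*&> x3
-- parses left-associatively, just like the rewrite above.
infixl 4 <*&>

-- A copy of <*> that GHC is not allowed to inline.  Each use site
-- stays a plain call to this function instead of an inlined lambda,
-- which is what avoids the quadratic closure-copying described above.
{-# NOINLINE (<*&>) #-}
(<*&>) :: Applicative f => f (a -> b) -> f a -> f b
(<*&>) = (<*>)
}}}

The fixity declaration is only needed if the operator is defined in a
separate module; without it GHC would default `<*&>` to infixl 9, which
still parses the chain but with a different (tighter) grouping than `<$>`.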
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10980#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler