
#7258: Compiling DynFlags is jolly slow -------------------------------------+------------------------------------- Reporter: simonpj | Owner: simonpj Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.1 Resolution: | Keywords: deriving-perf Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Hmm. The `Cmm` for `g10` is actually a lot bigger than that for `f10`, and looked to me as if it too would scale quadratically. Are `f10` and `g10` accurate reflections of the "scale well" and "scale badly" versions of Read? I can see why things might scale badly, if we have a lot of {{{ p1 >>= \x -> p2 >>= \y -> p3 >>= \z -> return (K x y z) }}} And `f10` has that structure, albeit with data constructors rather than functions. BUT: if that's it, why doesn't `(>>=)` inline, so we get a case expression instead?? So if `f10` is reasonably accurate, I can see two ways forward: 1 Do something in the back end to generate less code. 2 Generate different code. I'm still intrigued (and a bit disbelieving) why a different exposition of Read generates code that scales better. Concerning (1) the kind of thing I had in mind was more like this {{{ f10 = A (\i0 -> A (\i1 -> A (\i2 -> A (\i3 -> A (\i4 -> let t = (i0,i1,i2,i3,i4) in A (\i5 -> A (\i6 -> A (\i7 -> A (\i8 -> A (\i9 -> case t of (i0,i1,i2,i3,i4) -> N (i0, i1, i2, i3, i4, i5, i6, i7, i8, i9) }}} Now no function closure gets more than 5 free variables. I imagine one could do this as a STG-to-STG transformation if we decided that this was really the core issue. A tantalising point is this. We have something like this: {{{ let f4 = [i0,i1,i2,i3] \i4 -> let f5 = [i0,i1,i2,i3, i4] \i5 -> ...blah... }}} The list before the `\` are the free vars of the closure. Inside the code for `f4` we have `f4`'s function closure in hand, with `i0...` in it. Rather than capturing `i0, i1...` as the free vars of f5, we could just store a pointer to `f4`'s function closure (it's a kind of tuple, after all) and fetch its free vars from there. It's as if we wanted to say {{{ let f4 = [i0,i1,i2,i3] \i4 -> let f5 = [f4, i4] \i5 -> ...blah... }}} with `f4` and `i4` being the free vars of `f5`'s closure. This would be a deeper change to the code generator, but it could make a lot of sense in cases where ''all'' of `f4`'s free variables are also free in `f5`. (If not all where, you might get a space leak by closing over `f4`.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/7258#comment:88 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler