
#9675: Unreasonable memory usage on large data structures -------------------------------------+------------------------------------- Reporter: Polarina | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 (amd64) Type of failure: Compile- | Difficulty: Unknown time performance bug | Blocked By: Test Case: | Related Tickets: Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree that it is tantalising that so much `Core` is necessary to generate so little object code. Nevertheless, I am ''deeply'' reluctant to fiddle with the representation of `Core`. Doing so will impose overheads on every pass that deals with Core, and I'm far from convinced that it'll solve the problem. Anything along these lines would risk breaking optimisations. For example, here is how GHC optimises repeated evaluation of the same data constructor. Consider {{{ case x of K a _ _ _ -> ... (case x of K _ _ c _ -> ...a..c..) ... }}} Clearly we'd like to optimise this to {{{ case x of K a _ c _ -> ... ( ...a..c.. ) ... }}} And GHC does so by (effecitvely) transforming to {{{ case x of K a bb cc dd -> let x = K a bb cc dd in ... (case x of K _ _ c _ -> ...a..c..) ... }}} The `let` shadows the outer binding of `x`. Now the inner `case` can see that `x` is bound to `(K a bb cc dd)`, and so it's easy to do case-of- known-constructor to the inner case, giving {{{ case x of K a bb cc dd -> let x = K a bb cc dd in ... (let c = cc in ...a..c..) ... }}} (The `let` doesn't really exist, of course, it's only in the Simplifier's head.) All this is threatened if the outer case binds only `a`. I think a decent alternative approach would be not to generate code for selectors (at least when there are many fields) but rather to inline them at usage sites, via a so-called "compulsory unfolding". That too risks lots of code if you use a lot of record selectors, so it's not exactly a solid fix. The other thing is that I think it's quite likely that GHC has accumulated a space leak. The compiler makes repeated passes over the `Core` program. To avoid a leak, all vestiges of the input of each pass should be gone by the time we have computed the output of that pass. But I would not be amazed if that was not true at all -- that vesiges of pass 1 were still in memory when we were on pass 10. I would '''love''' someone to really look at this for us. It's not easy, but it could give a huge payoff for every GHC compilation, not just these weird corner cases. I certainly think we should be sure this is ''not'' happening before investing effort in optimising particular cases Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9675#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler