
#9675: Unreasonable memory usage on large data structures -------------------------------------+------------------------------------- Reporter: Polarina | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 (amd64) Type of failure: Compile- | Difficulty: Unknown time performance bug | Blocked By: Test Case: | Related Tickets: Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): Did some GHC profiling, and the problem is clearly the huge number of useless variables in the pattern match in the Core for the record selector functions, here for a three-field-record (I’ll spare us the 3000-field- record): {{{ Test.field3 :: Test.Foo -> GHC.Types.Int -> GHC.Types.Int [GblId[[RecSel]], Arity=1, Caf=NoCafRefs, Str=DmdType] Test.field3 = \ (ds_dm2 :: Test.Foo) -> case ds_dm2 of _ [Occ=Dead] { Test.Foo ds1_dm3 ds2_dm4 ds3_dm5 -> ds3_dm5 } }}} See the attached heap profile, `IdInfo`, `Var`, `IntMap` and `[]` are the biggest chunks. The problem is that a data type declaration like {{{ data R = R { field1 :: T1, field2 :: T2 } }}} generates Haskell code of the form {{{ field3 (R {field3 = x}) = x }}} which is then turned in to the form above by general code that also handles multiple fields, field puns, and further patterns in the field. While the submitter’s code above may or may not be reasonable it migth be worth improving the situation for the benefot of modules with medium-sized records (including GHC’s own DynFlags module). I see a two approaches: 1. Avoid generating actual Core for these selector. Make them implicitly availailbe, and only generated them in STG or later. I am not sure if there is precedence for such implicit things. 2. Extend the this code somehow: {{{ AltCon = DataAlt DataCon | LitAlt Literal | DEFAULT }}} I have two ideas here. Either special-case the single-field-match, i.e. a new constructor {{{ | FieldSelAlt DataCon Int }}} which binds only one variable, the field indicated by the `Int` variable. This would get rid of the quadraticness of the representation. But quite a bit of code would have to handle two cases, and there would be no unique representation of the same match. The other one is to extend `DataAlt` by a bit field: {{{ = DataAlt DataCon [Bool] }}} (with a more suitable data structure for `[Bool]`, e.g. some kind of `BitField`). The idea is that this field specifies what fields of the constructor are actually used by the match. The number of variables bound by such an alternative would equal to the number of `True`s in the list. This would improve every alternative where not all variables are used. It might make some code more complicated – I could imagine that some compiler phases expect that within a case alternative, there is a locally bound name for every field – but maybe it’s not too bad. This would still be quadratic (but with a much lower factor – one bit instead of whatever `Id` + `IdInfo` + `(:)` + whatnot use). We can even get rid of that by making the `BitField` a data type like {{{ data BitField = Empty | Singleton Int | ManyBits ByteString | All }}} (`Empty` would be useful for `Con {}` patterns. Not sure if `All` is useful. Maybe `data BitField = Empty | Singleton Int | All` is actually enough, if few-but-more-than-one-field-patterns are rare.) At this point I’m hoping for someone experienced (e.g. SPJ, of course) to tell us which of these ideas are totally whacky, and which are worth pursuing, and if so, what problems to expect. But this is not a promise that I’ll implement it :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9675#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler