[GHC] #10359: Tuple constraint synonym led to asymptotic performance lossage

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Operating System: Linux Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- I encountered a situation where using a tuple constraint synonym produced an asymptotic runtime performance degradation, from linear to quadratic time. The enclosed Bug.hs reproduces the potentially relevant features of the problem. The function `test` there is quadratic, but should be linear. The file also contains a definition of `test2`, which differs only in eschewing the constraint synonym, and is linear, as it should be. In more detail: the program tries to count to 20,000 in a rather contrived way. It maintains a Box, which holds the current count and a function that can accept an increment and a count to produce a new count (namely, `(+)`). The function `do_step` lifts incrementation to Boxes, using their contained function to update their stored count. The first tricky thing is that the incrementing function stored in the Box is polymorphic in both arguments separately: it can accept increments that are not the same type as the stored count (and converts them). The second tricky thing is that I don't want to expose the type variable for the increment type (as opposed to the count type) to clients of the Box, so the Box's incrementing function is stored polymorphic (with an explicit forall). Finally, I want to impose the constraint `(Fractional num, Real num)` on allowable increments (even though this example only needs `Real num`). This constellation of desiderata conspires to produce quadratic performance: doubling the parameter to `test` (but not `test2`) multiplies its runtime by 4. Inspecting the heap profile indicates a growing accumulation of closures when running `test` (but not `test2`). Inspecting the generated stg (enclosed) locates a potential culprit: apparently when `do_step` (but not `do_step2`) reconstructs the Box, it changes the `func` stored there to be a fresh closure that destructures the `Numerical` constraint tuple, constructs a fresh one, and calls the function that was in that slot previously with it. I hypothesize that this accumulates as a chain that performs a linear number of such destructurings and restructurings at each increment, leading to the observed quadratic runtime and linear memory growth. Incidentally, I checked whether record wildcards were the issue (no: `test4`) and whether updating just the `obj` field solves it (yes: `test3`). Sadly, the latter solution is not directly usable in my actual application because of "Record update for insufficiently polymorphic field". -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Description changed by axch: Old description:
I encountered a situation where using a tuple constraint synonym produced an asymptotic runtime performance degradation, from linear to quadratic time. The enclosed Bug.hs reproduces the potentially relevant features of the problem. The function `test` there is quadratic, but should be linear. The file also contains a definition of `test2`, which differs only in eschewing the constraint synonym, and is linear, as it should be.
In more detail: the program tries to count to 20,000 in a rather contrived way. It maintains a Box, which holds the current count and a function that can accept an increment and a count to produce a new count (namely, `(+)`). The function `do_step` lifts incrementation to Boxes, using their contained function to update their stored count.
The first tricky thing is that the incrementing function stored in the Box is polymorphic in both arguments separately: it can accept increments that are not the same type as the stored count (and converts them). The second tricky thing is that I don't want to expose the type variable for the increment type (as opposed to the count type) to clients of the Box, so the Box's incrementing function is stored polymorphic (with an explicit forall). Finally, I want to impose the constraint `(Fractional num, Real num)` on allowable increments (even though this example only needs `Real num`).
This constellation of desiderata conspires to produce quadratic performance: doubling the parameter to `test` (but not `test2`) multiplies its runtime by 4. Inspecting the heap profile indicates a growing accumulation of closures when running `test` (but not `test2`). Inspecting the generated stg (enclosed) locates a potential culprit: apparently when `do_step` (but not `do_step2`) reconstructs the Box, it changes the `func` stored there to be a fresh closure that destructures the `Numerical` constraint tuple, constructs a fresh one, and calls the function that was in that slot previously with it. I hypothesize that this accumulates as a chain that performs a linear number of such destructurings and restructurings at each increment, leading to the observed quadratic runtime and linear memory growth.
Incidentally, I checked whether record wildcards were the issue (no: `test4`) and whether updating just the `obj` field solves it (yes: `test3`). Sadly, the latter solution is not directly usable in my actual application because of "Record update for insufficiently polymorphic field".
New description: I encountered a situation where using a tuple constraint synonym produced an asymptotic runtime performance degradation, from linear to quadratic time. The enclosed Bug.hs reproduces the potentially relevant features of the problem. The function `test` there is quadratic, but should be linear. The file also contains a definition of `test2`, which differs only in eschewing the constraint synonym, and is linear, as it should be. In more detail: the program tries to count to 20,000 in a rather contrived way. It maintains a Box, which holds the current count and a function that can accept an increment and a count to produce a new count (namely, `(+)`). The function `do_step` lifts incrementation to Boxes, using their contained function to update their stored count. The first tricky thing is that the incrementing function stored in the Box is polymorphic in both arguments separately: it can accept increments that are not the same type as the stored count (and converts them). The second tricky thing is that I don't want to expose the type variable for the increment type (as opposed to the count type) to clients of the Box, so the Box's incrementing function is stored polymorphic (with an explicit forall). Finally, I want to impose the constraint `(Fractional num, Real num)` on allowable increments (even though this example only needs `Real num`). This constellation of desiderata conspires to produce quadratic performance: doubling the parameter to `test` (but not `test2`) multiplies its runtime by 4. Inspecting the heap profile indicates a growing accumulation of closures when running `test` (but not `test2`). Inspecting the generated stg (enclosed) locates a potential culprit: apparently when `do_step` (but not `do_step2`) reconstructs the Box, it changes the `func` stored there to be a fresh closure that destructures the `Numerical` constraint tuple, constructs a fresh one, and calls the function that was in that slot previously with it. I hypothesize that this accumulates as a chain that performs a linear number of such destructurings and restructurings at each increment, leading to the observed quadratic runtime and linear memory growth. Incidentally, I checked whether record wildcards were the issue (no: `test4`) and whether updating just the `obj` field solves it (yes: `test3`). Sadly, the latter solution is not directly usable in my actual application because of "Record update for insufficiently polymorphic field". For reference: I compiled with `ghc -O2 Bug.hs -fprof-auto -rtsopts -ddump-to-file -ddump-simpl -ddump-st` -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): First of all, very nice example. Thank you for making it so small and easy to work with. I can see what's happening. The key part is what happens here: {{{ do_step4 :: (Numerical num) => num -> Box a -> Box a do_step4 number Box{ func = f, obj = x} = Box{ func = f, obj = f number x } }}} After elaboration (ie making dictionaries explicit) we get this: {{{ do_step4 dn1 number (Box {func = f, obj = x }) = Box { func = \dn2 -> f ( case dn2 of (f,r) -> f , case dn2 of (f,r) -> r) , obj = f dn1 number x } }}} That's odd! We expected this: {{{ do_step4 dn1 number (Box {func = f, obj = x }) = Box { func = f , obj = f dn1 number x } }}} And indeed, the allocation of all those `\dn2` closures is what is causing the problem. So we are missing this optimisation: {{{ (case dn2 of (f,r) -> f, case dn2 of (f,r) -> r) ===> dn2 }}} If we did this, then the lambda would look like `\dn2 -> f dn2` which could eta-reduce to `f`. But there are at least three problems: * The tuple transformation above is hard to spot * The tuple transformation is not quite semantically right; if `dn2` was bottom, the LHS and RHS are different * The eta-reduction isn't quite semantically right: if `f` ws bottom, the LHS and RHS are different. You might argue that the latter two can be ignored because dictionary arguments are special; indeed we often toy with making them strict. But perhaps a better way to avoid the tuple-transformation issue would be not to construct that strange expression in the first place. Where is it coming from? It comes from the call to `f` (admittedly applied to no arguments) in `Box { ..., func = f }`. GHC needs a dictionary for `(Numerical dum)` (I changed the name of the type variable in `func`'s type in the definition of `Box`). Since it's just a pair GHC says "fine, I'll build a pair, out of `Fractional dum` and `Real dum`. How does it get those dictionaries? By selecting the components of the `Franctional dum` passed to `f`. If GHC said instead "I need `Numerical dum` and behold I have one in hand, it'd be much better. It doesn't because tuple constraints are treated specially. But if we adopted the idea in #10362, we would (automatically) get to re-use the `Numerical dum` constraint. That would leave us with eta reduction, which is easier. As to what will get you rolling, a good solution is `test3`, which saves instantiating and re-generalising `f`. The key thing is to update all the fields ''except'' the polymorphic `func` field. I'm surprised you say that it doesn't work. Can you give a (presumably more complicated) example to demonstrate? Maybe there's a separate bug! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by axch): In my actual use case, `test3` doesn't compile. The difference is that `Box` is also existentially quantified over the type of `obj`, because I want a heterogeneous collection of `Box`es. Translated to this context, the compilation error is {{{ Bug2.hs:51:29: Record update for insufficiently polymorphic field: obj :: a In the expression: b {obj = func number obj} In an equation for `do_step3': do_step3 number b@(Box {..}) = b {obj = func number obj} }}} Offending source file enclosed. I reproduce `test` and `test4`, just to confirm that they still execute but with bad asymptotics; I elide `test2`, which is what I ended up doing to get rolling. Is this compilation error also a bug? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Ah I see. Well that rules out using record update. What happens if you simulate tuple predicates, thus: {{{ class (Fractional a, Real a) => Numerical a instance (Fractional a, Real a) => Numerical a }}} I think that'll work. You need `FlexibleInstances`, `UndecidableInstances`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by axch): Haven't tried. Removing the abstraction in the one problematic place was a sufficient workaround. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage
-------------------------------------+-------------------------------------
Reporter: axch | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.3
Resolution: | Keywords:
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#10359: Tuple constraint synonym led to asymptotic performance lossage
-------------------------------------+-------------------------------------
Reporter: axch | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.3
Resolution: | Keywords:
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#10359: Tuple constraint synonym led to asymptotic performance lossage -------------------------------------+------------------------------------- Reporter: axch | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: fixed | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | perf/should_run/T10359 Related Tickets: | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by simonpj): * status: new => closed * testcase: => perf/should_run/T10359 * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10359#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10359: Tuple constraint synonym led to asymptotic performance lossage
-------------------------------------+-------------------------------------
Reporter: axch | Owner:
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.3
Resolution: fixed | Keywords:
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | perf/should_run/T10359
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Austin Seipp

#10359: Tuple constraint synonym led to asymptotic performance lossage
-------------------------------------+-------------------------------------
Reporter: axch | Owner:
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.3
Resolution: fixed | Keywords:
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | perf/should_run/T10359
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#10359: Tuple constraint synonym led to asymptotic performance lossage
-------------------------------------+-------------------------------------
Reporter: axch | Owner:
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.3
Resolution: fixed | Keywords:
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | perf/should_run/T10359
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones
participants (1)
-
GHC