[GHC] #14610: newtype wrapping of a monadic stack kills performance

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- I have a project where I in one module (A) I decided to build something like a minimal framework for the other bigger module (B) to use. One of the main points of the framework is that the monadic stacks are hidden behind `newtype`s and in those monads you can only use the functions that the module A provides. The module A can be found here: https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser/Internal.hs There are two monadic stack wrapped with newtypes: `BParser` and `IParser`. The module B is this: https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser.hs But it's not really relevant. Now if I change `newtype`s to `type` synonyms like so: {{{ type BParser a = ParsecT MMarkErr Text (State BlockState) a type IParser a = StateT InlineState (Parsec MMarkErr Text) a }}} and do corresponding minor corrections, I get 2x less allocations and almost 2x faster code (before): {{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 119,080 0 11,088 with file: data/bench-thematic-break.md 74,368 0 10,224 with file: data/bench-heading.md 901,928 0 9,432 with file: data/bench-fenced-code-block.md 145,744 0 9,368 with file: data/bench-indented-code-block.md 124,312 0 9,368 with file: data/bench-unordered-list.md 2,010,496 1 10,784 with file: data/bench-ordered-list.md 2,025,016 1 10,728 with file: data/bench-blockquote.md 1,961,672 1 42,648 with file: data/bench-paragraph.md 2,084,104 1 42,648 with file: data/comprehensive.md 25,899,496 24 44,200 benchmarking with file: data/comprehensive.md time 3.590 ms (3.578 ms .. 3.601 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.555 ms (3.546 ms .. 3.565 ms) std dev 31.07 μs (24.63 μs .. 39.90 μs) }}} After: {{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 116,864 0 11,088 with file: data/bench-thematic-break.md 64,776 0 10,392 with file: data/bench-heading.md 615,672 0 9,432 with file: data/bench-fenced-code-block.md 144,736 0 9,368 with file: data/bench-indented-code-block.md 123,352 0 9,368 with file: data/bench-unordered-list.md 795,072 0 41,568 with file: data/bench-ordered-list.md 808,808 0 41,512 with file: data/bench-blockquote.md 826,392 0 41,568 with file: data/bench-paragraph.md 881,432 0 41,568 with file: data/comprehensive.md 10,945,104 10 44,440 benchmarking with file: data/comprehensive.md time 2.451 ms (2.448 ms .. 2.456 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.432 ms (2.427 ms .. 2.437 ms) std dev 15.86 μs (13.16 μs .. 19.01 μs) }}} I'm not great at reading non-trivial core, but I gave it a shot and dumped some core. One thing I noticed that core for the module A is the same in both cases. Then the problem is an inter-module problem, probably like when you don't dump definitions into interface files with `INLINEABLE` and end up without specialization, but here it perhaps has to do with the fact that B has no information about internals of the monadic stacks from A, but I'm not sure. Here is the beginnings of core dumps (before): {{{ ==================== Tidy Core ==================== 2017-12-23 04:55:17.967944505 UTC Result size of Tidy Core = {terms: 210,169, types: 209,675, coercions: 82,492, joins: 973/3,753} }}} After: {{{ ==================== Tidy Core ==================== 2017-12-23 04:58:46.386265108 UTC Result size of Tidy Core = {terms: 301,256, types: 263,104, coercions: 28,560, joins: 1,726/5,393} }}} So it looks like `newtype` wrapping reduces GHC's ability to create join points SPJ talked about recently. I do understand that you expect a minimal reproducing example but I could not produce it even though I spent several hours in futile attempts. I started by creating a small project, defining a similar stack with a `newtype` wrapper and using it in a simple parser in another module. Then I benchmarked the parser. There is no difference between `newtype`d and code and the code that uses just type synonyms. I tried different variations, no difference. The core output is too big for me to analyze, it's like 25 Mb and 33 Mb, and I have no idea what's going on there. You're welcome to experiment with the repo, there are benchmarks for memory usage and criterion benchmarks for speed of execution: https://github.com/mrkkrp/mmark If you like I can create a separate branch with newtypes replaced with type synonyms, so you can compare the both approaches easier. I'm submitting this because my friend convinced me that it's better to let you know (even though I could not create a minimal reproducing example on my own) than to let it go completely unnoticed. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by mrkkrp: Old description:
I have a project where I in one module (A) I decided to build something like a minimal framework for the other bigger module (B) to use. One of the main points of the framework is that the monadic stacks are hidden behind `newtype`s and in those monads you can only use the functions that the module A provides.
The module A can be found here:
https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser/Internal.hs
There are two monadic stack wrapped with newtypes: `BParser` and `IParser`.
The module B is this:
https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser.hs
But it's not really relevant.
Now if I change `newtype`s to `type` synonyms like so:
{{{ type BParser a = ParsecT MMarkErr Text (State BlockState) a type IParser a = StateT InlineState (Parsec MMarkErr Text) a }}}
and do corresponding minor corrections, I get 2x less allocations and almost 2x faster code (before):
{{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 119,080 0 11,088 with file: data/bench-thematic-break.md 74,368 0 10,224 with file: data/bench-heading.md 901,928 0 9,432 with file: data/bench-fenced-code-block.md 145,744 0 9,368 with file: data/bench-indented-code-block.md 124,312 0 9,368 with file: data/bench-unordered-list.md 2,010,496 1 10,784 with file: data/bench-ordered-list.md 2,025,016 1 10,728 with file: data/bench-blockquote.md 1,961,672 1 42,648 with file: data/bench-paragraph.md 2,084,104 1 42,648 with file: data/comprehensive.md 25,899,496 24 44,200
benchmarking with file: data/comprehensive.md time 3.590 ms (3.578 ms .. 3.601 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.555 ms (3.546 ms .. 3.565 ms) std dev 31.07 μs (24.63 μs .. 39.90 μs) }}}
After:
{{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 116,864 0 11,088 with file: data/bench-thematic-break.md 64,776 0 10,392 with file: data/bench-heading.md 615,672 0 9,432 with file: data/bench-fenced-code-block.md 144,736 0 9,368 with file: data/bench-indented-code-block.md 123,352 0 9,368 with file: data/bench-unordered-list.md 795,072 0 41,568 with file: data/bench-ordered-list.md 808,808 0 41,512 with file: data/bench-blockquote.md 826,392 0 41,568 with file: data/bench-paragraph.md 881,432 0 41,568 with file: data/comprehensive.md 10,945,104 10 44,440
benchmarking with file: data/comprehensive.md time 2.451 ms (2.448 ms .. 2.456 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.432 ms (2.427 ms .. 2.437 ms) std dev 15.86 μs (13.16 μs .. 19.01 μs) }}}
I'm not great at reading non-trivial core, but I gave it a shot and dumped some core. One thing I noticed that core for the module A is the same in both cases. Then the problem is an inter-module problem, probably like when you don't dump definitions into interface files with `INLINEABLE` and end up without specialization, but here it perhaps has to do with the fact that B has no information about internals of the monadic stacks from A, but I'm not sure.
Here is the beginnings of core dumps (before):
{{{ ==================== Tidy Core ==================== 2017-12-23 04:55:17.967944505 UTC
Result size of Tidy Core = {terms: 210,169, types: 209,675, coercions: 82,492, joins: 973/3,753} }}}
After:
{{{ ==================== Tidy Core ==================== 2017-12-23 04:58:46.386265108 UTC
Result size of Tidy Core = {terms: 301,256, types: 263,104, coercions: 28,560, joins: 1,726/5,393} }}}
So it looks like `newtype` wrapping reduces GHC's ability to create join points SPJ talked about recently.
I do understand that you expect a minimal reproducing example but I could not produce it even though I spent several hours in futile attempts. I started by creating a small project, defining a similar stack with a `newtype` wrapper and using it in a simple parser in another module. Then I benchmarked the parser. There is no difference between `newtype`d and code and the code that uses just type synonyms. I tried different variations, no difference.
The core output is too big for me to analyze, it's like 25 Mb and 33 Mb, and I have no idea what's going on there.
You're welcome to experiment with the repo, there are benchmarks for memory usage and criterion benchmarks for speed of execution:
https://github.com/mrkkrp/mmark
If you like I can create a separate branch with newtypes replaced with type synonyms, so you can compare the both approaches easier.
I'm submitting this because my friend convinced me that it's better to let you know (even though I could not create a minimal reproducing example on my own) than to let it go completely unnoticed.
New description: I have a project where I in one module (A) I decided to build something like a minimal framework for the other bigger module (B) to use. One of the main points of the framework is that the monadic stacks are hidden behind `newtype`s and in those monads you can only use the functions that the module A provides. The module A can be found here: https://github.com/mrkkrp/mmark/blob/ghc-bug- newtypes/Text/MMark/Parser/Internal.hs There are two monadic stack wrapped with newtypes: `BParser` and `IParser`. The module B is this: https://github.com/mrkkrp/mmark/blob/ghc-bug-newtypes/Text/MMark/Parser.hs But it's not really relevant. Now if I change `newtype`s to `type` synonyms like so: {{{ type BParser a = ParsecT MMarkErr Text (State BlockState) a type IParser a = StateT InlineState (Parsec MMarkErr Text) a }}} and do corresponding minor corrections, I get 2x less allocations and almost 2x faster code (before): {{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 119,080 0 11,088 with file: data/bench-thematic-break.md 74,368 0 10,224 with file: data/bench-heading.md 901,928 0 9,432 with file: data/bench-fenced-code-block.md 145,744 0 9,368 with file: data/bench-indented-code-block.md 124,312 0 9,368 with file: data/bench-unordered-list.md 2,010,496 1 10,784 with file: data/bench-ordered-list.md 2,025,016 1 10,728 with file: data/bench-blockquote.md 1,961,672 1 42,648 with file: data/bench-paragraph.md 2,084,104 1 42,648 with file: data/comprehensive.md 25,899,496 24 44,200 benchmarking with file: data/comprehensive.md time 3.590 ms (3.578 ms .. 3.601 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 3.555 ms (3.546 ms .. 3.565 ms) std dev 31.07 μs (24.63 μs .. 39.90 μs) }}} After: {{{ Case Allocated GCs Max with file: data/bench-yaml-block.md 116,864 0 11,088 with file: data/bench-thematic-break.md 64,776 0 10,392 with file: data/bench-heading.md 615,672 0 9,432 with file: data/bench-fenced-code-block.md 144,736 0 9,368 with file: data/bench-indented-code-block.md 123,352 0 9,368 with file: data/bench-unordered-list.md 795,072 0 41,568 with file: data/bench-ordered-list.md 808,808 0 41,512 with file: data/bench-blockquote.md 826,392 0 41,568 with file: data/bench-paragraph.md 881,432 0 41,568 with file: data/comprehensive.md 10,945,104 10 44,440 benchmarking with file: data/comprehensive.md time 2.451 ms (2.448 ms .. 2.456 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.432 ms (2.427 ms .. 2.437 ms) std dev 15.86 μs (13.16 μs .. 19.01 μs) }}} I'm not great at reading non-trivial core, but I gave it a shot and dumped some core. One thing I noticed that core for the module A is the same in both cases. Then the problem is an inter-module problem, probably like when you don't dump definitions into interface files with `INLINEABLE` and end up without specialization, but here it perhaps has to do with the fact that B has no information about internals of the monadic stacks from A, but I'm not sure. Here is the beginnings of core dumps (before): {{{ ==================== Tidy Core ==================== 2017-12-23 04:55:17.967944505 UTC Result size of Tidy Core = {terms: 210,169, types: 209,675, coercions: 82,492, joins: 973/3,753} }}} After: {{{ ==================== Tidy Core ==================== 2017-12-23 04:58:46.386265108 UTC Result size of Tidy Core = {terms: 301,256, types: 263,104, coercions: 28,560, joins: 1,726/5,393} }}} So it looks like `newtype` wrapping reduces GHC's ability to create join points SPJ talked about recently. I do understand that you expect a minimal reproducing example but I could not produce it even though I spent several hours in futile attempts. I started by creating a small project, defining a similar stack with a `newtype` wrapper and using it in a simple parser in another module. Then I benchmarked the parser. There is no difference between `newtype`d and code and the code that uses just type synonyms. I tried different variations, no difference. The core output is too big for me to analyze, it's like 25 Mb and 33 Mb, and I have no idea what's going on there. You're welcome to experiment with the repo, there are benchmarks for memory usage and criterion benchmarks for speed of execution: https://github.com/mrkkrp/mmark I have created two branches I won't touch, one with newtypes and another one with type synonyms: * https://github.com/mrkkrp/mmark/tree/ghc-bug-newtypes * https://github.com/mrkkrp/mmark/tree/ghc-bug-type-synonyms Just checkout one of these and run `stack bench`. This is the commit that changes newtypes to type synonyms: https://github.com/mrkkrp/mmark/commit/759d8d4aa52dd57a393299c63e8c9b70d0d43... I'm submitting this because my friend convinced me that it's better to let you know (even though I could not create a minimal reproducing example on my own) than to let it go completely unnoticed. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

So it looks like newtype wrapping reduces GHC's ability to create join
#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): points SPJ talked about recently. Without looking at the code: This is a good guess. `foo x = if … then foo (x+1) else …` is a tail call (and hence allows `foo` to be a join point) while `foo x = if … then foo (x+1) `cast` coercion … else …` is not. But (from an operational point of view) it could well be! But someone would have to produce a minimal example to be sure this is really the issue. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance
-------------------------------------+-------------------------------------
Reporter: mrkkrp | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by dfeuer):
I haven't yet been able to reproduce quite the same behavior, but this
small example looks like it could be related. If I write
{{{#!hs
foo :: forall a. (Int -> Bool) -> Int -> a -> a
foo p = go
where
go :: Int -> a -> a
go !n a
| p n = a
| otherwise = go (n + 1) a
}}}
then I get
{{{
foo
= \ (@ a_aYZ)
(p_aWO :: Int -> Bool)
(eta_B2 :: Int)
(eta1_B1 :: a_aYZ) ->
case eta_B2 of { GHC.Types.I# ww1_s1bZ ->
joinrec {
$wgo_s1c1 [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
:: GHC.Prim.Int# -> a_aYZ -> a_aYZ
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wgo_s1c1 (ww2_X1cu :: GHC.Prim.Int#) (w_s1bW :: a_aYZ)
= case p_aWO (GHC.Types.I# ww2_X1cu) of {
False -> jump $wgo_s1c1 (GHC.Prim.+# ww2_X1cu 1#) w_s1bW;
True -> w_s1bW
}; } in
jump $wgo_s1c1 ww1_s1bZ eta1_B1
}
}}}
But if I make `go` polymorphic,
{{{#!hs
foo :: (Int -> Bool) -> Int -> a -> a
foo p = go
where
go :: Int -> b -> b
go !n a
| p n = a
| otherwise = go (n + 1) a
}}}
I get a wrapper and this worker:
{{{#!hs
T14610.$wfoo
= \ (@ a_s1cm)
(w_s1cn :: Int -> Bool)
(ww_s1cs :: GHC.Prim.Int#)
(w1_s1cp :: a_s1cm) ->
letrec {
$wgo_s1cl [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
:: forall b. GHC.Prim.Int# -> b -> b
[LclId, Arity=2, Str=, Unf=OtherCon []]
$wgo_s1cl
= \ (@ b_s1ce) (ww1_s1cj :: GHC.Prim.Int#) (w2_s1cg :: b_s1ce)
->
case w_s1cn (GHC.Types.I# ww1_s1cj) of {
False -> $wgo_s1cl @ b_s1ce (GHC.Prim.+# ww1_s1cj 1#)
w2_s1cg;
True -> w2_s1cg
}; } in
$wgo_s1cl @ a_s1cm ww_s1cs w1_s1cp
}}}
This distinction remains as `let` vs. `let-no-escape` in STG.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:3
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) * failure: None/Unknown => Runtime performance bug * milestone: => 8.6.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Here's an example that doesn't quite go far enough to demonstrate the problem, but seems closer in spirit to the original. {{{#!hs newtype D a = D {getD :: a} d :: a -> D a d a = D a baz :: D Int -> Int -> D Int baz y x0 = foo x0 where foo :: Int -> D Int foo 0 = y foo x = D (bar (x - 3)) bar :: Int -> Int bar 0 = getD y bar x = getD (foo x) }}} Compiling with `-dverbose-core2core`, we see that after the first simplifier run (gentle, before floating), we get {{{ baz :: D Int -> Int -> D Int [LclIdX, Arity=2, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0 0] 121 0}] baz = \ (y_aSE :: D Int) (x0_aSF :: Int) -> letrec { foo_aSG [Occ=LoopBreaker] :: Int -> D Int [LclId, Arity=1, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 81 0}] foo_aSG = \ (ds_d27R :: Int) -> case ds_d27R of { GHC.Types.I# ds_d27T -> case ds_d27T of ds_X289 { __DEFAULT -> (case ds_X289 of ds_X284 { __DEFAULT -> (foo_aSG (GHC.Types.I# (GHC.Prim.-# ds_X284 3#))) `cast` (Foo.N:D[0] <Int>_R :: (D Int :: *) ~R# (Int :: *)); 3# -> y_aSE `cast` (Foo.N:D[0] <Int>_R :: (D Int :: *) ~R# (Int :: *)) }) `cast` (Sym (Foo.N:D[0] <Int>_R) :: (Int :: *) ~R# (D Int :: *)); 0# -> y_aSE } }; } in foo_aSG x0_aSF }}} Note that `foo_aSG` is bound by `letrec`. If we switch to a type synonym version, {{{#!hs type D a = a getD :: D a -> a getD a = a d :: a -> D a d a = a baz :: Int -> Int -> Int baz y x0 = foo x0 where foo :: Int -> Int foo 0 = y foo x = d (bar (x - 3)) bar :: Int -> Int bar 0 = getD y bar x = getD (foo x) }}} then at the same point in core2core we instead see {{{#!hs baz :: Int -> Int -> Int [LclIdX, Arity=2, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0 0] 59 0}] baz = \ (y_aSQ :: Int) (x0_aSR :: Int) -> joinrec { foo_aSS [Occ=LoopBreaker] :: Int -> Int [LclId[JoinId(1)], Arity=1, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 55 0}] foo_aSS (ds_d27X :: Int) = case ds_d27X of { GHC.Types.I# ds_d27Z -> case ds_d27Z of ds_X28a { __DEFAULT -> jump foo_aSS (GHC.Types.I# (GHC.Prim.-# ds_X28a 3#)); 0# -> y_aSQ; 3# -> y_aSQ } }; } in jump foo_aSS x0_aSR }}} The reason this example doesn't quite go far enough is that later transformations work out the kinks and recognize the join point. But based on the bug report, that isn't always the case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * cc: nomeata (added) Comment: The polymorphic `go` in comment:3 is a separate problem from the one that I described further up (but it might well be the OP’s problem). It could be fixed by static argument transformation. On my TODO list, after #14068, is SAT for join points, but it would not help here, because only after SAT it is a join point. Exitification unfortunately also does not help to turn this into a joint point. Maybe worth opening a separate ticket. Your example in comment:4 is precisely what I described, thanks for working it out. The fix here probably requires some significant thought about the join point typing. Maybe it is enough to relax the rule “The return type must not depend on any arguments” to “The return type must be representatoinally equal for all arguments” and that might allow us to {{{e `cast` co}}} to be a tail-call position. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * related: => #14620 Comment: nomeata, I opened #14620 for the issue in comment:3. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: 14620 | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * blockedby: => 14620 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * blockedby: 14620 => Comment: bgamari, I don't think this is actually blocking on #14620. Why do you think it is? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Ahh, indeed you are right. It's an orthogonal concern. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Maybe it is enough to relax the rule “The return type must not depend on any arguments” to “The return type must be representatoinally equal for all arguments” and that might allow us to e `cast` co to be a tail-call
#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): position Actually I think we may just be able to say that if `(f x)` is a tail call, then `(f x |> co)` is a tail call. So in this code: {{{ occAnal env (Cast expr co) = case occAnal env expr of { (usage, expr') -> let usage1 = zapDetailsIf (isRhsEnv env) usage -- usage1: if we see let x = y `cast` co -- then mark y as 'Many' so that we don't -- immediately inline y again. usage2 = addManyOccsSet usage1 (coVarsOfCo co) -- usage2: see Note [Gather occurrences of coercion variables] in (markAllNonTailCalled usage2, Cast expr' co) } }}} just remove the `markAllNonTailCalled`. That call was in Luke's original join-point patch, but it seems over-conservative to me. That said, I'm intrigued about how this happens in practice, if it really does. In the example given in comment:5, a single run of the simplifer removes the redundant casts, and for recursive join points the tail calls really must return the same type as the function itself, so the cast seems unlikely. While the change above (removing `markAllNonTailCalled`) is ok (I think), I'm surprised if it has any effect. An example would be great. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => JoinPoints -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance
-------------------------------------+-------------------------------------
Reporter: mrkkrp | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #14620 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by nomeata):
Here is a contrived example. This code:
{{{
{-# LANGUAGE KindSignatures, DataKinds, GADTs, TypeFamilies #-}
module RecCast (foo) where
import Data.Coerce
data Nat = Z | S Nat
data Sing (n :: Nat) where
FZ :: Sing Z
FS :: Sing n -> Sing (S n)
-- The NoConst to avoid worker-wrapper
data Const (y::Nat) = Const Nat | NoConst
mapConst :: (Nat -> Nat) -> Const n -> Const n
mapConst f (Const x) = Const (f x)
mapConst f NoConst = NoConst
{-# NOINLINE mapConst #-}
inc :: Const n -> Const (S n)
inc = coerce
type family Plus n m :: Nat where
Plus Z m = m
Plus (S n) m = S (Plus n m)
foo :: Sing n -> Const m -> Const (Plus n m)
foo FZ c = c
foo (FS n) c = inc (foo n (mapConst S c))
}}}
produces this function that would be a join point when we consider casted
expressions as tail-recursive:
{{{
Rec {
-- RHS size: {terms: 14, types: 25, coercions: 18, joins: 0/0}
foo [Occ=LoopBreaker]
:: forall (n :: Nat) (m :: Nat).
Sing n -> Const m -> Const (Plus n m)
[GblId, Arity=2, Caf=NoCafRefs, Str=]
foo
= \ (@ (n_asO :: Nat))
(@ (m_asP :: Nat))
(ds_dXH :: Sing n_asO)
(c_aqe :: Const m_asP) ->
case ds_dXH of {
FZ cobox_asR [Dmd=

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Good example. But it fails the `isValidJoinPointType` test (c.f. #14620), so even if we roped the `markAllNonTailCalled` in the `Cast` case of occurrence- analysis, we would not get `go` as a join point. And indeed that's not unreasonable. Consider {{{ case ( letrec go n m ds m2 = case ds of ) ( FX co -> m2 |> (...co..) ) ( FS ... -> (go ...) |> co2 ) ( in go t1 t2 a b ) of BLAH }}} Operationally we have a join point, but the transformation to move that case inwards would give this {{{ letrec go n m ds m2 = case ds of FX co -> case m2 |> (...co..) of BLAH FS ... -> (go ...) |> co2 in go t1 t2 a b }}} but now the outer case is scrutinising something involving n, m etc, which makes no sense. Operationally the transformation makes sense, but it's not well typed. I have no idea how to fix this. We still have no example of a program that has a cast in the return (and and hence might benefit from dropping the `makrAllNonTailCalled`) but which passes the `isValidJoinPointType` test. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): That example is one where the result type depends on the parameters… but don’t see why we need that restriction – this code shows that it would be beneficial and not crash to allow it. The note `[The polymorphism rule of join points]` basically says “We need this restriction because the CPS translation would not be easily typable, but that is not very convincing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Operationally the transformation makes sense, but it's not well typed. I have no idea how to fix this.
So am I right to say “we forbid polymorphic return types because we cannot prove it to be sound, even though we know it would not actually crash if we did allow it”? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
So am I right to say “we forbid polymorphic return types because we cannot prove it to be sound, even though we know it would not actually crash if we did allow it”?
Yes, that's right. But be specific about what you mean by "it" in "would not crash if we did allow it". In comment:14 I show a particular transformation. I believe it is (slightly) beneficial, and will not crash, but the result is not type-correct. If you can figure out how to express the proof that it won't crash, as a coercion perhaps, maybe you could somehow express that proof in Core. You don't need casts to expose the problem: #14620 is enough. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Here is an idea. Consider a recursive function `foo` with type {{{ foo :: forall t, a -> r t foo @t a = E[foo @t2 x `cast` co)] -- E[_] is tail call context, and the cast prevents join-point-hood }}} But we can transform this into a form that is allowed by now, just by introducing casts: {{{ foo :: forall t, a -> r t foo @t a = go @t a refl where go :: forall t', a -> (r t ~ r t') -> r t go @t' a = E[go @t'2 co] }}} Note that now the return type does no longer depend on the arguments of `go`, so it is a join point. This transformation smells like a worker-wrapper transformation, but I am not sure. Anyways, I am not arguing that we should do this transformation in GHC, but do it in the proofs so that we can simply happily remove the restriction in `isValidJoinPointType`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Ah, of course you are right: If we drop `isValidJoinPointType` then we get more join points, and still no crashes. But it would indeed prevent the transformation in comment:14. That leaves two options: * Un-join-point `go` if we want to do this transformation (no regression about now) * Actually do this transformation I propose in comment:18 in GHC, which means we get a join point as before and can do that transformation. All not carefully thought out, of course, and too much distraction here right now, so I am mostly conveying my gut feeling, hoping to not make a fool out of myself here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I utterly hate (1). But maybe something along the lines of (2) could work. Adding a coercion as an accumulating parameter precisely builds up the proof I referred to in comment:17. Though I have no idea how to do this in general! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Ok, thanks for the clarification. Finally I get why that restriction is there. I will try to improve the Note about it (and pass it by you for confirmation). I think the transformation is possible, but it is somewhat non-local, as you need to move all casts inwards towards the recursive call, and then into the newly added parameter. Ff we find that there are performance gains to be won, then it might be worth it! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance -------------------------------------+------------------------------------- Reporter: mrkkrp | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14620 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Simon, can you briefly check that https://phabricator.haskell.org/D4281 is an improvement to the Note? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14610#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14610: newtype wrapping of a monadic stack kills performance
-------------------------------------+-------------------------------------
Reporter: mrkkrp | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #14620 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner
participants (1)
-
GHC