[GHC] #10137: Rewrite switch code generation

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Operating System: Unknown/Multiple Keywords: | Type of failure: None/Unknown Architecture: | Blocked By: Unknown/Multiple | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Inspired by #10124 I looked at the code generation for enumeration and integer types, and I think this can be improved in a few ways. My goals are: * Fancier code for integer types. Currently, the code for enumeration types will emit jump tables for dense choices; there is no reason to treat integer types differently. * The ability to behave differently if some of the cases are equal, and, as an extension of that, * The possibility to create branchless code if multiple checks would go to the same jump. The current scheme is roughly: * When we generate Cmm code for a STG case expression, we handle enumeration types and literals separately. * At this point, the decisions about what code to generate are made (jump tables (but only for enumeration types) or if-then-else trees). * The Common Block Optimization on Cmm happens later in the pipeline, making it non-trivial to detect branches that do the same thing. My plan is the following: * In the STG→Cmm transformation, floats will be handled separately. Matching against literals literals is fishy anyways, so my suggestion is to simply generate a linear list of equality checks here – turning the intended operation (equality test) into something else (comparisons in a if-then-else tree) feels wrong to me for floats. And the rest would not work well for floats, so I’d like to have them out of the way. * The case of enumeration types will be reduced to word literals, and treated the same from now on. * For integer types, no code generation decisions is made at this point. Instead, always a `CmmSwitch` statement is generated. * In a separate Cmm transformation pass, which will run /after/ the common block optimization, we visit all `CmmSwitches` and create proper code for them. I’d like to separate the algorithm that plans the code generation into a function (possibly even module) of its own, so that the decisions can easily by analyized and modified. The algorithm has a few choices to make: * If multiple values point to the same code, we can generate branchless code (`y := x == 1 || x == 5 || x = 7; if (y==0) then goto l1 else goto l2`). * If there are whole ranges pointing to the same code, the above can also use comparisons. * If there are dense ranges (i.e. a range with more than half of the possible values mapped to something), we want to generate jump tables from them (still `CmmSwitch` values). * Unless all options are handled by one of these possibilities, they need to be combined using `if-then-else` trees. The `CmmSwitch` constructor needs to change for that. It currently takes a `[Maybe Label]` argument, which is not suitable for before that pass, when its table may be sparse. I think an `IntMap Label` would work nicely. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Sounds good. I do think that some work can be done in Core. For example: {{{ case x of 1# -> e1 2# -> e2 7# -> e1 _ -> e2 }}} might turn into {{{ case (x ==# 1# `orI#` x ==# 7#) of 0# -> e2 -- False 1# -> e1 -- True }}} I would also '''dearly''' like to implement #8317 to get rid of all those stupid `tagToEnum#` calls that clutter up the code. We don't do this for a stupid reason: #8326. As part of your crusade, dealing with this would be fantastic. I'd be happy to advise/help. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317 | -------------------------------------+------------------------------------- Changes (by jstolarek): * related: => #9157, #8326, #8317 Comment:
The Common Block Optimization on Cmm happens later in the pipeline, making it non-trivial to detect branches that do the same thing. #9157 demonstrates a case where CBE fails, although it definitely shouldn't.
Re #8326 I spent several weeks trying to fix it but ultimately I failed. You can find various experimental versions of my code on these branches: [[https://github.com/jstolarek/ghc/tree/T8326-heap-checks|T8326-heap- checks]] [[https://github.com/jstolarek/ghc/tree/T8326-heap-checks-aletr- gc_plan|T8326-heap-checks-aletr-gc_plan]] [[https://github.com/jstolarek/ghc/tree/T8326-heap-checks-alternative- plan|T8326-heap-checks-alternative-plan]] [[https://github.com/jstolarek/ghc/tree/T8326-heap-checks-precompile- alts|T8326-heap-checks-precompile-alts]] though it probably will be easier to write your own code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317 | -------------------------------------+------------------------------------- Comment (by nomeata): All very well, but can we please keep this particular ticket for the discussion of better code generation from STG onwards :-) #10124 is more suitable for what we can do in Core, I think. My rewrite will not only fix the issue of branchless cases, but also other (small) deficiencies in the current code (jump tables also for integer types; a better choice for if-then-else branching that leads to larger jump tables). Anyway, I started on wip/T10137, but already my refactoring broke something and I get segfaults :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

The case of enumeration types will be reduced to word literals, and
#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Changes (by rwbarton): * related: #9157, #8326, #8317 => #9157, #8326, #8317, #9159 Comment: Also see the comments of #9159. treated the same from now on. Note that when matching on an enumeration type, we can assume that the constructor tag is within the range of possible tag values. We cannot make any such assumption for matches on ints. So, we should remember the extra information that we have about the range when doing this reduction. I imagine you will have a function to generate code for an int pattern match with known bounds anyways, as part of a binary search algorithm, so this should not complicate the code greatly.
Matching against literals literals is fishy anyways, so my suggestion is to simply generate a linear list of equality checks here – turning the intended operation (equality test) into something else (comparisons in a if-then-else tree) feels wrong to me for floats.
I don't see anything wrong with using comparisons for pattern matching on floats as long as the patterns are all finite numbers (though there is trickiness around negative zero to be aware of). But I also think float pattern matching is much less important than your other goals, so I would agree with doing something simple for floats, at least initially. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Comment (by nomeata):
Note that when matching on an enumeration type, we can assume that the constructor tag is within the range of possible tag values. We cannot make any such assumption for matches on ints. So, we should remember the extra information that we have about the range when doing this reduction.
Right. The datatype is currently {{{ data SwitchTargets = SwitchTargets (Maybe (Integer, Integer)) (Maybe Label) (M.Map Integer Label) }}} so there optionally is a definite range (absent when matching literal). The function `addRange` in the new `CmmSwitch` module (currently in branch `wip/T10137`) takes care of that case, before the general layout algorithm runs on the remaining range. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Changes (by nomeata): * owner: => nomeata Comment: I’m happy to have some code review on Phab:D720, but it is still work-in- progress. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Description changed by nomeata: Old description:
Inspired by #10124 I looked at the code generation for enumeration and integer types, and I think this can be improved in a few ways. My goals are:
* Fancier code for integer types. Currently, the code for enumeration types will emit jump tables for dense choices; there is no reason to treat integer types differently. * The ability to behave differently if some of the cases are equal, and, as an extension of that, * The possibility to create branchless code if multiple checks would go to the same jump.
The current scheme is roughly:
* When we generate Cmm code for a STG case expression, we handle enumeration types and literals separately. * At this point, the decisions about what code to generate are made (jump tables (but only for enumeration types) or if-then-else trees). * The Common Block Optimization on Cmm happens later in the pipeline, making it non-trivial to detect branches that do the same thing.
My plan is the following:
* In the STG→Cmm transformation, floats will be handled separately. Matching against literals literals is fishy anyways, so my suggestion is to simply generate a linear list of equality checks here – turning the intended operation (equality test) into something else (comparisons in a if-then-else tree) feels wrong to me for floats. And the rest would not work well for floats, so I’d like to have them out of the way. * The case of enumeration types will be reduced to word literals, and treated the same from now on. * For integer types, no code generation decisions is made at this point. Instead, always a `CmmSwitch` statement is generated. * In a separate Cmm transformation pass, which will run /after/ the common block optimization, we visit all `CmmSwitches` and create proper code for them.
I’d like to separate the algorithm that plans the code generation into a function (possibly even module) of its own, so that the decisions can easily by analyized and modified.
The algorithm has a few choices to make:
* If multiple values point to the same code, we can generate branchless code (`y := x == 1 || x == 5 || x = 7; if (y==0) then goto l1 else goto l2`). * If there are whole ranges pointing to the same code, the above can also use comparisons. * If there are dense ranges (i.e. a range with more than half of the possible values mapped to something), we want to generate jump tables from them (still `CmmSwitch` values). * Unless all options are handled by one of these possibilities, they need to be combined using `if-then-else` trees.
The `CmmSwitch` constructor needs to change for that. It currently takes a `[Maybe Label]` argument, which is not suitable for before that pass, when its table may be sparse. I think an `IntMap Label` would work nicely.
New description: Inspired by #10124 I looked at the code generation for enumeration and integer types, and I think this can be improved in a few ways. My goals are: * Fancier code for integer types. Currently, the code for enumeration types will emit jump tables for dense choices; there is no reason to treat integer types differently. * The ability to behave differently if some of the case alternatives are equal, and, as an extension of that, * The possibility to create branchless code if multiple checks would go to the same jump. The current scheme is roughly: * When we generate Cmm code for a STG case expression, we handle enumeration types and literals separately. * At this point, the decisions about what code to generate are made (jump tables (but only for enumeration types) or if-then-else trees). * The Common Block Optimization on Cmm happens later in the pipeline, making it non-trivial to detect branches that do the same thing. My plan is the following: * In the STG→Cmm transformation, floats will be handled separately. Matching against literals is fishy anyways, so my suggestion is to simply generate a linear list of equality checks here – turning the intended operation (equality test) into something else (comparisons in a if-then- else tree) feels wrong to me for floats. And the rest would not work well for floats, so I’d like to have them out of the way. * The case of enumeration types will be reduced to word literals, and treated the same from now on. * For integer types, no code generation decisions is made at this point. Instead, always a `CmmSwitch` statement is generated. * In a separate Cmm transformation pass, which will run /after/ the common block optimization, we visit all `CmmSwitches` and create proper code for them. I’d like to separate the algorithm that plans the code generation into a function (possibly even module) of its own, so that the decisions can easily by analyized and modified. The algorithm has a few choices to make: * If multiple values point to the same code, we can generate branchless code (`y := x == 1 || x == 5 || x = 7; if (y==0) then goto l1 else goto l2`). * If there are whole ranges pointing to the same code, the above can also use comparisons. * If there are dense ranges (i.e. a range with more than half of the possible values mapped to something), we want to generate jump tables from them (still `CmmSwitch` values). * Unless all options are handled by one of these possibilities, they need to be combined using `if-then-else` trees. The `CmmSwitch` constructor needs to change for that. It currently takes a `[Maybe Label]` argument, which is not suitable for before that pass, when its table may be sparse. I think an `IntMap Label` would work nicely. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Comment (by nomeata): Fist benchmarks results are in; binary sizes are reduced, runtime changes varies with a small tendency towards improvement: {{{ Size Allocs Runtime Elapsed TotalMem Min -1.8% -0.0% -4.5% -7.8% 0.0% Max -0.7% 0.0% +3.3% +3.4% +2.9% Geometric Mean -1.4% -0.0% -0.2% -0.6% +0.1% }}} The code still has some magic constants where an optimum could probably be found somehow. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Changes (by nomeata): * status: new => patch Comment: Ok, I think the code is ready for some review; see Phab:D720. I’d like to land this before adding some of the extra bells and whistles, such as branches code in situations where this is possible. Note that although #10124 and the story about branchless codes made me look at this, the rewrite of code generation for switches stands on its own, as it improves upon the previous situations. In particular, we currently never generate jump tables for case expressions on `Int#` and `Word#`, which is quite a waste (previously reported as #9159 as well), and we were building the if-then-else tree before creating jump tables, thus possibly splitting them in the middle. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation
-------------------------------------+-------------------------------------
Reporter: nomeata | Owner: nomeata
Type: task | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 7.9
(CodeGen) | Keywords:
Resolution: | Architecture:
Operating System: Unknown/Multiple | Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #9157, #8326, | Differential Revisions:
#8317, #9159 |
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: fixed | Architecture: Operating System: Unknown/Multiple | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Revisions: #8317, #9159 | -------------------------------------+------------------------------------- Changes (by nomeata): * status: patch => closed * resolution: => fixed Comment: Pushed this now, ready to tackle any fallout. Now on to implementing a fix for #10124. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: closed Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.9 (CodeGen) | Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #8326, | Differential Rev(s): #8317, #9159 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): I don’t plan to look into this now, but whoever touches this code next and feels like improving the heuristics that decide whether to do a table or a binary search, see [ticket:9159#comment:5] for what the Java compiler does. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10137: Rewrite switch code generation -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: task | Status: closed Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.9 (CodeGen) | Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #9157, #9159, | Differential Rev(s): #8326, #8317, #9159 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * related: #9157, #8326, #8317, #9159 => #9157, #9159, #8326, #8317, #9159 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10137#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC