[GHC] #14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees"

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: feature | Status: new request | Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- = The basic Ideas Currently GHC uses Augustssons algorithm for pattern matching which was published in 85 which works fine for many cases. However there has been research on how to do better than that for a while now. Most notably by Maranget in "Compiling Pattern Matching to Good Decision Trees". I plan to look into applying his ideas to GHC for my undergraduate thesis. Thanks to imprecise exceptions we should be able to reorder argument evaluation as long as this doesn't make our programs more strict. Proofing that is high on my bucket list but hasn't been done yet. = Pros and Cons **Advantages** would be: * It should lead to equal or better code for the vast majority of patterns. * It might reduce compile times for cases were we can desugar directly to good code. **Disadvantages**: * Desugaring gets more complex. It' obvious that since the underlying algorithm is more complex the desugarer would also grow in complexity in accordance. It is hard to say in advance how much of a difference it will make but I hope it won't be too bad. Depending on the runtime of the new algorithm compared to the old one this might also speed up compilation for cases where the old algorithm generated code that had to be heavily optimized by the simplifier. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * owner: (none) => AndreasK -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Be careful not to change the strictness of the function. That's what has prevent most such attempts in the past. (But perhaps strict data constructors, if the program uses them, may help.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Indeed. My plan currently is to select columns in the pattern matrix such that we only choose from these strict in the first row. But my plan is to first formalize the interaction with imprecise exceptions. Then work out heuristics compatible with laziness. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Some notes on exceptions so far: Given code like: {{{ data T = A | B | C | D f :: T -> T -> Int f A A = 1 f B A = 2 f C A = 3 f2 :: T -> T -> Int f2 A A = 1 f2 A B = 2 f2 A C = 3 }}} Ideally I would like to generate the same code for both variants. Check the A column first and then switch on the ABC column reducing the number of case statements. == The theory works out so far. For `f x A` with x = [ABC] we get a number out either way so that is fine. For `f (raisesException1) (raisesException2)` we get different exceptions depending on order of matching. If we break down the matching into case statements as defined in the Haskell Report and apply imprecise exceptions the implementation can choose either exception so this is somewhat ok too. For `f D (raisesException2)` the result is the Exception Set for (error "No Match") currently, changing to the union of (raisesException + error "No Match") if we change the order of evaluation. In theory that is ok. `error x` and undefined are both ⊥ according to the Haskell Report. So the result is the set of all exceptions either way. If we start with the first argument that is because error is equal to the set of all exceptions. If we check the second first we do the "execute all alternatives in exception mode" thing giving the same result as it comes across `error "No Match"` == Are there practical Problems with this approach? The theory only works for all cases if we treat `error` as non termination. With `PatternMatchFail` and `ErrorCall` one can easily distinguish these in practice. Catching failing pattern matches by default seems insane to me so I don't think returning another exception instead of MatchFail should break much that wasn't broken to begin with. Especially given that it only matters if another strict argument raises an exception first. While I can come up with a theoretical scenario where someone ends up with `f (raisesLaterCatchedException) (causesMatchFailure)` where changing the order would break the code personally I don't think that edge case is worth keeping. It depended on an implementation detail in the first place. Is unlikely to occur and when it happens easy to fix. But still I would like some feedback to judge if I underestimate the implications. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): The assessment in comment:4 sounds pretty reasonable to me. I agree that there's little reason to lose sleep over the `f (raisesLaterCatchedException) (causesMatchFailure)` case; such code produce warnings under any production-worthy flag set and users will almost certainly understand that this sort of code is fragile. The `f (raisesException1) (raisesException2)` case is a tad more worrying since users may be mislead by the (change in the) resulting exception. However, as you point out, the Report does say that both exceptions are admissible and I don't think we should be shy about taking advantage of this freedom if there really is performance on the table. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree that if the only difference is which exception is raised, then it's like `exn1 + exn2` and we can be imprecise about which one we get. But since you don't give an algorithm, it's hard to tell whether that's all that can happen. For example {{{ f :: T -> T -> Int f A A = 1 f B A = 2 f C _ = 3 f2 :: T -> T -> Int f2 A A = 1 f2 A B = 2 f2 _ C = 3 }}} Here `f1 C exn` will return 3, but `f2 exn C` will throw `exn`. So order matters here. I guess you need to prove that the algorithm you use doesn't change semantics. The other question is this: what performance is on the table? It'd be more convincing if you had some examples where better pattern-match compilation causes significantly less code, or better performance. I'm sure the pattern-match compiler will be more complicated, and we want to trade that complexity for some tangible benefit. You can probably start to answer this question with some hand-written A/B examples: "if we had a better compiler we'd get this much better code". -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * Attachment "trac_pm_example1.hs" added. One case where PM currently generates very suboptimal code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * Attachment "trac_pm_example2.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK):
But since you don't give an algorithm, it's hard to tell whether that's all that can happen.\\
Here f1 C exn will return 3, but f2 exn C will throw exn. So order matters here. \\ I guess you need to prove that the algorithm you use doesn't change semantics.
Indeed. I aim to maintain strictness properties of the current algorithm and if I manage that this can't happen. Intuitively this follows from the fact that if we maintain strictness properties and one of the arguments contains a exception we are guaranteed to trigger or avoid it if the old algorithm would have done so. Actually matching the strictness is the hard part especially avoiding potential loss of strictness. So far I worked out a way to apply good decision trees when I ignore the loss of strictness and a way to prevent loss of strictness on paper. I haven't yet worked out completely how I will combine them and how this will impact the complexity of the algorithm.
You can probably start to answer this question with some hand-written A/B examples: "if we had a better compiler we'd get this much better code".
It's easy to come up with examples where the current approach generates unnecessary large code sizes on paper. What I'm not sure about is how often these actually occur in regular code. So my next steps are: 1. Combine maintaining strictness with the approach in good trees. * Implement it for a subset of ghc's patterns outside of ghc. (Constructor/Variable/Wildcard) * Re implement the current approach in the same framework. 2. Put some instrumentation into the pattern matching code and sample patterns from as much code as I can. 3. Hopefully get results indicating that it's worth implementing in GHC. 4. Implement it as part of GHC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): I realized I have no written anything about how to actually do this.\\ Nothing of this is set in stone yet but generally: To prevent additional strictness we can always limit argument evaluation to these columns which will be evaluated for any result (ignoring the possibility of pattern match failure). While we do have to compute that property for each column it should be able to do this while getting the data required to pick a good column. So I don't expect that to be too expensive. Preventing loss of strictness is a lot more expensive to calculate I fear. I also did not yet think much about how this impacts potential performance gains or how to optimize/implement it. So with this pretext here we go: Given a matrix of patterns based on your example {{{ A A A B _ C }}} My current idea is to associate constraints with each element starting without no constraints. If we chose column 2 we get on decomposition for C a matrix consisting only of `_`. However during matrix decomposition we check for constraints resulting from our selection and add these to the remaining matrix. In this case we would generate {Col1 != bot} for that entry.\\ When selecting a column we first have to resolve constraints associated with it. So while the resulting matrix has only a wildcard the constraint forces us the generate a case statement for the column ensuring the first parameter gets evaluated. Constraints are created by looking at the rows we eliminate based on the result we generate code for. But I have not yet formalized that to a degree where a detailed writeup on the specifics makes sense. While this is guaranteed to be more expensive then the current approach i hope to that performance gains as well as less work required by the simplifier makes up for it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Sounds complicated enough to be worth a Haskell Symposium paper! Seriously, writing a paper is often a good way to refine an idea. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:9 simonpj]:
Sounds complicated enough to be worth a Haskell Symposium paper! Seriously, writing a paper is often a good way to refine an idea.
Having that opportunity would be wonderful. I have to write it up with rigor for my undergrad thesis anyway. For a update on the progress: * I've put instrumentation code into GHC to extract patterns. * I've coded up a simplified version of the current algorithm outside of GHC. Simplifications are: * It only covers Constructor/Wildcard/Variable patterns. * It only selects a rhs but does no binding/the associated bookkeeping. * Record patterns with a different order then their definition are treated like a different constructor. * It treats all patterns as non-exhaustive. * I fully formulated my approach for matching the strictness of the current algorithm. The next steps are: * Implement my approach. * Get some statistics of their differences. * If it seems worthwhile try to measure the impact of the simplifier. (As the output of the desugar stage can be a lot worse and yet still result in perfect code). * Document the results in a presentable form. * Depending on how promising the results are hopefully implement the new approach in GHC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): The Algorithm informally explained: Use a recursive match function very similar to the current approach. match :: PatternMatrix -> DecompositionKnowledge -> FailExpr -> DsM(Expr) FailExpr: Same as now. DecompositionKnowledge: Represents the knowledge about each occurrence. Each entry is one of * Evaluated (`Case x of _ -> expr`) * The Value/Tag (`Case x of A -> expr`) * Constructors we can rule out. (`Case x of { _ -> expr; A -> notTaken }` ) PatternMatrix: Similar to the current EquationInfo. However each RHS is extended with a list of constraints which must be satisfied before we can sucessfully match on that RHS. The constraints codify the strictness semantices based on the actual values of the arguments. Each RHS has the constraint of it's own row and the ones of all rows above. So a equation `A !_ B = 1` has the following constraint in pseudo syntax: {{{ Eval (1) && if (Val(1) == A) then Eval (2) && Eval (3) else True }}} In match: * Check which occurrences are strict in ALL Rows. * Select one of these based on some heuristic * Generate alternatives for each constructor in that row (and possibly a default). * Generate the code for the alternatives by: * Removing the evaluated column and unreachable rows from the matrix * Add knowledge gained by taking this alternative. * Calling match again with the updated Matrix/Knowledge to generate the code. Only chosing from the strict set of occurrences is already **enough to guarantee we don't increase strictness**. BUT it can lead to a function being less strict though. See SPJ's example above. Ideally we chose the next occurrence from the strict set such that few comparisons are made to pick a RHS using little code. However so far it seems the heuristic chosen hardly matters. But I have only tried quiet simple ones namely: * Left to Right * Constructor Count in the Row * Constructor Count till the first irrefutable pattern. * Number of different Tags * Right to Left. (The only one which is clearly inferior) **To preserve strictness:** Once a RHS was selected all constraints must be satisfied before we can jump into the actual RHS code. We satisfy `Eval (x)` by evaluating the Occurrence x. When solving `Eval (x)` we generate alternatives for all Constructors mentioned in the constraints. If we resolve the constraints left to right until all are satisfied we can guarantee that we don't skip evaluation of any arguments that should have been evaluated. Going back to SPJ's example to see how this works. Once we evaluted the second argument to `C`: We can immediatly select the third RHS. However we still have the unsolved constraint `Eval(0) && if 0 == A then Eval(1) else True`. To satisfy it we simply create another case expression to evaluate the first argument with a single branch for A. After that all constraints are solved so we put the RHS as the expression of that branch. On top of the above it also require some special casing for View Patterns and Synonyms before it can implemented into GHC. However both can be dealt with by using a mixture rule similar to the current approach. ---- Results so far on ~190k unique simple patterns pulled from hackage packages. (No View/Synonym or N+K Patterns) Most patterns are boring and the result does not or barely change: * For ~150k patterns the resulting code is the some for the original algorithm and decision trees. ~120k of these have only a single equation and hence don't allow for any optimization either way. I used the simplest heuristic (choose the left most strict occurrence) to build a core like AST **so without applying GHC's simplifier** I got these results: * (+0.84%) more Terms are generated by decision trees. * Comparisons (sum over all patterns): * (-1.93%) fewer comparisons to select the worst case. * (-1.14%) fewer comparisons required to select a RHS on average. I tried a few other heuristics as well but there is not enough difference between these to look into it without also comparing in detail how running it through the simplifier changes things. Transpiling the AST used to GHC Source code and looking at the Core there are some small changes but I expect that above numbers to change very little based on these. Looking only at patterns where the algorithm makes a difference for the number of comparisons required the tree based approach makes about 30% fewer comparisons for both worst and average case. While this is a lot it comes with a larger code size (and more join points) so it's hard to tell how much of a difference it makes when it comes to generated machine code. Some of it might be because I didn't push it through the simplifier but that seems to hit the tree based code harder then the current approach for the examples I looked at I'm somewhat optimistic that it is a genuine reduction of case expressions evaluated. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I wonder if anyone else can work with Andreas on this? I lack bandwidth. Very brief thoughts: * Giving two or three actual concrete examples where we get better code would be useful. * Comparing ''after'' simplification is quite important. If the simplifier automatically gets most of the benefit, it wouldn't be worth having a more complicated pattern-match compiler. * Do a `nofib` run with and without your patch. Is there any measurable difference? * How big is your patch? The existing pattern match compiler is already quite subtle; how hard was it to implement your new scheme? Bottom line: to justify the extra complexity and maintenance burden, we'd have to be convinced that some programs run a lot faster; or that many programs run a bit faster. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): It took some time but I have a proof of concept in GHC. This means it falls back to the default algorithm if it encounters: * Non vanilla constructors (So only Haskell98 Constructors) * Records * Any of the pattern extensions. * I've only implemented the trivial left-to-right Heuristic so far. So keep this in mind for the rest of this comment: ---- == Examples Pretty much all of the trivial examples I had collected come out the same either way with -O But then it's not too surprising when using the left-to-right Heuristic. A gain at -O2: {{{ badMix :: Int# -> Int# -> Int# badMix 1# 1# = 1# badMix _ 2# = 1# badMix 1# 3# = 1# ------------------------------------------- = \ ds_dY4 ds1_dY5 -> case ds_dY4 of { __DEFAULT -> case ds1_dY5 of { __DEFAULT -> case badMix1 of wild2_00 { }; 2# -> 1# }; 1# -> case ds1_dY5 of { __DEFAULT -> case badMix1 of wild2_00 { }; 1# -> 1#; -- 2# -> 1#; This alternative is removed in the tree approach 3# -> 1# } } }}} However this doesn't apply when types are changed to Int -> Int -> a. Not yet sure why. I will first finish dealing with Records properly and then see if I can find more examples. At that point it should be somewhat clear if further work makes sense. ---- == Nofib Nofib give a mean of 0% for runtime but -0.1% for time lapsed. Min/Max where -1.9%/+1.8% for runtime. So there seems to be at least some potential there. I plan to see how it turns out when I finish work on Records and add a few Heuristics. If that gives the same result it's probably not worth it but we will see. ---- == Implementation === Size The size of that implementation is about 1100 additional Lines in it's current State. It's however pretty verbose for Haskell and some of it could be combined with existing functionality. Mostly because functions in DsUtil generally make some assumptions that won't hold for my algorithm so require slight adjustments. === Complexity I would say implementing the scheme itself so far was straight forward. What complicated/delayed matters most was figuring out the current codebase in general. In particular some comments in the desugarer were outdated to the point where they confused me more than they helped. There were invariants which contradicted the implementation, references to non-existant functions, the works. I've got a patch committed that fixed some of that but there is much room for improvement still. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I've got a patch committed that fixed some of that
Oooh, yes please! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * Attachment "criterion_log.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): I added rudimentary support for Records and made it work nicely with -XStrict. After playing around with nofib a lot I have essentially given up using it as a primary benchmarking tool I tried using nofib to benchmark (compile times) for #14524 which was inconclusive even when the performance difference for the function in question was ~10%. Running nofib with my changes didn't really produce results for the same issue of variance drowning out any difference the results would have made. == So my plan for now is: * Ditch nofib for the time being. * Create micro benchmarks and use criterion to check the difference. * Once I feel i have a reasonably sized set of benchmarks: * Look for pattern matching heavy real world examples and benchmark these. Based on the results of the OCaml implementation I don't expect measurable results unless there are not trivial patterns in a inner loop. * Gather feedback if gain/complexity trade off warrants inclusion in GHC. * Depending on the outcome of the above stop working on this or tidy up my code and get it into HEAD. == Micro Benchmarks: I mirror them on [https://github.com/AndreasPK/pattern_benchmarks Github]. Here is a [https://cdn.rawgit.com/AndreasPK/pattern_benchmarks/31dd3321/results/res_noi... result] for two simple patterns. I've uploaded the results also as attachment if GH becomes unavailable. (criterion_log) The two patterns I started with (each once used with enums once with Int) are at worst as fast and at best ~15% faster. (But then 15% of little is still not much). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): How much of a goal is performance at -O1? I have quite a few examples where we gain performance at -O1 but most of these disappear at -O2. Similarly tree matching improves Strict code much more than regular lazy code. However Strict code is reasonably rare so here also not sure how to rate the tradeoff. Code of the sort `nf (sum . map (\(a,b,c) -> (Aug.gen_func_1 a b c)))` where f is a random pattern mapping Enums to Ints is 2%-3% faster (average and median) when applying it to all possible enum combinations. (At -O2) with a bigger difference at -O1 or if I make the code strict. Benchmarking/Finding "normal" code where it makes a difference turned out quite hard though. * It requires a minimum complexity of the pattern to make a difference. (At least two strict arguments, and it has to be advantageous to swap the evaluation order). * Pattern matching often contributes just a fraction of the total time making it more difficult to benchmark the difference. * (Speculation): Code where patterns are performance sensitive already accounts for the current algorithm. * Most of hackage doesn't build without modification on HEAD making it very time consuming to get benchmarks building at all. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): When evaluating when it's okay to replace one exception by another, it might be worth considering the ideas laid out in Exceptions/PreciseExceptions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Thanks dfeuer. The imprecise exceptions paper **** == The bad == * Compilation is slowed down too much currently. This should be fixable as I haven't optimized my code AT ALL yet. * For larger benchmarks advantages often disappear at -O2 (with microbencharms being faster). I'm not completly sure why. Possibly reasons I thought of: 1. Performance reliant code is written with the current solution in mind 2. The larger code size causes more cache misses 3. There are some nasty edge cases 1) This doesn't make this approach worse, but might be a reason why it makes little differences in performance sensitive libraries which I used for benchmarks. 2/3) The larger code size is a given with this approach. I don't think it's a big issue outside of edge cases. However edge cases can blow up significantly. A good example the following which splits up the decision tree in a bad way. {{{ f1 _ A E = 1 f1 B _ _ = 2 f1 C _ C = 3 f1 A E A = 8 f1 _ _ _ = 5 }}} * We can't start at the third as we might not evaluate it, eg `f1 B B B` * We can't start at the first for the same reason, eg `f1 A A E` A perfect solution would switch between backtracking and tree based based on the pattern encountered. But maintaining both sounds like a nightmare and wouldn't be worth the trouble. There are also a few small things that I can still adjust which might enable a bit more CSE. But I'm not too optimistic that it will change much. == The good == * Regular code at -O0/1 gets faster on average when the algorithm applies. * Strict code runs faster with tree matching even at -O2 * The compiliation speed should be fixable * There are some small improvements still doable * Improve the heuristic which selects the column to match first * Change placement of the cleanup code which evaluates arguments not necessary for matching. (This is necessary to satisfy the exception/nontermination semantics) * I have enabled tree matching only for Vanilla Constructors as I had an issue with GADT's which didn't want to tackle at the time. == Where to go from here == I plan to at least: * Try to enable tree matching on GADTs and see if any of the benchmarks I looked at change. * Play around with the clean up placement But I have to get my bachelor out of the way first in the next 1-2 Months. Then I will see how the above change the story at -O2. For O1/Strict code the speedup of a few % at time would already be worth it if I can reign the compile times in. But currently at O2 the speed is usually equal sometimes worse and slightly less often faster using tree matching. Assuming it stays that way changing it only makes sense if GHC ever gets profile guided optimization. As we then could chose the best column to match on based on actual data. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees" -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): = There are shortcomings in the current codegen of GHC that eliminate much of the gains to be had here. * Changing the order of pattern matches can lead to more floating of case alternatives to the top level. The associated cost from calling a top level function overshadows potential gains when that happens. This is expensive since: * They have to confirm to the calling convention, so can require register shifting. * They prohibit us from falling through to the alternative. * They split the code in memory usually leading to more cache pressure. There is an ticket about the issue already: #15560 * The algorithm depends on shared pattern matching subtrees being commoned up which doesn't work yet. As a consequence we quite often end up with duplicate code which kills performance. * I had hoped to leave this to CSE but it turns out GHC's CSE is not as good as one would hope in this regard. See https://ghc.haskell.org/trac/ghc/wiki/MoreCSE for some discussion. * We could do an special CSE for just the pattern matching trees. Butduring desugaring we don't work bottom up but top down. We generate functions which take as argument the expression to put into the case alternatives instead of directly generating an AST expression. It wasn't clear how to work with or refactor this to allow commoning up of pattern matching subtrees at the time I last looked at this. * Last and least: The current codelayout is heavily dependent on how we generate `if` statements for Cmm. As consequence performance can vary a lot when changing the order of pattern matches. I did some work on code layout which hopefully will help with this in #15124. Which might change the performance difference between differing pattern match algorithms. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14201#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC