
#15519: Minor code refactoring leads to drastic performance degradation
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:  SpecConstr
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by danilo2):

 @simonpj, @sgraf, first of all, thank you very much for your time, for
 looking at this issue and for investigating it so deeply. Thanks to that I
 have hope that we will fix it!

 1. First of all, I'm simply amazed by the number of workarounds here. I
 know I have written this before in other tickets, but I care a lot about
 predictable performance in GHC. Everything described here shows me that
 it's worse than I thought; see the following points.

 2. In order to write high performance code we need invariants: rules that
 we can follow and trust to give us **exactly** the behavior we want. One
 of the most important invariants to me (probably the most important one)
 was always that if I use the INLINE pragma, the code will be inlined
 whenever the call is saturated, and it will behave exactly as if I had
 copy-pasted the body at the call site. I always understood that the INLINE
 pragma is exactly for this - to guide GHC very precisely in how to
 optimize the code. Learning that GHC does not really inline all explicitly
 marked saturated calls, and that it sometimes gets better specialization
 when the INLINE pragma is removed, is just insane. It breaks the most
 primitive invariant that we can rely on, and without it we cannot predict
 anything about the performance of the code we write. For me this is a
 critical error.

 3. Moreover, I strongly disagree with the sentence that the "fix would be
 to remove INLINE pragma", because this leaves us in a world where GHC
 performance is completely unknown and we have to randomly enable / disable
 things, hoping that it will magically get better. I suspect, @sgraf, that
 you didn't mean "fix" but rather a "dirty workaround for now", but I
 preferred to emphasize my worries regarding this matter.

 4. Answering your question, @simonpj: I completely understand that GHC is
 super-cautious about inlining things and making the code bigger, but that
 is exactly the reason why we can fine-tune the behavior by telling GHC
 that we really do want something to be inlined, right? Exactly this led me
 to use the INLINE pragma here. When writing this code I knew I would have
 some very tight loops here that need to be optimized, and I knew that no
 matter what, it should be inlined.

 5. I'm surprised that `test1` is not eta-expanded. How can I be sure my
 functions get eta-expanded? I have read the `Note [Do not eta-expand
 PAPs]` but it's still not clear to me. What are the invariants here? If I
 write high performance code, should I always manually eta-expand
 functions? Should I rewrite it to `test1 = \t -> runTokenParser
 testGrammar1 t`? Please correct me if my thinking is wrong here.

 6. Regarding `test2`, specialization and over-specialising things: I see
 the problem and I don't know what approach would be good here. The only
 thing that is clear to me is that the change to the code is so small that
 nobody should expect such drastic performance changes, and we should have
 a clear way of preventing such things from happening (again - an invariant
 for how to write high performance code when we want to wrap some data in a
 compile-time-known constructor just to refactor things).

 7. Looking at the source code I don't see any mention of `SPEC` in
 GHC.Type. Where does it come from and where can I learn more about it?
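
 To make the question in point 5 concrete, here is a minimal sketch of the
 two forms being compared. `Grammar`, `testGrammar1` and `runTokenParser`
 below are hypothetical stand-ins with made-up bodies, not the definitions
 from this ticket; only the shape of `test1` matters. The GHC User's Guide
 says an INLINE function is inlined only when applied to as many arguments
 as appear syntactically on the left-hand side of its definition, so the
 two forms differ in whether the call is saturated in the source. Whether
 that difference explains the behavior seen in this ticket is an
 assumption, not something this sketch demonstrates.

```haskell
-- Hypothetical stand-in for a grammar: a predicate on characters.
newtype Grammar = Grammar (Char -> Bool)

-- Hypothetical grammar that accepts only the character 'a'.
testGrammar1 :: Grammar
testGrammar1 = Grammar (== 'a')

-- Hypothetical parser: counts the characters the grammar accepts.
-- Two arguments appear on the left-hand side, so a call counts as
-- saturated for INLINE purposes only when both are supplied.
runTokenParser :: Grammar -> String -> Int
runTokenParser (Grammar accepts) input = length (filter accepts input)
{-# INLINE runTokenParser #-}

-- Point-free form: in the source, runTokenParser is applied to one of
-- its two arguments, i.e. this is a partial application (PAP).
test1 :: String -> Int
test1 = runTokenParser testGrammar1

-- Manually eta-expanded form: the call is saturated in the source.
test1' :: String -> Int
test1' = \t -> runTokenParser testGrammar1 t

main :: IO ()
main = do
  print (test1 "banana")   -- prints 3
  print (test1' "banana")  -- prints 3
```

 Both forms compute the same result; the open question is only whether GHC
 optimizes them the same way. Comparing the output of `-ddump-simpl` for
 the two definitions is one way to check on a given GHC version.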

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:15
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler