[GHC] #8814: 7.8 optimizes attoparsec improperly

#8814: 7.8 optimizes attoparsec improperly ------------------------------+-------------------------------------------- Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Keywords: | Operating System: MacOS X Architecture: x86_64 | Type of failure: Runtime performance bug (amd64) | Test Case: Difficulty: Unknown | Blocking: Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- {{{ #!haskell {-# LANGUAGE OverloadedStrings #-} import Control.Applicative import qualified Data.Attoparsec.Text as A import Data.Text (Text) import qualified Data.Text as T testParser :: Text -> Either String Int testParser f = fmap length . A.parseOnly (many (A.char 'b' <|> A.anyChar)) $ f main :: IO () main = print . testParser $ T.replicate 50000 "a" }}} On GHC 7.6.3 with `-O2`: {{{ real 0m0.062s user 0m0.022s sys 0m0.007s }}} On GHC 7.8 tip with `-O2`: {{{ real 0m12.700s user 0m12.504s sys 0m0.165s }}} On GHC 7.6.3 with `-O0`: {{{ real 0m0.077s user 0m0.025s sys 0m0.007s }}} On GHC 7.8 tip with `-O0`: {{{ real 0m0.055s user 0m0.026s sys 0m0.007s }}} This seems to be related to the use of `<|>`; if I change the program so that the second branch (`A.anyChar`) is never taken, 7.8 behavior is roughly the same as 7.6 under any optimization level. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by tibbe): * cc: bos@…, johan.tibell@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): That's a huge regression. Are you sure all libraries in the GHC tree where compiled with optimization turned on? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by joelteon): Yeah it is pretty serious. I just cleared out ~/.ghc and reinstalled with "optimization: 2" in my cabal config. Same result. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by joelteon): Talked with some people in #ghc. `-fno-full-laziness` eliminates the issue. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): Can you identify on which module `-fno-full-laziness` makes the difference? Is it just the module above, or do you need to set that flag for `attoparsec` itself? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by joelteon): Using it when compiling the module above is enough. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): Replying to [comment:6 joelteon]:
Using it when compiling the module above is enough.
Could you dump the core and see if you can spot what difference the flag makes? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): I have not had any time to devote to this. I tried {{{ ghc -O T8814.hs -ddump-simpl -o T8814 }}} with and without `-fno-full-laziness`. Indeed I see the perf difference. The Core from `-ddump-simpl` looks very different. Inside `Main.$wa` you'll see a call to `runSTRep`. The function to which `runSTRep` is applied looks very different. * Without full laziness, it consists of a call to `newArray#` followed by a couple of `memcpy` calls * With full laziness, it has a rather complicated local recursive function that allocates a LOT of memory. I have no idea why. I think it must be to do with optimisations being done by RULES in the text library. If I add `-ddump-rule-firings` and grep for `TEXT` in the rule names, I get {{{ -- With full laziness Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> unfused Rule fired: TEXT tail -> unfused Rule fired: TEXT tail -> unfused -- Without full laziness Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> fused Rule fired: TEXT append -> unfused Rule fired: TEXT append -> unfused Rule fired: TEXT append -> unfused Rule fired: TEXT tail -> unfused Rule fired: TEXT tail -> unfused Rule fired: TEXT append -> unfused }}} So there is clearly a difference. Should that difference have such a massive performance impact? Ask the author of the text library! Why does full laziness have the effect? Well if you have `(\x. map (f x) (map g ys))`, say, full laziness may float out the `map g ys` and then the map/map fusion won't happen. At this point I hope that someone else will take over debugging to find out more. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Changes (by darchon): * related: => #8763 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by bos): I wrote a [https://github.com/bos/attoparsec/commit/b9a21b8e265949ec2d9541409b0a252aad3... workaround for this problem] that makes this particular performance regression vanish. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by simonpj): We're a bit stalled here. There is something mysterious going on, which a single INLINE pragma (Bryan's patch) fixes. But why? My guess is that it's something to do with the interaction between inlning and attoparsec's RULES. For example if a rule optimises `(f (g x))`, and `g` gets inlined, the rule won't fire. Or, if {{{ h x = f x + 1 }}} then the expression `h (g x)` won't fire the rule unless `h` is first inlined. The "phase" annotations on INLINE pragmas and RULES let you control this stuff. So the bug might not be in GHC; it might just be a missing phase annotation. Or there might be a bug in GHC. We won't know until someone digs further. Apart from lack of time, the difficulty is that I have no clue how attoparsec's RULES are supposed to work. So I think we are stalled unless/until someone feels able to do some digging to isolate what is going on. Thanks! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by joelteon): I still think this bug is in GHC. It shouldn't take 15GB of RAM to compile a program with a large chain of `<|>`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by simonpj): But this ticket is about the ''run-time'' of the particular program given in the description of this ticket. There may well be another bug, to do with the ''compile-time'' of an entirely different program. If so, could you open a ticket for that, including a way to reproduce? (Having checked that there isn't one already.) It's confusing to mix up two bugs into one ticket! Thanks Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by joelteon): My apologies. I've created #8852 for the compile-time bug. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by bos): Replying to [comment:11 simonpj]:
We're a bit stalled here. There is something mysterious going on, which a single INLINE pragma (Bryan's patch) fixes. But why? My guess is that it's something to do with the interaction between inlining and attoparsec's RULES.
I assume you're referring to text's RULES, as attoparsec doesn't contain any (it does contain a lot of inlining, though). The goal behind the RULES in text is to expose opportunities to perform stream fusion and, if the rewrite phases do not reveal any, to drop from the fusion style of programming back to thwacking directly on an array (which is typically a lot faster). For example, here are the two main RULES that you spotted in the output above: {{{ {-# RULES "TEXT append -> fused" [~1] forall t1 t2. append t1 t2 = unstream (S.append (stream t1) (stream t2)) "TEXT append -> unfused" [1] forall t1 t2. unstream (S.append (stream t1) (stream t2)) = append t1 t2 #-} }}} If we see an unadorned use of `append` in an early phase, we rewrite it so that a fusion rule can have a chance to transform it. Once we get closer to the end of our phases, if we find that a fusion-style append hasn't yet been gobbled up, we transform it back to the direct style. There's a tangential wrinkle here: for the longest time, the definition of `append` had an `INLINE` annotation, [https://github.com/bos/text/commit/109941228d16cc873a08da9924cb881c51be853b which I just removed]. I don't believe this is implicated in the slowdown. Please let me know what else it would be helpful to explain. As you know, forests of rewrite rules are rather fragile affairs, so it's entirely possible that there's been a longstanding bug in those rules (and perhaps ''not in the compiler'') that is merely now being exposed in 7.8. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by simonpj): What would be really helpful would be if you, or someone else, could diagnose exactly why the slow-down is happening. If you compile with `-ticky` you can very quickly zero in on the code that is taking more time, and compare one with the other. It would be remarkable if fusion was really responsible for such an enormous change in time, but perhpas it is. Maybe it would be worth commenting out RULES one at a time, to see which (if any) is responsible, and to reduce accidental differences between the two. Maybe the test case can be boiled down quite a lot further. You can switch off ALL rule rewrites with `-fno-enable-rewrite-rules`. Switching off full laziness might also be a good thing to try `-fno-full- laziness`. Reducing the inlining threshold (which doesn't affect INLINE pragmas) would mean less looking at inlined code. `-funfolding-use-threshold=N` (default is 60) It's a bit fiddly and time consuming, which is why I was appealing for help. I'll gladly explain anything you (or whoever) wants to know about the GHC end of things. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Comment (by simonpj): See also #8835, which we believe to be a dup of this one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly --------------------------------------------+------------------------------ Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime performance bug | (amd64) Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: #8763 --------------------------------------------+------------------------------ Changes (by ihameed): * cc: idhameed@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly -------------------------------------+------------------------------------- Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: | Architecture: x86_64 (amd64) Unknown/Multiple | Difficulty: Unknown Type of failure: Runtime | Blocked By: performance bug | Related Tickets: #8763 Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by thomie): * os: MacOS X => Unknown/Multiple -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8814: 7.8 optimizes attoparsec improperly -------------------------------------+------------------------------------- Reporter: joelteon | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #8763 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * keywords: => Inlining -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8814#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC