[GHC] #15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs
-------------------------------------+-------------------------------------
           Reporter:  nh2            |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.6.1
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  Runtime
  Unknown/Multiple                   |  performance bug
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 We are investigating various ways to make `for_` and `traverse_` not leak
 space by changing the type signature from `(a -> f b) -> t a -> f ()` to
 `(a -> f ()) -> t a -> f ()`.

 While doing so, we noticed a regression from GHC 8.2.2 to 8.4.3 and 8.6.1.

 The code:
 https://gist.github.com/nh2/b8f9f8e60443bdb30c1cd7e0acb8c8eb/bb1cc1a4987091f...

 Run against 3 different GHC releases:

 {{{
 # 8.2.2
 stack --resolver lts-11.22 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr
      460,309,368 bytes allocated in the heap

 # 8.4.3
 stack --resolver lts-12.11 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr
      860,301,736 bytes allocated in the heap

 # 8.6.1
 stack --resolver nightly-2018-10-06 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr
      860,301,784 bytes allocated in the heap
 }}}

 Allocations doubled starting with 8.4.

 All was run on Ubuntu 16.04 64-bit.

 We haven't investigated in detail yet (also whether it's a GHC or
 libraries problem) since we're actually trying to do something else and
 this came out on the side, but it looks important enough to share already.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
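
 For readers following along, the signature change described in the report
 can be sketched roughly as below. This is only an illustration: the names
 `traverseUnit_` and `forUnit_` are hypothetical and do not come from the
 gist, and the `foldr`-based body is an assumption, not necessarily any of
 the alternatives being benchmarked.

```haskell
-- Hypothetical names; a sketch of the restricted signature from the report.
-- The callback must return (), so no intermediate result of type b has to
-- be produced and then discarded.
traverseUnit_ :: (Foldable t, Applicative f) => (a -> f ()) -> t a -> f ()
traverseUnit_ f = foldr (\x k -> f x *> k) (pure ())

-- Flipped variant, corresponding to for_ with the restricted signature.
forUnit_ :: (Foldable t, Applicative f) => t a -> (a -> f ()) -> f ()
forUnit_ = flip traverseUnit_
```

 With this signature a caller whose action returns a non-unit value has to
 discard it explicitly, for example with `void` from `Data.Functor`.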

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Changes (by RyanGlScott):

 * cc: simonpj (added)

Comment:

 This regression was introduced in commit
 71037b61597d8e80ba5acebc8ad2295e5266dc07 (`Join-point refactoring`).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717#comment:1

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Changes (by nh2):

 * milestone:   => 8.8.1

Comment:

 Is this something that could be tackled for 8.8?

 Some of the real-world applications we'd really like to upgrade past 8.2
 appear to get slower from this.

 (Tentatively setting milestone)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717#comment:2

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Comment (by simonpj):

 Doubling sounds bad.

 There are 9 tests in the repro case. I've lost track of what is what. If
 someone was able to distil out a simple before-and-after on one test, that
 would be helpful. Better still, get some insight into what is happening.

 Thanks!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717#comment:3

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Comment (by nh2):

 Replying to [comment:3 simonpj]:
 > There are 9 tests in the repro case. I've lost track of what is what. If
 > someone was able to distil out a simple before-and-after on one test

 This part I can answer immediately: I use only
 `./TraverseMaybePerformance 8` in the issue description, so that's `test8`
 from the repro.

 But yes, a smaller test case will make things easier. Here it is:

 {{{
 #!/usr/bin/env stack
 -- stack --resolver lts-11.22 script --optimize

 -- The above one is fast. Slow is:
 --   stack --resolver lts-12.11 script --optimize

 {-# OPTIONS_GHC -Wall #-}

 import           Control.Concurrent (yield)
 import           Control.Monad (when)
 import qualified Data.Text.Lazy as T
 import qualified Data.Foldable as F

 myfor_ :: (Foldable f, Applicative app) => f a -> (a -> app ()) -> app ()
 myfor_ t f = case F.toList t of
   [] -> pure ()
   x -> go x
   where
     go [x] = f x
     go (x:xs) = f x *> go xs

 printChars_myfor_ :: Int -> T.Text -> IO ()
 printChars_myfor_ idx t = myfor_ (T.uncons t) $ \(c, t') -> do
   when (idx `mod` 100000 == 0) $ do
     -- Using putStrLn, I observe 2x more allocations with GHC 8.4.3 vs 8.2.2 (860 vs 460 M)
     --putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
     -- Using yield, I observe 4x more allocations with GHC 8.4.3 vs 8.2.2 (860 vs 220 M)
     yield -- the putStrLn isn't necessary, this is enough to trigger the regression
   printChars_myfor_ (idx + 1) t'

 main :: IO ()
 main = printChars_myfor_ 1 $ T.replicate 5000000 $ T.singleton 'x'
 }}}

 Using `yield`, I even managed to widen the regression to 4x.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717#comment:4
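
 One possible way to get a per-action before-and-after number from inside
 the program, rather than reading the `+RTS -sstderr` total, is sketched
 below. It is an assumption about how one might measure, not something used
 in this ticket: `allocatedDuring` is a hypothetical helper built on
 `System.Mem.getAllocationCounter` from base, which is per-thread and counts
 down, so the figure covers only the calling thread and need not match the
 `bytes allocated in the heap` line exactly.

```haskell
import Data.Int (Int64)
import System.Mem (getAllocationCounter)

-- | Hypothetical helper: run an action and report roughly how many bytes
-- the current thread allocated while running it.
-- The allocation counter counts *down* as the thread allocates, so the
-- amount allocated is (before - after).
allocatedDuring :: IO a -> IO (a, Int64)
allocatedDuring act = do
  before <- getAllocationCounter
  result <- act
  after  <- getAllocationCounter
  pure (result, before - after)
```

 Wrapping the call in `main` of the script above as
 `allocatedDuring (printChars_myfor_ 1 input)` and printing the second
 component under each resolver would give one number per compiler to compare.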

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Changes (by nh2):

 * Attachment
 "TraverseMaybePerformanceSimplified-GHC-bug-15717-GHC-8.2.2.dump-simpl"
 added.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Changes (by nh2):

 * Attachment
 "TraverseMaybePerformanceSimplified-GHC-bug-15717-GHC-8.4.3.dump-simpl"
 added.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717

#15717: Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs
-------------------------------------+-------------------------------------
           Reporter:  nh2            |                Owner:  (none)
               Type:  bug            |               Status:  new
           Priority:  normal         |            Milestone:  8.10.1
          Component:  Compiler       |              Version:  8.6.1
         Resolution:                 |             Keywords:  Simplifier
   Operating System:                 |         Architecture:
  Unknown/Multiple                   |  Unknown/Multiple
    Type of failure:  Runtime        |            Test Case:
  performance bug                    |
         Blocked By:                 |             Blocking:
    Related Tickets:                 |  Differential Rev(s):
          Wiki Page:                 |
-------------------------------------+-------------------------------------
Changes (by simonpj):

 * keywords:   => Simplifier

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15717#comment:6