[GHC] #10400: Run time increases by 40% in fractal plotter core loop

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Keywords: | Operating System: Linux Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- I'm writing a toy fractal plotter to teach myself some Haskell. For a while I used GHC 7.4, recently upgraded to 7.10, and found a significant drop in run time performance. Tests with other versions show that the drop dates back to 7.8. Not sure if this is a good practice, or if this is relevant here, but I used bangs to improve performance by adding strictness: this gives a good speedup with 7.4/7.6, but perf is unchanged with 7.8/7.10. || version || run time (sec) - bangs || no bangs || ||7.4.2 || 6.09|| 18.73|| ||7.6.3 || 6.10|| 18.93|| ||7.8.4 || 8.92|| 8.88|| ||7.10.1 || 8.92|| 9.04|| (All builds with ghc -O2.) Source: {{{#!hs {-# LANGUAGE BangPatterns #-} iterations :: Int -> Double -> Double -> Double -> Double -> Int -> Int iterations !maxK !x !y !x0 !y0 !k = let !x2 = x * x !y2 = y * y in if k == maxK || x2 + y2 > 4 then k else iterations maxK (x2 - y2 + x0) (2 * x * y + y0) x0 y0 (k + 1) inSet :: Int -> Int -> (Double, Double) -> Bool inSet !minK !maxK (!x, !y) = k >= minK && k < maxK where !k = iterations maxK x y x y 0 main = do let pointSource = repeat $ ((-0.6518209274278592), 0.3549264214581329) nPoints = 1000 * 1000 points = take nPoints pointSource selected = filter (inSet 1000 (1000 * 20)) points print $ length selected == nPoints }}} The host C compiler is gcc 4.7.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): Thanks! Looks interesting. Have you compared the RTS stats (pass `+RTS -s` to your binary)? Have you compiled your program with profiling enabled, and created a profile (`+RTS -p`), and compared that? Finally, you could compare the produce Core (compile with of `-ddump- simpl`) for obvious differences. Is there a noticable difference whether you use `take n (repeat ..)` or `replicate n ...`? If so, maybe some list fusion is happening or not happening. That should be observable in the RTS stats (number of allocations), or in the Core. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by saffroy): Replying to [comment:1 nomeata]:
Thanks! Looks interesting.
Thank you for looking! Below I am comparing 7.6 (last fast version for my use case) to 7.10 (last stable). I have attached various outputs (stats, profiles, cores) for your viewing pleasure.
Have you compared the RTS stats (pass `+RTS -s` to your binary)?
Have you compiled your program with profiling enabled, and created a
Memory usage is considerably smaller with 7.10, though GC times are negligible in both cases. profile (`+RTS -p`), and compared that? Profiles look the same.
Finally, you could compare the produce Core (compile with of `-ddump- simpl`) for obvious differences.
There are differences in the core loop (''iterations''), but I can't really tell what changed... Can you please have a look? I have attached them.
Is there a noticable difference whether you use `take n (repeat ..)` or `replicate n ...`? If so, maybe some list fusion is happening or not happening. That should be observable in the RTS stats (number of allocations), or in the Core.
So I did this: {{{ --- a/ghc_test.hs +++ b/ghc_test.hs @@ -16,6 +16,6 @@ inSet !minK !maxK (!x, !y) = k >= minK & main = do let pointSource = repeat $ ((-0.6518209274278592), 0.3549264214581329) nPoints = 1000 * 1000 - points = take nPoints pointSource + points = replicate nPoints $ head pointSource selected = filter (inSet 1000 (1000 * 20)) points print $ length selected == nPoints }}} With this change, I see no difference in stats or run time. I do see small differences in the core, but for the core loop they seem small. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by rwbarton): Almost all the time is spent inside `iterations`, which is a 20-instruction long loop in both the 7.6 and 7.8 versions. The two loops are very similar, but some detail causes the 7.8 loop to run about 50% slower, probably some pipeline stall issue. It looks hard to even track down the exact issue, let alone deal with it in the code generator in a general way. I note that the LLVM backend produces a smaller loop which is a bit faster than even the 7.6 loop. The LLVM backend is usually good at optimizing this kind of non-allocating code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by saffroy): Ah! Indeed, with `-fllvm` the run times are much more consistent across versions of GHC. || version || run time (sec) - bangs || no bangs || ||7.4.2 || 4.99|| 19.67|| ||7.6.3 || 4.99|| 19.30|| ||7.8.4 || 4.80|| 4.80|| ||7.10.1 || 4.97|| 4.97|| In this case, do we keep this bug open? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: invalid | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by nomeata): * status: new => closed * resolution: => invalid Comment:
In this case, do we keep this bug open?
Assuming rwbarton’s analysis is correct, I think we can close this bug with a *shrug*. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10400: Run time increases by 40% in fractal plotter core loop -------------------------------------+------------------------------------- Reporter: saffroy | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: invalid | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by rwbarton): That said, we do very much appreciate bug reports like this one, even if in the end we are unable to do much about them! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10400#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC