[GHC] #14414: Profiled program runs 2.5x faster than non-profiled

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- I was looking at benchmarks game (attached as fasta.ghc-2.hs). I have found that with the flags give there, this program on GHC 8.2.1 runs in about a second with `-prof -fprof-auto` and 2.5 seconds without! To run without profiling: {{{ ghc --make -Wall -fforce-recomp -fllvm -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -s -RTS 250000 > /dev/null }}} Same program with profiling {{{ ghc --make -fforce-recomp -fllvm -prof -fprof-auto -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -p -s -RTS 250000 > /dev/null }}} I also attach Core outputs for both profiled and unprofiled version. To me this seems very strange: profiled version is somehow faster. Perhaps what's worse is that this means that there's some optimisation GHC is performing when profiling is not on that makes the program a lot slower than it could be! This program is not minimised. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * Attachment "fasta.ghc-2.hs" added. Program itself -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * Attachment "fasta.ghc-2.dump-simpl-prof" added. Core of profile program -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * Attachment "fasta.ghc-2.dump-simpl-no-prof" added. Core of unprofiled program -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Hmm, this appears to be a different version of `fasta` than what we have in `nofib`. Perhaps it would make for a good addition. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I must be missing something; I can't reproduce on Linux with GHC 8.2.1 and LLVM 3.9.1. ||= Flags = ||= NCG =||= NCG =||= LLVM =||= LLVM =|| || ||= MUT =||= GC =||= MUT =||= GC =|| || `-O0` || 24.8 || 5.42 || 24.22 || 5.42 || || `-O1` || 6.55 || 1.24 || 4.38 || 0.73 || || `-O2` || 5.37 || 0.91 || 4.28 || 0.83 || || `-O1 -prof` || 8.68 || 1.68 || 7.78 || 1.36 || || `-O2 -prof` || 8.28 || 1.47 || 7.13 || 1.58 || || `-O1 -prof -fprof-auto` || 17.63 || 1.72 || 17.01 || 1.95 || || `-O2 -prof -fprof-auto` || 15.26 || 1.71 || 14.36 || 1.64 || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Using the command lines given in the ticket description: || ||= MUT =||= GC =|| ||= Without profiling =|| 0.152 || 0.116 || ||= With profiling =|| 0.843 || 0.293 || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Fuuzetsu): I don't know what you could be missing. I removed LLVM. I'm on i7 6770k, 64GB RAM, SSD. {{{ [shana@lenalee:/tmp]$ ghc --make -ddump-simpl -ddump-to-file -fforce- recomp -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -s -RTS 250000 > /dev/null [1 of 1] Compiling Main ( fasta.ghc-2.hs, fasta.ghc-2.o ) Linking fasta.ghc-2.ghc_run ... 368,568,064 bytes allocated in the heap 15,735,800 bytes copied during GC 3,692,976 bytes maximum residency (7 sample(s)) 226,896 bytes maximum slop 9 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 137 colls, 137 par 1.660s 1.660s 0.0121s 0.0280s Gen 1 7 colls, 6 par 0.127s 0.127s 0.0182s 0.0274s Parallel GC work balance: 35.73% (serial 0%, perfect 100%) TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.000s ( 0.000s elapsed) MUT time 0.206s ( 0.206s elapsed) GC time 1.787s ( 1.787s elapsed) EXIT time 0.000s ( 0.007s elapsed) Total time 1.993s ( 2.000s elapsed) Alloc rate 1,789,756,100 bytes per MUT second Productivity 10.3% of total user, 10.7% of total elapsed gc_alloc_block_sync: 0 whitehole_spin: 0 gen[0].sync: 0 gen[1].sync: 0 [shana@lenalee:/tmp]$ ghc --make -ddump-simpl -ddump-to-file -prof -fprof- auto -fforce-recomp -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -s -p -RTS 250000 > /dev/null [1 of 1] Compiling Main ( fasta.ghc-2.hs, fasta.ghc-2.o ) Linking fasta.ghc-2.ghc_run ... 653,685,432 bytes allocated in the heap 42,865,544 bytes copied during GC 4,548,264 bytes maximum residency (9 sample(s)) 584,024 bytes maximum slop 10 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 256 colls, 256 par 0.371s 0.371s 0.0014s 0.0073s Gen 1 9 colls, 8 par 0.015s 0.015s 0.0017s 0.0063s Parallel GC work balance: 16.49% (serial 0%, perfect 100%) TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.000s ( 0.000s elapsed) MUT time 0.487s ( 0.487s elapsed) GC time 0.386s ( 0.386s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 0.000s ( 0.000s elapsed) EXIT time 0.000s ( 0.001s elapsed) Total time 0.874s ( 0.874s elapsed) Alloc rate 1,340,938,477 bytes per MUT second Productivity 55.8% of total user, 55.8% of total elapsed gc_alloc_block_sync: 0 whitehole_spin: 0 gen[0].sync: 0 gen[1].sync: 0 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by duog): I couldn't reproduce this either. The times in comment:4 are strange, the elapsed times are the same as the normal times. The elasped times should be about 4x smaller. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Fuuzetsu): Let me know if there's any information I can provide to help debug this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by duog): Note that the entirety of the anomolous time is in the garbage collector. It looks to me like the program is running on only one core. If that's the case, then perhaps the garbage collector is spending entire time slices busy-waiting on a spin lock. As for why it doesn't happen while profiled, perhaps the non-profiled code has allocation-free loops, they can have poor interactions with garbage collection. Some things to try: * run the program inside `time` to verify that the numbers -s is reporting are correct * use `time` to prove that a non-haskell program is able to run on multiple cores. Sorry I don't have a one-liner for you... * try the profiled version, omitting -fprof-auto, this (should) give the same core as without -prof, and may exhibit the bad gc behaviour. * play with the -q* and -N RTS options to see if anything changes. * use https://wiki.haskell.org/ThreadScope to see exactly what the garbage collector is up to. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Indeed the spinning explanation sounds quite plausible to me. In addition to the suggestions made by duog, you might also consider trying with `-fno-omit-yields`, which ensures that even non-allocating code segments perform an occasional heap check. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): To the filer: what operating system are you on? What hardware? How many cores? Can you try a later ghc version? Ben: I can't find the spinning explanation you reference. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): I am not able to reproduce this on my Mac running 10.13.6 with 4 cores and GHC version 8.6.0.20180714 time for profiled version: Total time 1.244s ( 0.451s elapsed) time for non-profiled version: Total time 0.242s ( 0.134s elapsed) Also I checked that both programs produce the same output by writing to a file and doing a diff. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: infoneeded Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by George): * status: new => infoneeded * failure: None/Unknown => Runtime performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: infoneeded Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Fuuzetsu): I believe I was on NixOS, IIRC Intel i7 6700k CPU on 8.2. That machine is on another continent and I won't be able to retry the experiment... If someone can confirm the bug doesn't occur on Linux with modern GHC, I'm happy for the ticket to be closed. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14414: Profiled program runs 2.5x faster than non-profiled -------------------------------------+------------------------------------- Reporter: Fuuzetsu | Owner: (none) Type: bug | Status: infoneeded Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I also can't reproduce this with GHC 8.4.2 on Linux. When I increase the input by 10x (otherwise it runs too fast on my system) and call `time` I get these numbers: - Profiled: 12.18s, `-s` reports: 12.17s - Non-profiled: 2.00s, `-s` reports: 1.9s Non-threaded runtime: - Profiled: 3.73s, `-s` reports: 3.72s - Non-profiled: 1.29s, `-s` reports: 1.28s -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14414#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC