[GHC] #15418: Performance drop 60 times on non-profiling binary

#15418: Performance drop 60 times on non-profiling binary ----------------------------------------+--------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Keywords: | Operating System: MacOS X Architecture: Unknown/Multiple | Type of failure: None/Unknown Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: ----------------------------------------+--------------------------------- I have a rather large application that I have spent quite some time on tuning it as performance is just too bad. I have come to a point where the profiler reports about 14 seconds on a particular example, which is about 5x slower than I would like to see. However, building the same binary without profiling ability results in that the example takes 14 _minutes_. If I just touch a file and build with profiling information, but not giving it any profiling related RTS options, it takes about 30 seconds (not 14 seconds, but that is probably due to profiling overhead being there). How can performance drop 60x when I basically just relink it? From what I can see using "top" and memory profiling, memory consumption is quite stable over time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1):
How can performance drop 60x when I basically just relink it?
If you previously linked with profiling libraries and runtime, relinking it with non-profiling libraries and runtime (or the other way around) can of course make a huge difference. In one case all library functions are profiled and you're using profiling RTS, in the other they're not profiled and the RTS has less overheads. I'm confused about what exactly you're doing. I understand that you have one program that you want to optimize, and you're trying different profiling settings. You have these three configurations: - Compiled without profiling and run - Compiled with profiling and run without profiling (i.e. no `+RTS -p`) - Compiled with profiling and run with profiling (`+RTS -p`) Can you report numbes for each of these? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Normal build (report using time) {{{ real 16m17.687s user 16m1.916s sys 0m10.150s }}} Profiling build (no `+RTS -p`) {{{ real 0m26.963s user 0m26.562s sys 0m0.378s }}} Profiling build, `+RTS -p` {{{ real 0m27.528s user 0m27.121s sys 0m0.471s }}} Peeking inside generated `.prof` {{{ total time = 14.04 secs (14037 ticks @ 1000 us, 1 processor) total alloc = 17,827,853,336 bytes (excludes profiling overheads) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * related: => #14414, #9599 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): Can you provide a reproducible case? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): I can provide a binary with needed input files for download. I can also try on my side if I get some suggestions what to do. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): Best to have source also with details on how to produce a binary and how to run it so people can try this on different ghc versions and on different platforms -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by George): * failure: None/Unknown => Runtime performance bug * architecture: Unknown/Multiple => x86_64 (amd64) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Code base is unfortunately not open source. Problem is at least also on GHC 8.2.2 as that is I was using when I encountered it. I upgraded to 8.4.3 but unfortunately the problem was still there. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): I think without a reduced test case with source this will be very hard to fix. It would be interesting to see if you can reproduce the issue in the related tickets above as I don't believe others have been able to. Others will probably have suggestions about what you can do on your side. You might want to take a look at the performance debugging section of https://ghc.haskell.org/trac/ghc/wiki/Debugging -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): I tried #14414 but I cannot reproduce that problem. Building my program on Linux (x86_64), I can confirm the problem I have encountered is also present there. I also tried to build with `-ticky`, but I cannot say the output made me any wiser. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Trying some low-tech profiling I compiled with `-debug` and ran in `lldb`, then stop using ctrl-c from time to time to see what it was up to. The following two are definitely main offenders, especically the second one. The first is the garbage collector in some way, I am not really sure what is going on in the second. {{{ * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x000000010097d553 tcc6502`countBlocks(bd=0x0000004202400fc0) at BlockAlloc.c:782 frame #1: 0x0000000100988cd6 tcc6502`genBlocks(gen=0x0000000110c07528) at Sanity.c:870 frame #2: 0x0000000100988fff tcc6502`memInventory(show=false) at Sanity.c:900 frame #3: 0x00000001009821b8 tcc6502`GarbageCollect(collect_gen=0, do_heap_census=false, gc_type=0, cap=0x0000000101dc4500, idle_cap=0x0000000000000000) at GC.c:297 frame #4: 0x00000001009730b9 tcc6502`scheduleDoGC(pcap=0x00007ffeefbff770, task=0x0000000110c078a0, force_major=false) at Schedule.c:1809 frame #5: 0x00000001009724d4 tcc6502`schedule(initialCapability=0x0000000101dc4500, task=0x0000000110c078a0) at Schedule.c:558 frame #6: 0x00000001009737d8 tcc6502`scheduleWaitThread(tso=0x0000004200105388, ret=0x0000000000000000, pcap=0x00007ffeefbff870) at Schedule.c:2544 frame #7: 0x000000010096a496 tcc6502`rts_evalLazyIO(cap=0x00007ffeefbff870, p=0x0000000101d74e68, ret=0x0000000000000000) at RtsAPI.c:530 frame #8: 0x000000010096d448 tcc6502`hs_main(argc=7, argv=0x00007ffeefbff9c8, main_closure=0x0000000101d74e68, rts_config=RtsConfig @ 0x00007ffeefbff890) at RtsMain.c:72 frame #9: 0x000000010063eec6 tcc6502`main + 166 frame #10: 0x00007fff692ec015 libdyld.dylib`start + 1 frame #11: 0x00007fff692ec015 libdyld.dylib`start + 1 }}} {{{ frame #0: 0x00000001001105c4 tcc6502`c5kzn_info + 36 tcc6502`c5kzn_info: -> 0x1001105c4 <+36>: leaq 0x888175(%rip), %rdx ; stg_MUT_ARR_PTRS_DIRTY_info 0x1001105cb <+43>: movq %rdx, (%rbx) 0x1001105ce <+46>: leaq 0x18(%rbx), %rdx 0x1001105d2 <+50>: shrq $0x7, %rax Target 0: (tcc6502) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00000001001105c4 tcc6502`c5kzn_info + 36 frame #1: 0x0000000109a19b60 libclang.dylib`llvm::TargetLibraryInfoImpl::StandardNames + 896 frame #2: 0x0000000108a6440b libclang.dylib frame #3: 0x0000000101d3f8c1 tcc6502`TranslatorziTargetziMOS6502ziCompilerziRegister_RCzuY_closure + 1 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): Comment 7 in 14414 says: "Note that the entirety of the anomalous time is in the garbage collector." Is this true in your case? Can you add " +RTS -s" to the end of your run command line and show us the output in the different cases? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Built without profiling: {{{ 2,608,765,195,440 bytes allocated in the heap 6,315,679,080 bytes copied during GC 80,444,376 bytes maximum residency (62 sample(s)) 484,392 bytes maximum slop 234 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1993101 colls, 0 par 44.791s 46.358s 0.0000s 0.0044s Gen 1 62 colls, 0 par 2.675s 2.983s 0.0481s 0.2165s INIT time 0.000s ( 0.002s elapsed) MUT time 792.519s (798.495s elapsed) GC time 47.466s ( 49.341s elapsed) EXIT time 0.000s ( 0.004s elapsed) Total time 839.985s (847.842s elapsed) %GC time 5.7% (5.8% elapsed) Alloc rate 3,291,739,430 bytes per MUT second Productivity 94.3% of total user, 94.2% of total elapsed real 14m42.573s user 13m59.992s sys 0m6.137s }}} Built with profiling: {{{ 31,046,636,816 bytes allocated in the heap 9,289,234,088 bytes copied during GC 204,185,744 bytes maximum residency (55 sample(s)) 958,320 bytes maximum slop 554 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 29674 colls, 0 par 3.982s 4.041s 0.0001s 0.0055s Gen 1 55 colls, 0 par 3.537s 3.785s 0.0688s 0.2521s INIT time 0.000s ( 0.002s elapsed) MUT time 20.312s ( 20.502s elapsed) GC time 7.518s ( 7.827s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 0.000s ( 0.000s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 27.830s ( 28.330s elapsed) %GC time 27.0% (27.6% elapsed) Alloc rate 1,528,524,987 bytes per MUT second Productivity 73.0% of total user, 72.4% of total elapsed real 0m28.442s user 0m27.839s sys 0m0.461s }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.6.1 => 8.8.1 Comment: It doesn't look like we will be able to address this in 8.6. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * status: new => infoneeded Comment: It's very hard (if not impossible) to fix this without a reproducer. Marking as "infoneeded". -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Would it work if I could provide object files and a command line, so that you can relink it, to make it possible to try different run-times? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): To be honest I don't think that'd be helpful. Normally for this I'd start with comparing Core outputs of the program, with and without profiling flags. I think it's very likely that this will be enough to see why profiled version is faster. Btw, looking at the previous comments I realized that we didn't check different optimisation parameters. One of the reasons why sometimes profiling builds are faster is because profiling sometimes inhibits some optimisations (as it changes Core of the program), and optimisations sometimes make some programs slower. Could you try building your program without profiling, and with different optimisation settings? Please report numbers for: - -O0 - -O1 (default when using cabal) - -O2 If you're using cabal add these to `ghc-options` in your .cabal file. Depending on the numbers you can then try to disable individual optimisation passes (see the user manual or man page) etc. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hth313): Here are the numbers for different optimization level (no profiling enabled). == `-O0` {{{ 15,092,192,977,712 bytes allocated in the heap 108,592,404,512 bytes copied during GC 186,703,440 bytes maximum residency (1432 sample(s)) 709,040 bytes maximum slop 451 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 13967958 colls, 0 par 301.484s 311.887s 0.0000s 0.0044s Gen 1 1432 colls, 0 par 100.350s 100.731s 0.0703s 0.2282s INIT time 0.000s ( 0.002s elapsed) MUT time 2260.896s (2283.741s elapsed) GC time 401.834s (412.618s elapsed) EXIT time 0.000s ( 0.002s elapsed) Total time 2662.730s (2696.363s elapsed) %GC time 15.1% (15.3% elapsed) Alloc rate 6,675,314,555 bytes per MUT second Productivity 84.9% of total user, 84.7% of total elapsed real 44m56.321s user 44m22.736s sys 0m31.989s }}} == `-O1` {{{ 2,608,631,901,600 bytes allocated in the heap 6,337,722,720 bytes copied during GC 80,513,992 bytes maximum residency (61 sample(s)) 434,080 bytes maximum slop 235 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1993189 colls, 0 par 43.959s 45.610s 0.0000s 0.0021s Gen 1 61 colls, 0 par 2.810s 2.938s 0.0482s 0.1433s INIT time 0.000s ( 0.002s elapsed) MUT time 835.425s (843.204s elapsed) GC time 46.769s ( 48.548s elapsed) EXIT time 0.000s ( 0.001s elapsed) Total time 882.194s (891.755s elapsed) %GC time 5.3% (5.4% elapsed) Alloc rate 3,122,521,195 bytes per MUT second Productivity 94.7% of total user, 94.6% of total elapsed real 14m51.787s user 14m42.200s sys 0m7.372s }}} == `-O2` {{{ 2,608,768,497,856 bytes allocated in the heap 6,313,467,672 bytes copied during GC 80,449,120 bytes maximum residency (62 sample(s)) 487,840 bytes maximum slop 235 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1993104 colls, 0 par 47.643s 49.406s 0.0000s 0.0017s Gen 1 62 colls, 0 par 2.746s 2.880s 0.0465s 0.1499s INIT time 0.000s ( 0.002s elapsed) MUT time 801.664s (809.891s elapsed) GC time 50.389s ( 52.286s elapsed) EXIT time 0.000s ( 0.009s elapsed) Total time 852.053s (862.189s elapsed) %GC time 5.9% (6.1% elapsed) Alloc rate 3,254,192,077 bytes per MUT second Productivity 94.1% of total user, 93.9% of total elapsed real 14m22.223s user 14m12.060s sys 0m7.559s }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): OK, so profiling build is faster than any of these .. Becuase we can't see the source you'll have to debug this yourself. Here's what I'd do next: - Add `-ddump-simpl -ddump-to-file -dsuppress-uniques` to `ghc-options` in your .cabal, and build without profiling. Copy generated .dump-simpl files to another directory, and build again this time with profiling (make sure to use same optimisation settings in both!), copy the .dump-simpl files again to another directory. Do directory diff (perhaps using kdiff3) and see the differences. You should see lots of minor changes (like the extra `scc` expression in the profiled version) and those should not matter, focus on larger changes. Looking at results in comment:13, you should see some code in non-profiled version that allocates more. One example where this happens is when GHC unboxes strict arguments/fields but not in the whole program so you still need the boxed version of the value. In that case at some point you re-box the value, causing more allocation. There are other reasons too, hard to list all.. - Looking at the numbers in comment:13, it looks like profiling version has less max residency. Perhaps in non-profiled version some closures are floated to the top-level, causing increased residency. Keep this in mind when comparing Core outputs. Failing all that, try to extract some minimal reproducer from your code base and share it :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15418: Performance drop 60 times on non-profiling binary
-------------------------------------+-------------------------------------
Reporter: hth313 | Owner: (none)
Type: bug | Status: infoneeded
Priority: high | Milestone: 8.8.1
Component: Runtime System | Version: 8.4.3
Resolution: | Keywords:
Operating System: MacOS X | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #14414, #9599 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Marlow

#15418: Performance drop 60 times on non-profiling binary -------------------------------------+------------------------------------- Reporter: hth313 | Owner: (none) Type: bug | Status: infoneeded Priority: high | Milestone: 8.8.1 Component: Runtime System | Version: 8.4.3 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #14414, #9599 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): oops, I got the ticket number wrong in this commit, it should be #15481 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15418#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC