
#15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

I attached four files with the ticky and prof outputs of the program in comment:10, with and without the `Monad m =>` patch. I can't make sense of the ticky output: it's really hard to see what's wrong in a hundred-line Core function, but perhaps someone else can figure it out.

One other thing I tried was to test the patch with `-O0`, and the numbers are almost identical:

{{{
=== ORIGINAL ===================================================================

luna git:(master) $ time (cabal-run bench-test +RTS -s)
  77,264,754,848 bytes allocated in the heap
     114,241,080 bytes copied during GC
         240,688 bytes maximum residency (2 sample(s))
          33,152 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     74218 colls,     0 par    0.256s   0.246s     0.0000s    0.0002s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0003s    0.0006s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   25.042s  ( 25.168s elapsed)
  GC      time    0.257s  (  0.247s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   25.298s  ( 25.415s elapsed)

  %GC     time       1.0%  (1.0% elapsed)

  Alloc rate    3,085,464,250 bytes per MUT second

  Productivity  99.0% of total user, 99.0% of total elapsed

( cabal-run bench-test +RTS -s; )  25,30s user 0,12s system 99% cpu 25,423 total

=== PATCHED ===================================================================

luna git:(master) $ time (cabal-run bench-test +RTS -s)
  77,200,755,440 bytes allocated in the heap
     114,115,976 bytes copied during GC
         241,064 bytes maximum residency (2 sample(s))
          33,152 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     74218 colls,     0 par    0.263s   0.254s     0.0000s    0.0002s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0003s    0.0006s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   25.487s  ( 25.573s elapsed)
  GC      time    0.263s  (  0.254s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   25.750s  ( 25.827s elapsed)

  %GC     time       1.0%  (1.0% elapsed)

  Alloc rate    3,029,012,929 bytes per MUT second

  Productivity  99.0% of total user, 99.0% of total elapsed

( cabal-run bench-test +RTS -s; )  25,75s user 0,08s system 100% cpu 25,831 total
}}}

Since the two versions perform the same at `-O0`, it seems to me that with the different dictionary representation we're losing some optimization opportunities. I guess we could enable all optimizations again (with `-O2`) and then disable individual optimization passes one at a time to see which one brings the two versions closer together. That may give an idea of which optimization is not applicable with the `Monad m =>` patch.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:11
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler