#15176: Superclass `Monad m =>` makes program run 100 times slower -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: osa1 Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I be able to get more useful RTS stats and smaller Core I made a smaller reproducer: {{{#!haskell {-# LANGUAGE TypeApplications #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE BangPatterns #-} {-# LANGUAGE ConstraintKinds #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE Strict #-} module Main where import Prologue import qualified Control.Monad.Exception as Exception import qualified Data.Graph.Data.Graph.Class as Graph import qualified Data.Graph.Fold.Partition as Partition import qualified Luna.IR as IR import qualified Luna.Pass as Pass import qualified Luna.Pass.Scheduler as Scheduler import Luna.Pass (Pass) import Luna.Pass.Basic (Compilation) type OnDemandPass stage pass m = ( MonadIO m , Typeable pass , Pass.Compile stage pass m , Exception.MonadException Scheduler.Error m ) runPass :: forall stage pass m . OnDemandPass stage pass m => Pass stage pass () -> m () runPass !pass = Scheduler.evalT $ do Scheduler.registerPassFromFunction__ pass Scheduler.runPassSameThreadByType @pass {-# INLINE runPass #-} runPass' :: Pass Compilation Pass.BasicPass () -> IO () runPass' p = Graph.encodeAndEval @Compilation (runPass p) {-# INLINE runPass' #-} partitionsUnify :: Int -> IO () partitionsUnify i = runPass' $ do !a <- IR.var "a" !b <- IR.var "b" !u <- IR.unify a b let go !0 = let !o = pure () in o go !j = do !_ <- Partition.partition u go $! j - 1 go i main :: IO () main = partitionsUnify (10^6) }}} Put this in core/test/Main.hs and add this to luna-core.cabal: {{{ executable bench-test main-is: Main.hs hs-source-dirs: test/ build-depends: ansi-terminal -any, base -any, containers -any, convert -any, deepseq -any, ghc -any, layered-state -any, luna-autovector -any, luna-core -any, luna-cpp-containers -any, luna-data-storable -any, luna-data-typemap -any, luna-exception -any, luna-foreign-utils -any, luna-generic-traversable -any, luna-generic-traversable2 -any, luna-memory-manager -any, luna-memory-pool -any, luna-tuple-utils -any, mtl -any, primitive -any, prologue -any, structs -any, unboxed-ref >=0.4.0.0, vector -any ghc-options: -O2 -ticky -rtsopts -Wall }}} Results: (with and without `Monad =>`) {{{ ============= With Monad => ======================================================= luna git:(master) $ time (cabal-run bench-test +RTS -s) 27,544,258,632 bytes allocated in the heap 19,561,928 bytes copied during GC 205,496 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 26366 colls, 0 par 0.270s 0.268s 0.0000s 0.0008s Gen 1 2 colls, 0 par 0.002s 0.002s 0.0008s 0.0011s INIT time 0.000s ( 0.000s elapsed) MUT time 13.449s ( 13.487s elapsed) GC time 0.272s ( 0.269s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 13.721s ( 13.757s elapsed) %GC time 2.0% (2.0% elapsed) Alloc rate 2,048,118,786 bytes per MUT second Productivity 98.0% of total user, 98.0% of total elapsed ( cabal-run bench-test +RTS -s; ) 13,72s user 0,04s system 99% cpu 13,761 total ============= Original ============================================================ luna git:(master) $ time (cabal-run bench-test +RTS -s) 3,952,215,688 bytes allocated in the heap 2,071,824 bytes copied during GC 200,320 bytes maximum residency (2 sample(s)) 33,152 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 3790 colls, 0 par 0.042s 0.043s 0.0000s 0.0008s Gen 1 2 colls, 0 par 0.001s 0.002s 0.0009s 0.0010s INIT time 0.000s ( 0.000s elapsed) MUT time 1.595s ( 1.605s elapsed) GC time 0.043s ( 0.044s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 1.638s ( 1.650s elapsed) %GC time 2.6% (2.7% elapsed) Alloc rate 2,478,513,730 bytes per MUT second Productivity 97.4% of total user, 97.3% of total elapsed ( cabal-run bench-test +RTS -s; ) 1,64s user 0,01s system 99% cpu 1,654 total }}} I'll now try with `-ticky`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15176#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler