
#14208: Performance with O0 is much better than the default or with -O2, runghc performs the best -------------------------------------+------------------------------------- Reporter: harendra | Owner: osa1 Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by harendra): In case you need another data point, my original streaming library that made me file this issue still exhibits the same behavior. GHCi is 6x faster than my regular compiled code. I tried even compiling everything including all dependencies with exactly the same optimization flags to make sure there is no funny business due to mixing of opt flags. You can see the behavior in the tree available on github here: https://github.com/composewell/streamly/tree/199e20dd4b62ac2dafea0a40dc2ce3d... You can clone the repo and run the experiment like this: {{{ $ stack bench benchmarked streaming ops time 34.39 ms (32.99 ms .. 35.67 ms) 0.995 R² (0.991 R² .. 0.998 R²) mean 33.97 ms (33.24 ms .. 35.43 ms) $ stack runghc benchmark/Main.hs benchmarked streaming ops time 6.215 ms (5.684 ms .. 6.860 ms) 0.945 R² (0.896 R² .. 0.978 R²) mean 6.610 ms (6.333 ms .. 6.991 ms) }}} If I change the optimization flags to -O0 for benchmark stanza in cabal file I can get close to ghci performance. The code that I am benchmarking is like this: {{{ {-# INLINE streamlyOp #-} streamlyOp :: IO Int streamlyOp = do xs <- S.toList $ S.serially $ S.each [1..100000 :: Int] & fmap (+1) & fmap (+1) & fmap (+1) & fmap (+1) return (Prelude.length xs) }}} It seems the problem is with the `fmap` operation (I may be wrong), it is 6 times slower in case of GHC, and every other fmap I add, the benchmark timings increase but the ratio remains the same. I tried using an INLINE on fmap, I also tried to SPECIALIZE it to IO and INT type but no change. The `fmap` op is defined in `src/Streamly/Streams.hs` file like this: {{{ instance Monad m => Functor (StreamT m) where fmap f (StreamT (Stream m)) = StreamT $ Stream $ \_ stp yld -> let yield a Nothing = yld (f a) Nothing yield a (Just r) = yld (f a) (Just (getStreamT (fmap f (StreamT r)))) in m Nothing stp yield }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14208#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler