
#12893: Profiling defeats stream fusion when using vector library -------------------------------------+------------------------------------- Reporter: newhoggy | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: stream-fusion | profiling Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by newhoggy): Here is the an example of how I reproduce the problem assuming the vector library is already installed: {{{ (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw- tutorial-performance-rwhe [master|✚ 1…1] 10:46 $ cat Main.hs module Main (main) where import System.Environment import Text.Printf import qualified Data.Vector.Unboxed as DVU main :: IO () main = do [d] <- map read `fmap` getArgs printf "%f\n" (mean (DVU.enumFromTo 1 d)) mean :: DVU.Vector Double -> Double mean xs = s / fromIntegral (n :: Int) where Pair n s = DVU.foldl k (Pair 0 0) xs k (Pair m t) x = Pair (m + 1) (t + x) {-# INLINE mean #-} data Pair = Pair !Int !Double (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw- tutorial-performance-rwhe [master|✚ 1…1] 10:47 $ ghc -prof -rtsopts -fno-prof-count-entries -fno-prof-cafs -fno- prof-auto -fforce-recomp -O2 -funbox-strict-fields Main.hs [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw- tutorial-performance-rwhe [master|✚ 1…1] 10:47 $ time ./Main +RTS -sstderr -RTS 1e7 5000000.5 3,012,099,000 bytes allocated in the heap 4,079,698,408 bytes copied during GC 1,162,199,352 bytes maximum residency (13 sample(s)) 19,313,368 bytes maximum slop 1990 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 4869 colls, 0 par 1.112s 1.123s 0.0002s 0.0017s Gen 1 13 colls, 0 par 1.333s 2.194s 0.1688s 0.7226s INIT time 0.000s ( 0.003s elapsed) MUT time 1.027s ( 0.880s elapsed) GC time 2.445s ( 3.317s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 0.000s ( 0.000s elapsed) EXIT time 0.009s ( 0.195s elapsed) Total time 3.483s ( 4.395s elapsed) %GC time 70.2% (75.5% elapsed) Alloc rate 2,932,567,762 bytes per MUT second Productivity 29.8% of total user, 23.6% of total elapsed real 0m4.565s user 0m3.484s sys 0m0.939s (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw- tutorial-performance-rwhe [master|✚ 1…1] 10:47 $ ghc -rtsopts -fno-prof-count-entries -fno-prof-cafs -fno-prof- auto -fforce-recomp -O2 -funbox-strict-fields Main.hs [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw- tutorial-performance-rwhe [master|✚ 1…1] 10:47 $ time ./Main +RTS -sstderr -RTS 1e7 5000000.5 115,888 bytes allocated in the heap 3,480 bytes copied during GC 44,384 bytes maximum residency (1 sample(s)) 17,056 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s Gen 1 1 colls, 0 par 0.000s 0.005s 0.0050s 0.0050s INIT time 0.000s ( 0.002s elapsed) MUT time 0.010s ( 0.026s elapsed) GC time 0.000s ( 0.005s elapsed) EXIT time 0.000s ( 0.005s elapsed) Total time 0.012s ( 0.039s elapsed) %GC time 1.4% (13.0% elapsed) Alloc rate 11,679,903 bytes per MUT second Productivity 97.5% of total user, 30.5% of total elapsed real 0m0.057s user 0m0.012s sys 0m0.008s }}} The first run with profiling is slow and uses a lot of memory. The second run without profiling is very fast and uses a negligible amount of memory. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12893#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler