[GHC] #16034: Quadratic GC slowdown using RTS debug

#16034: Quadratic GC slowdown using RTS debug -------------------------------------+------------------------------------- Reporter: remyo | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime | Version: 8.6.3 System | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Hello, Under some conditions of heavy heap usage, I seem to experience a quadratic slowdown due to GC using the debugging RTS only (for example, in the context of ticky profiling). I have been able to reproduce it using the following program (depending on the aeson library): {{{#!hs import Data.Aeson (eitherDecode, Value) import qualified Data.ByteString.Lazy as BL import Data.Maybe (catMaybes) import System.Directory (listDirectory) import System.FilePath ( (>) ) main :: IO() main = do let dir = "data" files <- listDirectory dir values <- catMaybes <$> mapM (readF dir) files print $ length values readF :: FilePath -> FilePath -> IO (Maybe Value) readF dir fp = do print (dir > fp) blob <- BL.readFile (dir > fp) case eitherDecode blob of Left _ -> return Nothing Right v -> return $ Just $! v }}} Here data is a directory filled with (identical) JSON files. See the attached Python script for the proposed dataset. {{{ $ ghc -O -dynamic -o main Main.hs $ ./main +RTS -s ... 18,583,045,600 bytes allocated in the heap 6,334,338,280 bytes copied during GC 975,763,072 bytes maximum residency (14 sample(s)) 6,986,112 bytes maximum slop 930 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 17727 colls, 0 par 2.596s 2.597s 0.0001s 0.0006s Gen 1 14 colls, 0 par 2.422s 2.422s 0.1730s 0.7947s INIT time 0.000s ( 0.000s elapsed) MUT time 3.349s ( 3.351s elapsed) GC time 5.019s ( 5.020s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 8.368s ( 8.371s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 5,548,968,539 bytes per MUT second Productivity 40.0% of total user, 40.0% of total elapsed }}} {{{ $ ghc -O -dynamic -debug -o main.debug Main.hs $ ./main +RTS -s ... 18,583,045,600 bytes allocated in the heap 6,334,332,424 bytes copied during GC 975,763,072 bytes maximum residency (14 sample(s)) 6,986,112 bytes maximum slop 930 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 17727 colls, 0 par 172.941s 172.959s 0.0098s 0.0459s Gen 1 14 colls, 0 par 9.532s 9.532s 0.6808s 3.2061s INIT time 0.000s ( 0.000s elapsed) MUT time 5.020s ( 5.017s elapsed) GC time 182.473s (182.491s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 187.492s (187.508s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 3,701,999,827 bytes per MUT second Productivity 2.7% of total user, 2.7% of total elapsed }}} In debug mode, the runtime is almost quadratic: {{{ Nb of files Duration (non-debug) Duration (debug) 100 0.8s 2.8s 200 1.6s 8.8s 400 3.2s 27.6s 1000 8.2s 168.9s }}} The problem can be seen with either GHC 8.6.3 (here on Archlinux, with aeson 1.4.2.0) but was also observed using GHC 8.4.4, with static binaries, using "stack". I was able to reproduce with both threaded and non-threaded mode. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16034 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16034: Quadratic GC slowdown using RTS debug -------------------------------------+------------------------------------- Reporter: remyo | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by remyo): * Attachment "generate.py" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16034 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16034: Quadratic GC slowdown using RTS debug -------------------------------------+------------------------------------- Reporter: remyo | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by remyo): Basic debugging seems to indicate the time is mainly spent in {{{scheduleDoGC}}} and especially {{{countBlocks}}}. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16034#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16034: Quadratic GC slowdown using RTS debug -------------------------------------+------------------------------------- Reporter: remyo | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I don't think this is a bug. As you already found out we do `countBlock()` a lot in debug runtime for sanity checking (`perf` shows that your example spends 72% of the time in `countBlocks()`), which traverses linked lists of blocks. This takes more time as residency increases. Out of curiosity, why do you want debug runtime to be faster? Debug runtime is supposed to make debugging easier, and for that we do lots of sanity checking. It'd be strange to optimise debug runtime for runtime performance instead of ease of debugging. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16034#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16034: Quadratic GC slowdown using RTS debug -------------------------------------+------------------------------------- Reporter: remyo | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by remyo): The original intent was not to explicitly use debugging runtime but ticky profiling (which implies debugging runtime). It is of course natural to expect some slowdown (2x, 3x, 10x or even more). But in the real world example I was testing, the program was still not finished after more than 10 hours, where the regular compiled binary finished in 4 minutes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16034#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC