[GHC] #11116: GC reports memory in use way below the actual

#11116: GC reports memory in use way below the actual
-------------------------------------+-------------------------------------
           Reporter:                 |              Owner:
  facundo.dominguez                  |
               Type:  bug            |             Status:  new
           Priority:  normal         |          Milestone:
          Component:  Compiler       |            Version:  7.10.2
           Keywords:                 |   Operating System:  Unknown/Multiple
       Architecture:                 |    Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |         Blocked By:
           Blocking:                 |    Related Tickets:
Differential Rev(s):                 |          Wiki Page:
-------------------------------------+-------------------------------------
 The following program encodes and decodes a long list of words. The memory
 in use reported by the GC seems to be off by multiple gigabytes when
 compared to the reports of the OS. Results shown below.

 ghc-7.10.2, binary-0.7.6.1.

 {{{
 #!haskell
 import Control.Exception (evaluate)
 import Control.Monad (void)
 import Data.Binary (encode, decode)
 import qualified Data.ByteString.Lazy as BSL
 import Data.List (isPrefixOf, foldl')
 import Data.Word (Word32)
 import GHC.Stats
 import System.Mem (performGC)

 type T = (Word32,[Word32])

 main :: IO ()
 main = do
     let sz = 1024 * 1024 * 15
         xs = [ (i,[i]) :: T | i <- [0 .. sz] ]
         bs = encode xs
     void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs
     putStrLn "After building the value to encode:"
     printMem
     putStrLn $ "Size of the encoded value: "
                ++ show (BSL.length bs `div` (1024 * 1024)) ++ " MB"
     putStrLn ""
     putStrLn "After encoding the value:"
     printMem
     let xs' = decode bs :: [T]
     void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs'
     putStrLn "After decoding the value:"
     printMem
     -- retain the original list so it is not GC'ed
     void $ evaluate $ last xs
     -- retain the decoded list so it is not GC'ed
     void $ evaluate $ last xs'

 printMem :: IO ()
 printMem = do
     performGC
     readFile "/proc/self/status" >>=
       putStr . unlines
              . filter (\x -> any (`isPrefixOf` x) ["VmHWM", "VmRSS"])
              . lines
     stats <- getGCStats
     putStrLn $ "In use according to GC stats: "
                ++ show (currentBytesUsed stats `div` (1024 * 1024)) ++ " MB"
     putStrLn $ "HWM according the GC stats: "
                ++ show (maxBytesUsed stats `div` (1024 * 1024)) ++ " MB"
     putStrLn ""

 sum' :: Num a => [a] -> a
 sum' = foldl' (+) 0
 }}}

 Here are the results:
 {{{
 # ghc --make -O -fno-cse -fforce-recomp -rtsopts test.hs
 # time ./test +RTS -T
 After building the value to encode:
 VmHWM:   2782700 kB
 VmRSS:   2782700 kB
 In use according to GC stats: 1320 MB
 HWM according the GC stats: 1320 MB

 Size of the encoded value: 240 MB

 After encoding the value:
 VmHWM:   3064976 kB
 VmRSS:   3064976 kB
 In use according to GC stats: 1560 MB
 HWM according the GC stats: 1560 MB

 After decoding the value:
 VmHWM:   7426784 kB
 VmRSS:   7426784 kB
 In use according to GC stats: 2880 MB
 HWM according the GC stats: 2880 MB

 real    0m24.348s
 user    0m22.316s
 sys     0m1.992s
 }}}

 At the end of the program the OS reports 7 GB while the GC reports less
 than 3G of memory in use.

 Running the program with {{{+RTS -M3G}}} keeps VmHWM bounded at the
 expense of doubling the execution time.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11116
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Changes (by facundo.dominguez):

 * cc: simonmar (added)
 * failure:  None/Unknown => Runtime performance bug
 * os:  Unknown/Multiple => Linux
 * component:  Compiler => Runtime System
 * architecture:  Unknown/Multiple => x86_64 (amd64)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11116#comment:1

Changes (by rwbarton):

 * status:  new => closed
 * resolution:   => invalid

Comment:

 `currentBytesUsed` and `maxBytesUsed` are, as documented, "Current number
 of live bytes" on the heap and "Maximum number of live bytes seen so far"
 respectively. They are just calculated as the sum of the sizes of all live
 objects on the heap.

 Due to the way GHC's copying garbage collector works, the actual space
 used by the heap will typically be double this size. Then of course there
 will be additional space used by the runtime system or other C libraries
 (though that is not significant in this example).

 `peakMegabytesAllocated` counts everything allocated through the RTS
 (including any blocks used for heap) and will be closer to the figure you
 are looking for.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11116#comment:2
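
A minimal sketch of the comparison suggested above, for readers on newer compilers. It assumes the `getRTSStats` interface that replaced `getGCStats` in GHC 8.2 and later, which is not what the ticket's 7.10.2 program uses. Under that assumption `gcdetails_live_bytes` and `max_live_bytes` play roughly the role of `currentBytesUsed` and `maxBytesUsed`, and `max_mem_in_use_bytes` roughly that of `peakMegabytesAllocated`. The names `toMB` and `printRtsMem` are made up for this illustration.

```haskell
import Data.Word (Word64)
import GHC.Stats
  ( GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled )
import System.Mem (performGC)

-- | Convert a byte count to whole megabytes, as the ticket's printMem does.
toMB :: Word64 -> Word64
toMB bytes = bytes `div` (1024 * 1024)

-- | Print the live-data figures next to the memory the RTS itself holds.
-- Assumes GHC >= 8.2; statistics are only collected under +RTS -T.
printRtsMem :: IO ()
printRtsMem = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; run with +RTS -T"
    else do
      performGC  -- major collection, so the live-byte figure is fresh
      stats <- getRTSStats
      let line label bytes = putStrLn (label ++ show (toMB bytes) ++ " MB")
      line "Live data after last GC:      " (gcdetails_live_bytes (gc stats))
      line "Peak live data:               " (max_live_bytes stats)
      line "Memory held by RTS after GC:  " (gcdetails_mem_in_use_bytes (gc stats))
      line "Peak memory held by RTS:      " (max_mem_in_use_bytes stats)

main :: IO ()
main = printRtsMem
```

Built with `-rtsopts` and run with `+RTS -T` as in the original report, the last two lines should sit much closer to VmRSS than the live-data lines do.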

Comment (by simonmar):

 Also note that the RTS only tracks memory that is allocated on the Haskell
 heap; it doesn't track memory allocated by C libraries, `malloc`, or
 `mmap`. So there are several reasons why the memory figure reported by
 `peakMegabytesAllocated` might be less than the RSS figure from the OS.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11116#comment:3
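
To illustrate the point about `malloc`, here is a small sketch that allocates a buffer with `mallocBytes` (which calls C `malloc`) and reads the live-byte count before and after. It assumes GHC 8.2 or later for `getRTSStats`, and the names `bufferBytes` and `liveBytesNow` are hypothetical. It is an illustration of the comment above, not a measurement taken from the ticket.

```haskell
import Data.Word (Word64, Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import GHC.Stats
  ( GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled )
import System.Mem (performGC)

-- | Size of the buffer requested from C malloc (64 MB), chosen arbitrarily.
bufferBytes :: Int
bufferBytes = 64 * 1024 * 1024

-- | Live bytes on the Haskell heap, measured right after a major GC.
liveBytesNow :: IO Word64
liveBytesNow = do
  performGC
  gcdetails_live_bytes . gc <$> getRTSStats

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; run with +RTS -T"
    else do
      before <- liveBytesNow
      buf <- mallocBytes bufferBytes :: IO (Ptr Word8)  -- outside the Haskell heap
      after <- liveBytesNow
      free buf
      putStrLn ("Live bytes before malloc: " ++ show before)
      putStrLn ("Live bytes after malloc:  " ++ show after)
```

The two figures are expected to stay close to each other even though the process has requested another 64 MB from the C allocator, since that memory never passes through the Haskell heap.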

Comment (by facundo.dominguez):

 I was a bit surprised to see the application use only 40% of its space
 for live data, but then I know very little about how the GC is supposed
 to work. Thanks for taking a look.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11116#comment:4
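
The 40% figure can be checked against the numbers in the original report. The sketch below only redoes that arithmetic; `liveFraction`, `samples` and `report` are names invented for the illustration, and the figures are copied from the results posted in the ticket.

```haskell
import Text.Printf (printf)

-- | Fraction of resident memory occupied by live data.
-- The live figure is in MB (as printed from the GC stats), the resident
-- figure in kB (as printed in /proc/self/status).
liveFraction :: Double -> Double -> Double
liveFraction liveMB rssKB = liveMB / (rssKB / 1024)

-- | (label, "In use according to GC stats" in MB, VmRSS in kB).
samples :: [(String, Double, Double)]
samples =
  [ ("after building", 1320, 2782700)
  , ("after encoding", 1560, 3064976)
  , ("after decoding", 2880, 7426784)
  ]

-- | Print one sample as a percentage of resident memory.
report :: (String, Double, Double) -> IO ()
report (label, liveMB, rssKB) =
  printf "%s: live data is %.0f%% of RSS\n" label (100 * liveFraction liveMB rssKB)

main :: IO ()
main = mapM_ report samples
```

The last sample works out to about 40%, while the first two come out at roughly half, which fits the earlier remark that a copying collector typically needs about double the live data.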