
Bulat Ziganshin wrote:
Hello Bertram,
Sunday, October 19, 2008, 6:19:31 AM, you wrote:
That's 5 words per elements
... that, like everything else, should be multiplied by 2-3 to account GC effect
True. You can control this factor though. Two RTS options help: -c (Enable compaction for all major collections) - mostly avoids fragmentation in the old generation. -F<factor> (Control the amount of memory reserved in terms of the size of the oldest generation. The default is 2, meaning that if the oldest generation is 200MB in size, 400 MB of heap will be used) Consider this program,
module Main (main) where
import qualified Data.IntMap as M import Data.List (foldl')
main = do loop (M.fromList [(i,0) | i <- [1..5000000]]) 1
loop dict j = do i <- readLn print $ dict M.! (i :: Int) let dict' = foldl' (\m (k, v) -> M.insert k v m) dict [(,) i $! j*i | i <- [j`mod`10 * 500000 + 1..j`mod`10 * 500000 + 500000]] loop dict' (j+1)
This program maintains an IntMap with 5 million entries, which means 200 MB of live data on a 32 bit computer. It updates the map a lot, too, so I think this is a fairly realistic example. Running it on a 51 line input with various RTS options [*], we get: Options Memory used Time used +RTS -c -F1.1 220 MB 3m22s +RTS -c -F1.2 243 MB 2m12s +RTS -c -F1.5 306 MB 1m58s +RTS -c 398 MB 1m57s +RTS -F 1.1 406 MB 1m43s +RTS -F 1.2 425 MB 1m15s +RTS -F 1.5 483 MB 1m6s none 580 MB 1m11s Heap residency was around 200.5 million bytes in all runs. As expected, saving memory this way doesn't come cheap - it can dramatically increase the program's runtime. But if a program builds and slowly updates a large dictionary, playing with these options can help a lot. Bertram [*] time (seq 50; echo 0) | ./Main +RTS -sstderr -c -F1.2