Hello again,

I'm still trying to find some method to predict memory usage and came upon this page: https://wiki.haskell.org/GHC/Memory_Footprint

I tried to manually calculate how much memory the record below will consume (64-bit, 1W = 8B):

newtype Id   = Id    Text deriving (Generic, Data, NFData) -- = 6W
newtype Uid  = Uid   Int  deriving (Generic, Data, NFData) -- = 2W
newtype XUid = XUid  Uid  deriving (Generic, Data, NFData) -- = 2W
newtype YUid = YUid  Uid  deriving (Generic, Data, NFData) -- = 2W
data    Dir  = Yes | No   deriving (Generic, Data)         -- = 2W

data X = X
  { a :: XUid                -- =  2W       -- Int
  , b :: YUid                -- =  2W       -- Int
  , c :: Id                  -- =  6W + 8B  -- Text len 4
  , d :: Either Dir Dir      -- =  1W + 2W  -- Either + Dir + No/Yes
  , e :: Text                -- =  6W + 8B  -- Text len 4
  } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168 B

and calculated the expected sizes of a few lists with different numbers of elements:

Expected list sizes    ([v] = (1 + 3N) words + N * sizeof(v))
  30: 1 + 3W *   30 + (19W *   30 + 16 *   30) =   5,761 B
 600: 1 + 3W *  600 + (19W *  600 + 16 *  600) = 115,201 B
5000: 1 + 3W * 5000 + (19W * 5000 + 16 * 5000) = 960,001 B
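
The same arithmetic as a throwaway helper (expectedBytes is just an ad-hoc name; 1W = 8B and the per-element cost from the comments above):

  expectedBytes :: Int -> Int
  expectedBytes n = 1 + 3*8*n + (19*8*n + 16*n)
  -- expectedBytes 30   == 5761
  -- expectedBytes 600  == 115201
  -- expectedBytes 5000 == 960001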

I also compared these sizes against three libraries (Data.Generics.Schemes.gsize, GHC.DataSize.recursiveSizeNF, Weigh), with these results:

#items recursiveSizeNF    gSize      Weigh    Expected  Diff/recursiveSizeNF
   1:            1,416       18        696         168          -
  30:            8,008      931     20,880       5,761        28%
 600:          135,688   18,601    417,600     115,201        15%
5000:        1,121,288  155,001  3,480,000     960,001        14%
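
The Diff column is the relative gap between recursiveSizeNF and the expected size (relDiff is just an ad-hoc helper name):

  relDiff :: Double -> Double -> Double
  relDiff measured expected = (measured - expected) / measured
  -- relDiff 8008 5761 ~= 0.28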

As you can see, the results are more than surprising (to me), with recursiveSizeNF coming closest to the expected values. All three measure the same variable.


What am I missing?


For completeness, here are the relevant parts of the code for creating the elements (with excessive forcing):

  let mkX i = force X{ a = XUid $ Uid i
                     , b = YUid $ Uid i
                     , c = Id $ tshow i
                     , d = if even i then (Left Yes) else (Right No)
                     , e = T.reverse (tshow i)
                     }
      xs30  = force . map mkX $ take   30 $ randomRs (1000,1030) (mkStdGen 0)
      xs600 = force . map mkX $ take  600 $ randomRs (1000,1600) (mkStdGen 0)
      xs5K  = force . map mkX $ take 5000 $ randomRs (1000,5000) (mkStdGen 0)

  dataSize <- recursiveSizeNF $!! {a}
  let gSize = gsize $!! mkX 0
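
For reference, here is a minimal self-contained sketch of the three measurement calls side by side, on a plain list of Ints instead of the X record above (just to show the APIs; not the exact program that produced the table):

module Main where

import Control.DeepSeq       (force, ($!!))
import Data.Generics.Schemes (gsize)
import GHC.DataSize          (recursiveSizeNF)
import Weigh                 (mainWith, value)

main :: IO ()
main = do
  let xs = force [1 .. 30 :: Int]
  bytes <- recursiveSizeNF $!! xs   -- bytes of the fully evaluated closure graph
  print bytes
  print (gsize $!! xs)              -- SYB gsize counts constructor nodes, not bytes
  mainWith $ value "xs" xs          -- Weigh reports bytes allocated while forcing xs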



-------- Original Message --------
Subject: Measuring memory usage
From: Vlatko Basic <vlatko.basic@gmail.com>
To: haskell-cafe <Haskell-cafe@Haskell.org>
Date: 29/06/18 14:31


Hello,

I've run into some strange results using the Weigh package.

It shows that a HashMap inside a 'data' wrapper uses much, much more memory.

The strange thing is that I'm seeing overly large memory usage in my app as well (several "MapData"-like fields in records), and I'm trying to figure out with 'weigh' what is holding on to the memory.

I noticed that when I change the code to use HashMap directly (not inside 'data'; that's the only change), the memory usage observed with top drops by ~60M, from 850M to 790M.


These are the test results for 10K, 5K and 3.3K items for "data MapData k v = MapData (HashMap k v)" (the full runnable example is at the end):

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData        4,242,208    4

I tested by changing the order, disabling all but one case, etc., and the results were the same. 'weigh' behaves the same way with IntMap and Map.


So, if anyone has experience with such issues, my questions are:

1. Is the 'weigh' package reliable/usable, at least to some extent? (The results do show the difference between full, half and third.)

2. How do you measure the memory consumption of your large data/records?

3. If the results are even approximately valid, what could cause such large discrepancies with 'data'?

4. Is there a way to see whether some record has been freed from memory, i.e. GCed?



module Main where

import Prelude

import Control.DeepSeq     (NFData)
import Data.HashMap.Strict (HashMap, fromList)
import GHC.Generics        (Generic)
import Weigh               (mainWith, value)


data MapData k v = MapData (HashMap k v) deriving Generic
instance (NFData k, NFData v) => NFData (MapData k v)

full, half, third :: Int
full  = 10000
half  =  5000
third =  3333

main :: IO ()
main = mainWith $ do
  value "HashMap"       (          mkHMList full)
  value "HashMap half"  (          mkHMList half)
  value "HashMap third" (          mkHMList third)
  value "MapData"       (MapData $ mkHMList full)

mkHMList :: Int -> HashMap Int String
mkHMList n = fromList . zip [1..n] $ replicate n "some text"