Reducing the cost of garbage collecting small arrays

Hi,

Now that we can efficiently copy small arrays, I've gone back to benchmarking/optimizing my hash array mapped trie data structure. The data structure's performance depends on how efficiently we can allocate/collect/copy Array#s and MutableArray#s of size <= 32. Speeding up copying helped quite a bit, but the HAMT is still slower than a Patricia tree for inserts.

If you look at the data below you can see that GC time totally dominates the runtime. Is there anything that can be done to make the GC cheaper in this case? If I'm reading the output below correctly, it's not allocating the space that takes most of the time, but later evacuating/scavenging it.

Benchmark
=========

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.List
import qualified Data.HashMap as HM

test :: HM.HashMap Int Int -> Int -> HM.HashMap Int Int
test !m 0 = m
test m n = test (HM.insert n n m) (n-1)
{-# NOINLINE test #-}

main = print $ HM.null $ test HM.empty 1000000

Hash array mapped trie
======================

$ time perf record -f ./test
False
[ perf record: Woken up 13 times to write data ]
[ perf record: Captured and wrote 1.672 MB perf.data (~73041 samples) ]

real    0m2.562s
user    0m0.020s
sys     0m0.010s

# Samples: 72980
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    52.79%  test  ./test  [.] evacuate
     9.49%  test  ./test  [.] 0x000000000b4544
     8.47%  test  ./test  [.] scavenge_block
     6.27%  test  ./test  [.] s4UJ_info
     4.67%  test  ./test  [.] s1Od_info
     4.22%  test  ./test  [.] scavenge_mut_arr_ptrs
     3.68%  test  ./test  [.] s4N7_info
     0.72%  test  ./test  [.] todo_block_full
     0.55%  test  ./test  [.] GarbageCollect
     0.55%  test  ./test  [.] freeGroup

Patricia tree
=============

$ time perf record -f ./test
False
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 1.292 MB perf.data (~56470 samples) ]

real    0m1.984s
user    0m0.020s
sys     0m0.010s

# Samples: 56409
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    49.30%  test  ./test    [.] evacuate
    24.75%  test  ./test    [.] sa4B_info
    14.56%  test  ./test    [.] scavenge_block
     1.31%  test  ./test    [.] sa4k_info
     1.25%  test  ./test    [.] sa4m_info
     1.25%  test  ./test    [.] 0x00000000004108
     0.72%  test  [kernel]  [k] clear_page_c
     0.66%  test  ./test    [.] todo_block_full
     0.46%  test  [kernel]  [k] page_fault
     0.44%  test  ./test    [.] GarbageCollect

P.S. I also need to figure out what 0x000000000b4544 corresponds to in the first perf report. I thought my fix to the generated assembly files had taken care of the unknown-symbol problem.

Cheers,
Johan
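The copy cost under discussion comes from the persistent update pattern: every insert clones an entire node array just to replace one slot, so the old array becomes garbage immediately. A minimal sketch of that pattern — illustrative code, not the actual HAMT implementation — written against the primitive package's Data.Primitive.Array wrappers:

-- Illustrative sketch (not the actual HAMT code) of the copy-on-write
-- node update that every persistent insert performs.
module NodeUpdate (updateSlot) where

import Control.Monad.ST (runST)
import Data.Primitive.Array

-- Clone a node array, replacing the element at index i. The fresh
-- array is allocated in the nursery and the old one becomes garbage
-- immediately, which is why evacuate/scavenge dominate the profiles.
updateSlot :: Array a -> Int -> a -> Array a
updateSlot src i x = runST $ do
  let n = sizeofArray src
  dst <- newArray n x        -- allocate the new node, filled with x
  copyArray dst 0 src 0 n    -- bulk copy; compiles down to copyArray#
  writeArray dst i x         -- overwrite the single changed slot
  unsafeFreezeArray dst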

On Wed, Jun 22, 2011 at 12:33 PM, Johan Tibell wrote:
> Hi,
>
> Now that we can efficiently copy small arrays, I've gone back to benchmarking/optimizing my hash array mapped trie data structure. The data structure's performance depends on how efficiently we can allocate/collect/copy Array#s and MutableArray#s of size <= 32. Speeding up copying helped quite a bit, but the HAMT is still slower than a Patricia tree for inserts.
Is 5 the optimal number of bits to slice off at a time (i.e., the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating/scanning more words in total, but the smaller new nodes may equate to less scanning overall (especially if this runs long enough for some nodes to get tenure).

-Jan
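To make the trade-off concrete — a back-of-envelope model of my own, not measurements from the thread: with b bits per level the fanout is 2^b, and an insert at full depth must copy one node per level, of which there are about ceil(hashBits/b):

-- Back-of-envelope model (an assumption, not measured): words copied
-- by an insert that descends the full depth of the trie.
wordsCopiedPerInsert :: Int -> Int -> Int
wordsCopiedPerInsert hashBits b = depth * 2 ^ b
  where depth = (hashBits + b - 1) `div` b  -- ceil(hashBits / b)

For a 32-bit hash this gives 128 words at b=4, 224 at b=5, and 384 at b=6, which at least matches the direction of the argument: narrower fanout means less copying per insert, at the price of a deeper trie.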

On Thu, Jun 23, 2011 at 12:31 AM, Jan-Willem Maessen wrote:
> On Wed, Jun 22, 2011 at 12:33 PM, Johan Tibell wrote:
>> Hi,
>>
>> Now that we can efficiently copy small arrays, I've gone back to benchmarking/optimizing my hash array mapped trie data structure. The data structure's performance depends on how efficiently we can allocate/collect/copy Array#s and MutableArray#s of size <= 32. Speeding up copying helped quite a bit, but the HAMT is still slower than a Patricia tree for inserts.
>
> Is 5 the optimal number of bits to slice off at a time (i.e., the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating/scanning more words in total, but the smaller new nodes may equate to less scanning overall (especially if this runs long enough for some nodes to get tenure).
I'm experimenting with this. 6 is far too much, making inserts 4-5x slower. 4 doesn't seem to improve things much (which is a bit counter-intuitive given that 6 made things so much worse), but I need to experiment some more.

Cheers,
Johan

On Thu, Jun 23, 2011 at 8:27 AM, Johan Tibell wrote:
>> Is 5 the optimal number of bits to slice off at a time (i.e., the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating/scanning more words in total, but the smaller new nodes may equate to less scanning overall (especially if this runs long enough for some nodes to get tenure).
>
> I'm experimenting with this. 6 is far too much, making inserts 4-5x slower. 4 doesn't seem to improve things much (which is a bit counter-intuitive given that 6 made things so much worse), but I need to experiment some more.
After some more testing I can improve the performance of insert by 30% by reducing the array size from 32 to 16. GC still dominates, though.

Johan
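A 16-element node means 4 bits per level, and the direction (though not the magnitude) of that 30% agrees with the illustrative cost model sketched earlier in the thread; for a 32-bit hash:

ghci> wordsCopiedPerInsert 32 4   -- 16-wide nodes
128
ghci> wordsCopiedPerInsert 32 5   -- 32-wide nodes
224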

On Thu, Jun 23, 2011 at 4:54 AM, Johan Tibell wrote:
> On Thu, Jun 23, 2011 at 8:27 AM, Johan Tibell wrote:
>>> Is 5 the optimal number of bits to slice off at a time (i.e., the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating/scanning more words in total, but the smaller new nodes may equate to less scanning overall (especially if this runs long enough for some nodes to get tenure).
>>
>> I'm experimenting with this. 6 is far too much, making inserts 4-5x slower. 4 doesn't seem to improve things much (which is a bit counter-intuitive given that 6 made things so much worse), but I need to experiment some more.
>
> After some more testing I can improve the performance of insert by 30% by reducing the array size from 32 to 16. GC still dominates, though.
>
> Johan
Yes, I'd still expect that; internal-node churn with fat nodes exhausts the heap more quickly than usual. If large nodes become the norm, cranking up the GC nursery size might be in order.

It's great to see that fat nodes finally work well in GHC, though. When I first tried this in the 90's it was so bad (due to the extra level of indirection and write-barrier problems) that I gave up in frustration.

-Jan
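For anyone wanting to try the nursery suggestion on the benchmark above: the nursery (allocation area) size is set with the standard +RTS -A flag, assuming the binary is built with -rtsopts. For example:

$ ghc -O2 -rtsopts test.hs
$ ./test +RTS -A64m -s -RTS    # 64 MB nursery; -s prints GC statistics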

On Thu, Jun 23, 2011 at 3:27 PM, Jan-Willem Maessen wrote:
> Yes, I'd still expect that; internal-node churn with fat nodes exhausts the heap more quickly than usual. If large nodes become the norm, cranking up the GC nursery size might be in order.
>
> It's great to see that fat nodes finally work well in GHC, though. When I first tried this in the 90's it was so bad (due to the extra level of indirection and write-barrier problems) that I gave up in frustration.
I just tested reducing the fanout to 8, which actually makes things slower.

Johan

On 23/06/2011 09:54, Johan Tibell wrote:
> On Thu, Jun 23, 2011 at 8:27 AM, Johan Tibell wrote:
>>> Is 5 the optimal number of bits to slice off at a time (i.e., the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating/scanning more words in total, but the smaller new nodes may equate to less scanning overall (especially if this runs long enough for some nodes to get tenure).
>>
>> I'm experimenting with this. 6 is far too much, making inserts 4-5x slower. 4 doesn't seem to improve things much (which is a bit counter-intuitive given that 6 made things so much worse), but I need to experiment some more.
>
> After some more testing I can improve the performance of insert by 30% by reducing the array size from 32 to 16. GC still dominates, though.
I will try to look at this when I have some time. It could just be the cost of copying the extra data, though. This is probably another one of those cases where generational GC isn't helping: long-lived data is getting copied multiple times as it progresses from the young generation to the old one. I want to look into some heuristics to try to detect when this is happening and vary the nursery size automatically.

Cheers,
Simon
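One way to probe this hypothesis directly (a suggestion, not something tried in the thread) is to run the benchmark with a single generation, which turns GHC's collector into a plain copying collector with no young-to-old promotion:

$ ./test +RTS -G1 -s -RTS    # -G1: one generation, so no promotion copies

If the generational copying is the problem, -G1 together with a large -A nursery should shrink the gap to the Patricia tree.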
participants (3):

- Jan-Willem Maessen
- Johan Tibell
- Simon Marlow