
Hi, Now when we can efficiently copy small arrays I've gone back to benchmarking/optimizing my hash array mapped trie data structure. The data structure's performance depends on how efficiently we can allocate/collect/copy Array#s and MutableArrays#s of size <=32. Speeding up copying helped quite a bit, but the HAMT is still slower than a Patricia tree for inserts. If you look at the data below you can see that GC time totally dominates the runtime. Is there anything that can be done to make the GC cheaper in this case? If I'm reading the output below right it's not allocating the space that takes most of the time, but later evacuating/scavenging it. Benchmark ========= {-# LANGUAGE BangPatterns #-} module Main (main) where import Data.List import qualified Data.HashMap as HM test :: HM.HashMap Int Int -> Int -> HM.HashMap Int Int test !m 0 = m test m n = test (HM.insert n n m) (n-1) {-# NOINLINE test #-} main = print $ HM.null $ test HM.empty 1000000 Hash array mapped trie ====================== $ time perf record -f ./test False [ perf record: Woken up 13 times to write data ] [ perf record: Captured and wrote 1.672 MB perf.data (~73041 samples) ] real 0m2.562s user 0m0.020s sys 0m0.010s # Samples: 72980 # # Overhead Command Shared Object Symbol # ........ ............... .......................... ...... # 52.79% test ./test [.] evacuate 9.49% test ./test [.] 0x000000000b4544 8.47% test ./test [.] scavenge_block 6.27% test ./test [.] s4UJ_info 4.67% test ./test [.] s1Od_info 4.22% test ./test [.] scavenge_mut_arr_ptrs 3.68% test ./test [.] s4N7_info 0.72% test ./test [.] todo_block_full 0.55% test ./test [.] GarbageCollect 0.55% test ./test [.] freeGroup Patricia tree ============= $ time perf record -f ./test False [ perf record: Woken up 11 times to write data ] [ perf record: Captured and wrote 1.292 MB perf.data (~56470 samples) ] real 0m1.984s user 0m0.020s sys 0m0.010s # Samples: 56409 # # Overhead Command Shared Object Symbol # ........ ............... .................... ...... # 49.30% test ./test [.] evacuate 24.75% test ./test [.] sa4B_info 14.56% test ./test [.] scavenge_block 1.31% test ./test [.] sa4k_info 1.25% test ./test [.] sa4m_info 1.25% test ./test [.] 0x00000000004108 0.72% test [kernel] [k] clear_page_c 0.66% test ./test [.] todo_block_full 0.46% test [kernel] [k] page_fault 0.44% test ./test [.] GarbageCollect P.S. I also need to figure out what 0x000000000b4544 corresponds to in the first perf events report. I thought my fix to the generated assembly files should have taken care of the unknown symbol problem. Cheers, Johan