
Would it be worth turning this into a Trac ticket? Simon From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Eyal Lotem Sent: 07 February 2013 02:14 To: ghc-devs@haskell.org Subject: Stable pointers and hash table performance Hey, Background ---------------- I built a hash table in C: https://github.com/Peaker/small_hash According to a few simple benchmarks (mainly, 10 million insertions), my C hash table is much faster than any Haskell data structure I tried (e.g: 8 times faster than IntMap). You can run the C benchmark (in small_hash) via: "make && ./benchmark 10000000". The other benchmarks are at: https://github.com/Peaker/hashtable_benchmarks Benchmarks results on my hardware: http://hpaste.org/81902 I was hoping to get that fast mapping goodness to Haskell, by building an FFI to it: https://github.com/Peaker/small_hash_hs In order to do FFI bindings to a C data structure, that can contain arbitrary Haskell values, I must create a lot of StablePtrs (for each Haskell value stored in the C hash table). Problem 1: ------------- Disappointingly, the FFI bindings were dozens of times slower(!) than the C code. When using +RTS -H200M the speed is about 10 times slower(!) than the C code. oprofile shows that the culprit is the Stable Ptr code. Digging in, I found that stable ptrs and stable names shared a table, despite having quite different characteristics: * Stable Names aren't considered roots by GC, ptrs are. * Making two stable ptrs for the same Haskell value doesn't really require them to unify -- so there's no need for the hash table or the "refs" counter for stable ptrs. My Changes: ----------------- I split the Stable Names into their own table (which allowed removing the "refs" field from them). Stable Ptrs no longer use a hash table to unify, don't need "refs", or "sn_obj", or "old". This made Stable Ptrs cost just 1 C ptr in the table and really cheap to create/destroy. The commits are here: https://github.com/Peaker/ghc/commits/master They are not polished (I didn't thoroughly review myself yet, didn't run the test suite, fix out-of-date comments involved) and not ready for integration yet. Result: --------- The hash table benchmark via the FFI is now only ~2 times slower than the C data structure and much faster than any other mapping structure for that particular benchmark (10 million insertions of identity integers, sequentially). A lot of the cost here is due to fine-grained allocation, as the C code does bulk allocation. I shall improve this benchmark later, and I believe it will tighten the gap some more. Problem 2: -------------- http://hackage.haskell.org/trac/ghc/ticket/7670 Every minor GC iterates a whole lot of stable ptrs. This means that with the default heap size, runtime is about 8 times longer (than with a 200M heap size). To avoid this, I'd like to split the stable ptrs table into generations. To do this, I need to know which of the pointed objects were promoted and to what generation, so I can promote my stable ptrs accordingly. It seems like the correct time to do this is in the updateStablePtrTable (now named "updateStableTables") This is the part I need help with. How do I know if an object was promoted to a higher generation when I traverse the objects? Thanks! Eyal