
Dear list, I had to implement a red-black tree in C++. So I took Okasaki's functional pearl and ported the algorithm to C++, where the rebalancing is done by following parent pointers. I shew the Haskell algorithm to colleagues, and after admitting it is impressive, they told me whatever the beauty as long as we have performance. So I benchmarked it and the C++ implementation, insertion of one million elements, and the C++ version is 15 times faster. Mainly because in C++ it is just moving pointers around I guess. I have to mention that the space usage is outrageous with Haskell. The Tree is strict in its subtrees, but I think the problem comes from the laziness of the balance function, which needs to hold the whole tree. This reduces to my main question: how can I decompose balance in a stricter version? Or maybe I'm completely wrong and there is something obvious I am missing? If you want to take a look, the code is here: http://hpaste.org/65611 Adrien PS: compiler options and profiling stderr output
ghc -Wall -O2 -prof -auto-all -rtsopts rbt.hs
rbt.exe +RTS -K20M -p -sstderr -RTS 1000000 964,079,612 bytes allocated in the heap 230,842,072 bytes copied during GC 58,442,812 bytes maximum residency (7 sample(s)) 327,812 bytes maximum slop 105 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause Gen 0 1810 colls, 0 par 0.30s 0.27s 0.0001s 0.0034s Gen 1 7 colls, 0 par 0.19s 0.19s 0.0278s 0.0938s INIT time 0.00s ( 0.00s elapsed) MUT time 1.78s ( 1.82s elapsed) GC time 0.48s ( 0.46s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 0.00s ( 0.00s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 2.28s ( 2.28s elapsed) %GC time 21.2% (20.3% elapsed) Alloc rate 537,387,643 bytes per MUT second Productivity 78.8% of total user, 78.5% of total elapsed -- Adrien Haxaire www.adrienhaxaire.org | @adrienhaxaire