These times are measured in milliseconds. Conclusion: my implementation is massively faster -- by at least 4.5x. (I spent the last half hour trying to optimize SBinomial -- it really is that big a difference, and it's not going to change.)
Here's why I think that's the case. Even though my representation carries the extra Skip constructor, how many Skip constructors are there in total? On average, half the spine is made up of Skips, so that's only about (1/2) log n Skip values in the entire forest.
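To make that concrete, here is a minimal sketch of the two shapes being compared. The declarations are simplified and the names (BinomForest, SBinomTree, and so on) are mine, for illustration -- this isn't the actual code from either implementation:

    module BinomSketch where

    -- Type-indexed forest: a tree's rank is encoded in its type, so each
    -- node's children form a single-constructor type that GHC can unpack.
    data BinomForest rk a
        = Nil
        | Skip (BinomForest (Succ rk) a)
        | Cons {-# UNPACK #-} !(BinomTree rk a) (BinomForest (Succ rk) a)

    data BinomTree rk a = BinomTree a (rk a)   -- root element, rank-many children
    data Succ rk a = Succ {-# UNPACK #-} !(BinomTree rk a) (rk a)
    data Zero a = Zero

    type BinomHeap a = BinomForest Zero a

    -- Sparse forest: the rank is ordinary data and the children are a plain
    -- list, which is a multi-constructor type and therefore can never be
    -- unpacked into its parent node.
    data SBinomTree a = SBinomTree !Int a [SBinomTree a]   -- rank, root, children
    type SBinomHeap a = [SBinomTree a]

With the UNPACK pragmas, every Cons and Succ absorbs its tree's root field directly, while in the sparse version every child costs a list cell on top of its own node.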
But the sparse binomial tree representation can't do anywhere near as much unpacking: in particular, each node's collection of children is not a single-constructor type, so GHC can't unpack it into the node. That's an O(n) increase in allocation, all to buy an O(log n) shortening of the spine -- a bad trade. Most of the work is done allocating nodes in the *trees,* not along the spine among the roots, and that is exactly where the crazy, type-system-exploiting approach allocates much less, because it can do far more constructor unpacking. Let's test this hypothesis. First, the runtime statistics (+RTS -s) for my implementation:
  3,050,272,052 bytes allocated in the heap
    240,340,552 bytes copied during GC
      1,087,992 bytes maximum residency (201 sample(s))
         53,136 bytes maximum slop
              3 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  5703 collections,     0 parallel,  0.48s,  0.53s elapsed
  Generation 1:   201 collections,     0 parallel,  0.22s,  0.24s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.52s  (  4.74s elapsed)
  GC    time    0.70s  (  0.77s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.23s  (  5.51s elapsed)

  %GC time      13.5%  (13.9% elapsed)

  Alloc rate    674,200,547 bytes per MUT second

  Productivity  86.5% of total user, 82.2% of total elapsed
The sparse binomial forest representation, same options:
  5,612,965,672 bytes allocated in the heap
    563,671,500 bytes copied during GC
      1,967,576 bytes maximum residency (202 sample(s))
        107,212 bytes maximum slop
              5 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0: 10602 collections,     0 parallel,  1.60s,  1.67s elapsed
  Generation 1:   202 collections,     0 parallel,  0.28s,  0.30s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.52s  (  6.51s elapsed)
  GC    time    1.87s  (  1.97s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.40s  (  8.48s elapsed)

  %GC time      22.3%  (23.2% elapsed)

  Alloc rate    860,302,018 bytes per MUT second

  Productivity  77.7% of total user, 76.9% of total elapsed
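Side by side: 5,612,965,672 bytes allocated versus 3,050,272,052 (about 1.8x), 1.87s of GC versus 0.70s (about 2.7x), and 6.52s of mutator time versus 4.52s. The sparse version loses even before the garbage collector gets involved.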
Furthermore, if we actually profile the sparse forest representation, we confirm that a huge amount of allocation is being done during tree melding.
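(If you want to reproduce the profile, something along these lines should do it -- I'm calling the driver Bench.hs here, but that name is just a placeholder:

    $ ghc --make -O2 -prof -auto-all Bench.hs
    $ ./Bench +RTS -p -sstderr

-p writes the cost-centre profile to Bench.prof, and -sstderr prints the same kind of GC summary quoted above.)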
Conclusion: the type-magical binomial heap implementation does a lot less allocating, and is faster even ignoring GC time. I encourage you to test it out for yourself, but I think you'll find similar results. =D
Louis