These times are measured in milliseconds.  Conclusion: my implementation is massively faster, by a factor of at least 4.5x.  (I spent the last half hour trying to optimize SBinomial -- it really is that big a difference, and it's not going to change.)
Here's why I think that's the case: even though we might have the Skip constructor, how many Skip constructors are there, total?  On average, half the forest is made up of Skips, so there are 1/2 log n Skip values there.
But the thing is that the sparse binomial tree representation can't do anywhere near as much unpacking; in particular, the set of children of each node is not a single-constructor type.  That's an O(n) increase in allocation, all for an O(log n) shortening of the spine.  That's why it's a bad plan.  Most of the work is being done in allocations of nodes in the *trees,* rather than along the spine among the roots.  In this area, the crazy, type-system-exploiting approach actually does much less allocation, because it can do a lot more constructor unpacking.  Let's test this hypothesis:
 3,050,272,052 bytes allocated in the heap
     240,340,552 bytes copied during GC     
       1,087,992 bytes maximum residency (201 sample(s))
          53,136 bytes maximum slop                     
               3 MB total memory in use (0 MB lost due to fragmentation)
  Generation 0:  5703 collections,     0 parallel,  0.48s,  0.53s elapsed
  Generation 1:   201 collections,     0 parallel,  0.22s,  0.24s elapsed
  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.52s  (  4.74s elapsed)
  GC    time    0.70s  (  0.77s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.23s  (  5.51s elapsed)
  %GC time      13.5%  (13.9% elapsed)
  Alloc rate    674,200,547 bytes per MUT second
  Productivity  86.5% of total user, 82.2% of total elapsed
The sparse binomial forest representation, same options:
   5,612,965,672 bytes allocated in the heap
     563,671,500 bytes copied during GC     
       1,967,576 bytes maximum residency (202 sample(s))
         107,212 bytes maximum slop                     
               5 MB total memory in use (1 MB lost due to fragmentation)
  Generation 0: 10602 collections,     0 parallel,  1.60s,  1.67s elapsed
  Generation 1:   202 collections,     0 parallel,  0.28s,  0.30s elapsed
  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.52s  (  6.51s elapsed)
  GC    time    1.87s  (  1.97s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.40s  (  8.48s elapsed)
  %GC time      22.3%  (23.2% elapsed)
  Alloc rate    860,302,018 bytes per MUT second
  Productivity  77.7% of total user, 76.9% of total elapsed
Furthermore, if we actually profile the sparse forest representation, we confirm that a huge amount of allocation is being done during tree melding.
Conclusion: the type-magical binomial heap implementation does a lot less allocating, and is faster even ignoring GC time.  I encourage you to test it out for yourself, but I think you'll find similar results.  =D
Louis