
I encourage you to test it out for yourself, but I think you'll find similar results. =D
I don't spend a lot of time thinking about non-asymptotic optimizations, so I may have made an error or ten, but I found vastly different results for heap sort. I tested heap sorting mostly following the description of your example: 1. Generate 25,000 random Ints 2. System.Mem.performGC 3. Start timer 4. Insert Ints into PQ 5. Remove one at a time until there are none left 6. Print the last one removed 7. Stop timer I compiled with GHC 6.10.1 and -O2. I tested four implementations of binomial PQs: A: Brodal-Okasaki binomial PQs, modified for O(1) findMin (the first implementation in the paper I linked, transcribed by me by hand). This file contains no strictness or unpacking annotations or optimizations. B: Wasserman PQs, using your unpacking but writing the functions myself (just so I could make a small file that I could be sure I mostly understood). This is clearly different from D, your proposal, but I think it uses the same algorithms (not functions) and the same data structures (down to unpacking and strictness). C: Simple binomial queues without the O(1) findMin modification, as written by you in the last attachment on 3909. This implementation uses some unpacking and strictness annotations, and uses a different deleteMin algorithm than Brodal & Okasaki use. You called this implementation "sparse" to distinguish it from your nested types proposal. D: Wasserman PQs, using your latest proposed patch from ticket 3909. Binomial priority queue timings (in milliseconds, averaged over 20 runs) A: Simple Brodal-Okasaki list implementation: 199.3697 B: Wasserman nested type implementation, simple functions: 235.0143 C: Wasserman's 'Quick' list implementation: 179.7227 D: Wasserman's proposed nested type implementation: 163.47515 D is still the fastest. However, A is a good bit faster than B. Is it possible that C would show a similar advantage over D if it were optimized as carefully as D? Also, why do my benchmarks show only slight differences between the implementations when yours show massive ones? It is possible I am making a noob benchmarking error. Attached is a zip file that should contain everything necessary to replicate my tests.