
First off: I've finally gotten set up with code.haskell.org. A darcs repo of my binomial heap implementation is at http://code.haskell.org/containers-pqueue/. Also on that site is the haddock documentationhttp://code.haskell.org/containers-pqueue/containers-0.3.0.0/html/, which I'm sure many people will appreciate. Somebody else (Ross?) had requested a separate version of the priority queue to handle priorities and values separately, so I'm working on that. I've also deleted the Foldable instance of MinQueue, though I still offer a clearly documented unordered toList, which will stay in place. Well, I only tested one thing (heap sort), and QuickBinom is actually faster under different options (-prof -auto-all or without calling
performGC before every heapsort).
-prof -auto-all blocks a significant number of optimizations from actually happening. Essentially, if the compiler has to figure out how much time is taken by some particular function, then it's not allowed to inline or eliminate uses of that function. I don't consider it a fair comparison. Moreover, calling performGC makes sense -- it essentially wipes the slate, making each successive comparison unbiased by the previous one. Louis, you note later in this email that your implementation is done.
That seems important to me. If we fix a sane interface now, the implementation can be changed later if we find something more efficient, right?
Absolutely true. However, I finally assembled a benchmark that I think is a fair comparison -- a heapsort, essentially "length . Data.List.unfoldr extract . foldr insert empty". The resultshttp://code.haskell.org/containers-pqueue/bench-chart.pdfare pretty supportive of my implementation. (Original timing data, outputted by Progression, is herehttp://code.haskell.org/containers-pqueue/bench-final.csv. I think all of the original code for the benchmark is in the code.haskell.org folder, just not part of the darcs repo. However, I had to slightly modify my copy of Progression to force the GC, so YMMV.) I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed. In general, I think priority queues should be treated the same as Data.Map or Data.Set, like java.util.* or the C++ STL treat their respective versions of those structures. I don't think there's likely to be any agreement any time soon to split up containers, so my inclination is to put pqueues into containers, and maybe if we agree later to split containers, then priority queues will be part of that decision. On a somewhat different note: writing unit tests in the existing framework is awkward and hard! Writing QuickCheck tests is trivial and much more exhaustive than what the existing tests look like. The existing containers tests appear to check one particular example that was the source of a preexisting bug, rather than examining exhaustively that methods work correctly, to eliminate potentially new bugs. I mean, Data.IntSet's one existing test is main = print $ isProperSubsetOf (fromList [2,3]) $ fromList [2,3,4] which might have found some preexisting bug, but certainly doesn't convince me of Data.IntSet's correctness. Is this normal? Is it permissible to use QuickCheck inside unit tests? (A collection of QuickCheck tests -- all of which my implementation passes -- is in the code.haskell.org directory.) Louis Wasserman