Re: Proposal: priority queues in containers

18 Mar 2010

      First off: I've finally gotten set up with code.haskell.org.  A darcs repo
of my binomial heap implementation is at
http://code.haskell.org/containers-pqueue/.  Also on that site is the haddock
documentationhttp://code.haskell.org/containers-pqueue/containers-0.3.0.0/html/,
which I'm sure many people will appreciate.  Somebody else (Ross?) had
requested a separate version of the priority queue to handle priorities and
values separately, so I'm working on that.
I've also deleted the Foldable instance of MinQueue, though I still offer a
clearly documented unordered toList, which will stay in place.

Well, I only tested one thing (heap sort), and QuickBinom is actually

faster under different options (-prof -auto-all or without calling
...
performGC before every heapsort).
-prof -auto-all blocks a significant number of optimizations from actually
happening.  Essentially, if the compiler has to figure out how much time is
taken by some particular function, then it's not allowed to inline or
eliminate uses of that function.  I don't consider it a fair comparison.
 Moreover, calling performGC makes sense -- it essentially wipes the slate,
making each successive comparison unbiased by the previous one.

Louis, you note later in this email that your implementation is done.
...
That seems important to me. If we fix a sane interface now, the
implementation can be changed later if we find something more
efficient, right?
Absolutely true.

However, I finally assembled a benchmark that I think is a fair comparison
-- a heapsort, essentially "length . Data.List.unfoldr extract . foldr
insert empty".   The
resultshttp://code.haskell.org/containers-pqueue/bench-chart.pdfare
pretty supportive of my implementation.  (Original timing data,
outputted by Progression, is
herehttp://code.haskell.org/containers-pqueue/bench-final.csv.
 I think all of the original code for the benchmark is in the
code.haskell.org folder, just not part of the darcs repo.  However, I had to
slightly modify my copy of Progression to force the GC, so YMMV.)

I'm still pretty strongly in favor of putting priority queues into
containers: other programming languages consider it necessary for inclusion
into standardized libraries, people will be more likely to use appropriate
data structures for their needs when reliable, friendly implementations are
already at their fingertips, and other reasons already discussed.

In general, I think priority queues should be treated the same as Data.Map
or Data.Set, like java.util.* or the C++ STL treat their respective versions
of those structures.  I don't think there's likely to be any agreement any
time soon to split up containers, so my inclination is to put pqueues into
containers, and maybe if we agree later to split containers, then priority
queues will be part of that decision.

On a somewhat different note: writing unit tests in the existing framework
is awkward and hard!  Writing QuickCheck tests is trivial and much more
exhaustive than what the existing tests look like.  The existing containers
tests appear to check one particular example that was the source of a
preexisting bug, rather than examining exhaustively that methods work
correctly, to eliminate potentially new bugs.  I mean, Data.IntSet's one
existing test is

main = print $ isProperSubsetOf (fromList [2,3]) $ fromList [2,3,4]

which might have found some preexisting bug, but certainly doesn't convince
me of Data.IntSet's correctness.

Is this normal?  Is it permissible to use QuickCheck inside unit tests?  (A
collection of QuickCheck tests -- all of which my implementation passes --
is in the code.haskell.org directory.)

Louis Wasserman