Re: Proposal: priority queues in containers

Oh, god, so much to respond to...heh. Submit this package for canonicalization as part of the Haskell Platform. I
would for one would support its inclusion.
This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in. As does Python. In Python, though, the PQ implementation is not a built-in
class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.
I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more ubiquitous than e.g. Data.Set or Data.Map. I'm also not entirely sure that "batteries included" doesn't describe containers, given all the other packages that come with GHC.
* There is no distinction between keys and priority values. A utility type
Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow
usage of distinct keys and priority values.
I disagree with this one. It requires an Ord instance that isn't really an
ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package).
I'd be okay with separating out a priority/value version. However, I'm still clueless as to what you're talking about with respect to Functor -- what's wrong with the following? data Prio p a = Prio p a instance Ord p => Ord (Prio p a) where ... instance Functor (Prio p) where fmap f (Prio p a) = Prio p (f a) I can understand if you're uncomfortable with (==) and (\ x y -> compare x y == EQ) being inequivalent, but neither the H98 Report nor the Prelude make any such claim, as far as I can tell. The Foldable instance breaks the abstraction. I think it should
process elements in order.
I think that wanting to iterate over the priority queue in the middle of the computation, without caring about order, is a perfectly legitimate desire for a programmer! Moreover, the only way to make a Foldable instance process elements in order would be something like data Ord a => PQueue a = .... which I think is an awfully dirty approach. I'm not at all a fan of adding restrictions like that, not least because it adds lots of awkward overhead. Would you be okay with not exporting a Foldable instance at all, but still exporting a toList method which doesn't guarantee any ordering on the return list? My advice would be not to stress over whether priority queues go into
containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves.
I'm...not sure how to respond to this claim. At least part of me wants to say, I genuinely do think the containers package is a piece of art...and then another part pipes up, "except for the inconsistencies between the various minView/maxView versions, and the little differences between IntMap and Map, and..." That said, I wouldn't be a fan of scrapping the style which the containers package has at least tried to create. I could be convinced that rewriting the rest of containers would be a good thing to do, though...and I might just want to do that myself. Hah. How does this implementation compare with using Map/Set as a
priority queue?
Continuing the discussion of the benchmarks: first, Jim, it appears that I'm the one who made a n00b benchmarking error. TT_____TT That said, as you found, my implementation is still slightly faster when the benchmark is corrected. Some comments: - QuickBinom needs to have O(1) findMin for a reasonable comparison. I added that in the benchmark below, and here. - I can't think of any more optimizations for the sparse binomial heap -- I genuinely think it's not going to get better. - There is a way to optimize my implementation still further, but it makes my code much less readable. (Specifically, I start the BinomForest at Succ Zero, and unpack the first child of every node still in the forest. Modifying the whole implementation that way, though, makes it unreadably ugly...and I think QuickBinom is possibly already at that point. I started implementing it, and realized just how ugly it was, and I stopped, but I can finish it if I have to.) Sorting 500,000 elements, compiled with -O2, run with +RTS -H128m -K64m, with another few implementations thrown in for good measure: Times (ms) min mean +/-sd median max Pairing: 1440.090 1482.893 31.501 1482.093 1532.095 Binomial: 1356.084 1389.687 26.881 1388.087 1436.090 SBinomial: 1376.086 1422.489 48.453 1400.088 1520.095 Data.Set: 1712.107 1800.912 74.880 1782.111 1976.123 Skew: 1584.099 1644.503 85.702 1602.101 1848.116 Some other benchmarkshttp://hackage.haskell.org/trac/ghc/attachment/ticket/3909/plot_2.pngwere done by Milan Straka earlier, when we hadn't decided what heap implementation to use at all. His comments:
I think the pairing heaps are out of the question now. I would choose between Binomial and Leftist. The Binomial have O(1) amortized inserts, but beware, that this does not work in persistent setting -- the insert is O(log N) when the heaps are used persistently (there are Skew Binomial Heaps with O(1) insert in the persistent setting too).
I vote for current (v5) Binomial implementation, even if the O(1)
amortized inserts works only non-persistently (ie. under heavy persistent usage, leftist heaps might be a _little_ better).
Conclusions: There aren't any differences in asymptotics, and as it stands, the existing implementation is just as fast. It's also a) done, and b) full of Haskellish type goodness. After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP. Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

On Thu, Mar 18, 2010 at 7:43 AM, Louis Wasserman
Oh, god, so much to respond to...heh.
You did request feedback back there, didn't you :P
As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.
I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more
It's not really the same. pqueue is not in the built-in namespace in python, that's like the Prelude in haskell. pqueue *is* in the default library, which every python installation will have since it comes with the default download, this is what's meant by "batteries included". So that's like containers: you have to explicitly import it, but you shouldn't worry about installations that don't have it because it comes with the compiler. The main difference is that python doesn't have cabal and doesn't have anything like the haskell platform and installing new packages, while easy, is not as automatic as cabal can be.
After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP.
I wanted to use criterion too at one point, but it looked too hard to install so I was scared away...

Hi,
After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP.
I wanted to use criterion too at one point, but it looked too hard to install so I was scared away...
Why is that? Because of the Chart depencency? You can install progression -- a wrapper over criterion, which uses gnuplot to draw graphs, and disable the Chart dependency. I did just cabal --user install -f "-Chart" criterion progression and it installed without a problem, on ghc-6.12.1, ghc-6.13.something, both Linux and Windows. Cheers, Milan

On Thu, Mar 18, 2010 at 4:43 PM, Louis Wasserman
Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion.
This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in.
I don't like libraries getting bigger, I like them getting smaller. When they're smaller they're easier to understand and easier to upgrade. So I would also advice proposing your package for the HP (Haskell Platform). I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP. Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]). Finally we could deprecate containers and after some time remove it. (I'm also for splitting base even more... but one thing at a time) regards, Bas [1] http://hackage.haskell.org/platform/2009.2.0.2/haskell-platform.cabal

Hi,
I don't like libraries getting bigger, I like them getting smaller.
When they're smaller they're easier to understand and easier to upgrade.
So I would also advice proposing your package for the HP (Haskell Platform).
I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP.
Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]).
Finally we could deprecate containers and after some time remove it.
(I'm also for splitting base even more... but one thing at a time)
personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage. I am for adding the priority queues to the containers. Cheers, Milan Straka

On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka
personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage.
But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately. regards, Bas

Hi,
On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka
wrote: personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage.
But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately.
If the metapackage stays, then there is probably little difference. I was under impression you wanted to remove the metapackage and leave the individual structures as packages (that was my understanding of your mail). I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say "I want 'containers'." But independently on the form of the containers package, I would like it to contain a priority queue. Including it to the containers package seems as a good first step. Cheers, Milan

On Mar 18, 2010, at 18:33 , Milan Straka wrote:
I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say "I want 'containers'."
This is what the Haskell Platform is for. No real need to add more stuff to containers; if anything, the libraries stored there should be decoupled so that e.g. the recent Data.Map proposals wouldn't disturb programs not using Data.Map. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
participants (5)
-
Bas van Dijk
-
Brandon S. Allbery KF8NH
-
Evan Laforge
-
Louis Wasserman
-
Milan Straka