Proposal: priority queues in containers

newer
change to base to remove obsolete...

Louis Wasserman

16 Mar 2010 16 Mar '10

1:54 p.m.

PROPOSAL: Add a priority queue implementation to the containers package. Specific modules will include Data.PQueue.Min, Data.PQueue.Max, and Data.PQueue. DEADLINE: Next Friday, Mar 26. The ticket, #3909, is here http://hackage.haskell.org/trac/ghc/ticket/3909. It had done some bouncing around glasgow-haskell-users (mostly because I forgot how to do a proper libraries proposal submission), and there have been several implementations benchmarked, compared, and argued about. DETAILS: The proposed implementation benchmarked competitively with every alternative implementation that we tested, and offers good asymptotics in nearly every operation. Specifically, we use a binomial heap, which offers - O(log n) worst-case union - O(log n) worst-case extract (this in particular was a key objective of ours) - amortized O(1), worst-case O(log n) insertion. (In a persistent context, the amortized performance bound does not necessarily hold.) This implementation seems to offer the best balance between practical performance and asymptotic behavior. Moreover, this implementation is extremely memory-efficient, resulting in better performance on large data sets. (See the ticket for benchmarks. The most accurate benchmarks are towards the end.) A couple potentially contentious design decisions: - There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow usage of distinct keys and priority values. - Min-queues and max queues are separated the following way: - Data.PQueue.Min exports the type MinQueue. - Data.PQueue.Max exports the type MaxQueue. (This is implemented as a wrapper around MinQueue.) The method names are the same as Data.PQueue.Min, but I think it's acceptable to force qualified imports when using both a max-queue and a min-queue. - Data.PQueue adds the alias type PQueue = MinQueue, so that the "default" behavior is a min-queue. These design decisions seem to be sufficient to handle most traditional uses for priority queues. In particular, this approach offers a superset of the functionality offered by Java's built-in priority queue implementationhttp://java.sun.com/javase/6/docs/api/java/util/PriorityQueue.html, which makes the same design decisions, but of course, is all imperative and yucky! (Also, it offers inferior asymptotics, heh.) I'm really satisfied with the patch as-is, modulo maybe tinkering with the code style a little. I'm also working on an article for TMR on priority queues in Haskell, some of the different structures we considered, and particularly the new more-type-safe implementation I came up with for binomial heaps in the writing of this implementation. Anyway, discuss, complain, et cetera. Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

Attachments:

attachment.html (text/html — 3.4 KB)

Show replies by date

Malcolm Wallace

16 Mar 16 Mar

2:06 p.m.

On 16 Mar 2010, at 13:54, Louis Wasserman wrote:

...

PROPOSAL: Add a priority queue implementation to the containers package. Specific modules will include Data.PQueue.Min, Data.PQueue.Max, and Data.PQueue.

I have not yet needed a priority queue for any application, so I have no specific technical opinion in this particular proposal. I would suggest that if you continue to receive relative silence on the topic, then it may not be a good candidate for the standard "containers" library, simply because a demand for it has not yet been demonstrated. (You can easily release it as a separate package anyway.) This, I hope, will be sufficient of a cue for anyone who _does_ care about PQs to speak up now! Regards, Malcolm

Johan Tibell

2:53 p.m.

On Tue, Mar 16, 2010 at 2:54 PM, Louis Wasserman wrote:

...

PROPOSAL: Add a priority queue implementation to the containers package. Specific modules will include Data.PQueue.Min, Data.PQueue.Max, and Data.PQueue. DEADLINE: Next Friday, Mar 26. The ticket, #3909, is here. It had done some bouncing around glasgow-haskell-users (mostly because I forgot how to do a proper libraries proposal submission), and there have been several implementations benchmarked, compared, and argued about.

Could please post the code and the generated Haddock documentation somewhere for easier review? Thanks! -- Johan

Ian Lynagh

4:20 p.m.

On Tue, Mar 16, 2010 at 08:54:15AM -0500, Louis Wasserman wrote:

...

PROPOSAL: Add a priority queue implementation to the containers package. Specific modules will include Data.PQueue.Min, Data.PQueue.Max, and Data.PQueue.

I don't have an opinion on whether or not a priority queue should be added to containers, although I do wonder how this implementation compares to packages on hackage, e.g.: http://hackage.haskell.org/package/queuelike http://hackage.haskell.org/package/priority-queue http://hackage.haskell.org/package/pure-priority-queue http://hackage.haskell.org/package/queue http://hackage.haskell.org/package/fingertree-psqueue http://hackage.haskell.org/package/PSQueue Personally, I would also like to see us moving towards having a testsuite for each library, so I would like to see new additions coming with tests so that we aren't moving further from that goal. Thanks Ian

Louis Wasserman

4:54 p.m.

...

http://hackage.haskell.org/package/queuelike

I wrote this one. Heh. This is an improvement on it. More importantly, what I wanted to achieve by adding priority queues to containers was having a canonical implementation for a ubiquitous data structure, which has a simple and user-friendly interface (something the queuelike package, above, attempts but isn't extraordinarily successful at), which is guaranteed to perform efficiently and correctly. As a comparison example, there are many, many finite map implementation packages. (I wrote TrieMap, which is a pretty involved automatic generalized trie map type inference system...yeah.) There's a reason, however, that Data.Map exists, and that so many people use it when an algorithmically superior implementation might exist on Hackage -- because it's programmer-friendly and accessible, and because its performance is well-understood and has been thoroughly verified. When I use Data.Map, I'm not trusting some random schlub to have written bug-free code -- I'm trusting the Haskell built-in libraries, and the people who write and maintain those libraries. I think priority queues are ubiquitous enough to deserve the same treatment. On another note, Ian, it's really not very clear what the proper procedure for adding test suites is. Is it to just add a tester program separately from the patch? Is it to put the test suites somewhere within the containers folder, and have them be an integral part of the patch? There seem to be some sort-of-test-ish files in the containers package already, but I have no idea how to add new tests, and if the existing interface is extensible, it's not at all clear how to extend it.

Louis Wasserman

5:11 p.m.

Also, looking through the packages you linked to: "queue" and "priority-queue" seem to have extremely icky interfaces, nothing as readable as a containers module. "pure-priority-queue" is a wrapper around Data.Map, which is definitely slower than a specialized priority queue. Similarly, "fingertree-psqueue" and "PSQueue", while they expose nicely readable interfaces, are *priority search queues,* which provide considerably stronger functionality than a priority queue. (Specifically, they provide things like lookup-priority and increase-priority.) When being used as a vanilla priority queue, PSQs are considerably slower than specialized priority queues -- this was one of the implementations we tested. The priority search queue is a useful structure for a number of applications, but much more commonly, a simple priority queue is all that is required. In particular, both the C++ STL and java.util.* provide just a vanilla priority queue, reflecting those design choices in those languages. Since priority search queues are actually much more natural in imperative languages than in functional languages -- since we can maintain pointers to the node associated with each key -- I think it makes even less sense for Haskell to use PSQueues in containers. I think it's fair to say that most algorithms which need it only really require priority queues, and that it makes sense to provide for that majority in containers, though we might consider recommending the PSQueue package to programmers who need the full power of priority search queues. My own queuelike library...is decent, but not spectacular, and reflected some early experimentation. The implementation attached to the ticket, however, is more reliable than the implementations I wrote in queuelike. (In particular, Data.PQueue has worst-case O(n) delete-min, and in a persistent context, its amortized performance is still O(n). Ouch.) Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

Jim Apple

5:29 p.m.

...

Since priority search queues are actually much more natural in imperative languages than in functional languages -- since we can maintain pointers to the node associated with each key -- I think it makes even less sense for Haskell to use PSQueues in containers. I think it's fair to say that most algorithms which need it only really require priority queues, and that it makes sense to provide for that majority in containers, though we might consider recommending the PSQueue package to programmers who need the full power of priority search queues.

I'm not sure about some of that. Imperative queues sometimes offer O(1) decreaseKey and O(lg n) delete by storing pointers to elements in a priority queue. The usual purely functional PQs can only offer O(n) decreaseKey and O(n) delete. Purely functional PSQs can offer O(lg n) decreaseKey and O(lg n) delete. Minimum spanning tree is a common application for PQs that makes good use of decreaseKey.

Yitzchak Gale

17 Mar 17 Mar

11:52 a.m.

Louis Wasserman wrote:

...

...what I wanted to achieve by adding priority queues to containers was having a canonical implementation for a ubiquitous data structure, which has a simple and user-friendly interface... which is guaranteed to perform efficiently and correctly.

This is a good goal.

...

...There's a reason, however, that Data.Map exists... I think priority queues are ubiquitous enough to deserve the same treatment.

This I'm not sure about. I've considered using PQ's perhaps two or three times during the past few years, and ended up not using them at all. True, perhaps I would have used them if there had been a simple and reliable PQ implementation at my fingertips. But it's definitely not a ubiquitous meme like Map or Set. How about recording this great work as yet one more PQ package on hackage. But make it far more usable than all of the others by two simple steps: 1. In the package description, say simply and clearly what purpose this package is designed to serve, as you have described in this thread. 2. Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion.

...

both the C++ STL and java.util.* provide just a vanilla priority queue, reflecting those design choices in those languages

As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too. Please do consider this suggestion. However, if we do not quickly reach a consensus on this, I will withdraw my suggestion and advocate inclusion in containers as originally proposed. The difference between those options is far less important than the risk of losing this great work to haggling. Thanks, Yitz

Bryan O'Sullivan

3:48 p.m.

On Wed, Mar 17, 2010 at 4:52 AM, Yitzchak Gale wrote:

...

This I'm not sure about. I've considered using PQ's perhaps two or three times during the past few years, and ended up not using them at all.

I'd prefer for personal anecdote not to be the driver of discussions like this. Personally, I've never used Data.Tree or Data.Graph from the containers package, and doubt that I ever will, but it doesn't break my arm or steal my wristwatch if they're available. The types in the containers package have subtly different APIs and degrees of control over things like strictness, and also have almost no test coverage along with few signs of usage-driven optimisation. The bar for adding more code to that package is thus pretty low, in my opinion. If I had time, I'd replace it entirely. My advice would be not to stress over whether priority queues go into containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves. If you want a good source of stress, create a replacement package that makes me happy :-)

...

True, perhaps I would have used them if there had been a simple and reliable PQ implementation at my fingertips. But it's definitely not a ubiquitous meme like Map or Set.

How about recording this great work as yet one more PQ package on hackage. But make it far more usable than all of the others by two simple steps:

1. In the package description, say simply and clearly what purpose this package is designed to serve, as you have described in this thread.

2. Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion.

...
both the C++ STL and java.util.* provide just a vanilla priority queue, reflecting those design choices in those languages

As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.

Please do consider this suggestion. However, if we do not quickly reach a consensus on this, I will withdraw my suggestion and advocate inclusion in containers as originally proposed. The difference between those options is far less important than the risk of losing this great work to haggling.

Thanks, Yitz _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Gwern Branwen

4:03 p.m.

On Wed, Mar 17, 2010 at 11:48 AM, Bryan O'Sullivan wrote:

...

On Wed, Mar 17, 2010 at 4:52 AM, Yitzchak Gale wrote:

...
This I'm not sure about. I've considered using PQ's perhaps two or three times during the past few years, and ended up not using them at all.

I'd prefer for personal anecdote not to be the driver of discussions like this. Personally, I've never used Data.Tree or Data.Graph from the containers package, and doubt that I ever will, but it doesn't break my arm or steal my wristwatch if they're available.

Skipping anecdote, then, let's look at dependencies. Does anyone use the existing priority-queues? The ensemble doesn't look so bad that one would always roll one's own priority queue, yet I don't see *any* non-queue libraries/apps depending on any of the queue libraries: http://bifunctor.homelinux.net/~roel/hackage/packages/archive/revdeps-list.h... (I'm not sure how often that's updated but I wouldn't expect being up to date to make much of a difference.) If there really are no users of priority-queue, that's a pretty good argument for keeping priority-queue out of containers, leaving it on Hackage, and coming back in a year or two to see whether there's been any uptake. -- gwern

Brandon S. Allbery KF8NH

11:52 p.m.

On Mar 17, 2010, at 11:48 , Bryan O'Sullivan wrote:

...

My advice would be not to stress over whether priority queues go into containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves. If you want a good source of stress, create a replacement package that makes me happy :-)

Especially in the absence of guidelines for "that makes you happy", I think the answer to that one is "build it and see if they come". (There does seem to be a lot of interest in a "cleaner" containers library, but not a whole lot of consensus.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Jed Brown

18 Mar 18 Mar

10:26 a.m.

On Wed, 17 Mar 2010 13:52:36 +0200, Yitzchak Gale wrote:

...

As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.

FWIW, Haskell's containers are far less central than Python's dict, upon which the type system itself is built. Jed

Louis Wasserman

3:43 p.m.

Oh, god, so much to respond to...heh. Submit this package for canonicalization as part of the Haskell Platform. I

...

would for one would support its inclusion.

This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in. As does Python. In Python, though, the PQ implementation is not a built-in

...

class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.

I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more ubiquitous than e.g. Data.Set or Data.Map. I'm also not entirely sure that "batteries included" doesn't describe containers, given all the other packages that come with GHC.

...

* There is no distinction between keys and priority values. A utility type

...

Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow

...

usage of distinct keys and priority values.

I disagree with this one. It requires an Ord instance that isn't really an

...

ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package).

I'd be okay with separating out a priority/value version. However, I'm still clueless as to what you're talking about with respect to Functor -- what's wrong with the following? data Prio p a = Prio p a instance Ord p => Ord (Prio p a) where ... instance Functor (Prio p) where fmap f (Prio p a) = Prio p (f a) I can understand if you're uncomfortable with (==) and (\ x y -> compare x y == EQ) being inequivalent, but neither the H98 Report nor the Prelude make any such claim, as far as I can tell. The Foldable instance breaks the abstraction. I think it should

...

process elements in order.

I think that wanting to iterate over the priority queue in the middle of the computation, without caring about order, is a perfectly legitimate desire for a programmer! Moreover, the only way to make a Foldable instance process elements in order would be something like data Ord a => PQueue a = .... which I think is an awfully dirty approach. I'm not at all a fan of adding restrictions like that, not least because it adds lots of awkward overhead. Would you be okay with not exporting a Foldable instance at all, but still exporting a toList method which doesn't guarantee any ordering on the return list? My advice would be not to stress over whether priority queues go into

...

containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves.

I'm...not sure how to respond to this claim. At least part of me wants to say, I genuinely do think the containers package is a piece of art...and then another part pipes up, "except for the inconsistencies between the various minView/maxView versions, and the little differences between IntMap and Map, and..." That said, I wouldn't be a fan of scrapping the style which the containers package has at least tried to create. I could be convinced that rewriting the rest of containers would be a good thing to do, though...and I might just want to do that myself. Hah. How does this implementation compare with using Map/Set as a

...

priority queue?

Continuing the discussion of the benchmarks: first, Jim, it appears that I'm the one who made a n00b benchmarking error. TT_____TT That said, as you found, my implementation is still slightly faster when the benchmark is corrected. Some comments: - QuickBinom needs to have O(1) findMin for a reasonable comparison. I added that in the benchmark below, and here. - I can't think of any more optimizations for the sparse binomial heap -- I genuinely think it's not going to get better. - There is a way to optimize my implementation still further, but it makes my code much less readable. (Specifically, I start the BinomForest at Succ Zero, and unpack the first child of every node still in the forest. Modifying the whole implementation that way, though, makes it unreadably ugly...and I think QuickBinom is possibly already at that point. I started implementing it, and realized just how ugly it was, and I stopped, but I can finish it if I have to.) Sorting 500,000 elements, compiled with -O2, run with +RTS -H128m -K64m, with another few implementations thrown in for good measure: Times (ms) min mean +/-sd median max Pairing: 1440.090 1482.893 31.501 1482.093 1532.095 Binomial: 1356.084 1389.687 26.881 1388.087 1436.090 SBinomial: 1376.086 1422.489 48.453 1400.088 1520.095 Data.Set: 1712.107 1800.912 74.880 1782.111 1976.123 Skew: 1584.099 1644.503 85.702 1602.101 1848.116 Some other benchmarkshttp://hackage.haskell.org/trac/ghc/attachment/ticket/3909/plot_2.pngwere done by Milan Straka earlier, when we hadn't decided what heap implementation to use at all. His comments:

...

I think the pairing heaps are out of the question now. I would choose between Binomial and Leftist. The Binomial have O(1) amortized inserts, but beware, that this does not work in persistent setting -- the insert is O(log N) when the heaps are used persistently (there are Skew Binomial Heaps with O(1) insert in the persistent setting too).

I vote for current (v5) Binomial implementation, even if the O(1)

...

amortized inserts works only non-persistently (ie. under heavy persistent usage, leftist heaps might be a _little_ better).

Conclusions: There aren't any differences in asymptotics, and as it stands, the existing implementation is just as fast. It's also a) done, and b) full of Haskellish type goodness. After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP. Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

Evan Laforge

3:56 p.m.

On Thu, Mar 18, 2010 at 7:43 AM, Louis Wasserman wrote:

...

Oh, god, so much to respond to...heh.

You did request feedback back there, didn't you :P

...

...
As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.

I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more

It's not really the same. pqueue is not in the built-in namespace in python, that's like the Prelude in haskell. pqueue *is* in the default library, which every python installation will have since it comes with the default download, this is what's meant by "batteries included". So that's like containers: you have to explicitly import it, but you shouldn't worry about installations that don't have it because it comes with the compiler. The main difference is that python doesn't have cabal and doesn't have anything like the haskell platform and installing new packages, while easy, is not as automatic as cabal can be.

...

After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP.

I wanted to use criterion too at one point, but it looked too hard to install so I was scared away...

Milan Straka

4:01 p.m.

Hi,

...

...
After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP.

I wanted to use criterion too at one point, but it looked too hard to install so I was scared away...

Why is that? Because of the Chart depencency? You can install progression -- a wrapper over criterion, which uses gnuplot to draw graphs, and disable the Chart dependency. I did just cabal --user install -f "-Chart" criterion progression and it installed without a problem, on ghc-6.12.1, ghc-6.13.something, both Linux and Windows. Cheers, Milan

Jim Apple

6:31 p.m.

...

That said, as you found, my implementation is still slightly faster when the benchmark is corrected.

Well, I only tested one thing (heap sort), and QuickBinom is actually faster under different options (-prof -auto-all or without calling performGC before every heapsort). In either case, though, the speedup is only by a small margin. I think picking a PQ implementation should be based on a few more benchmarks under a few more options, but I'm not sure that the decision must be made now. Louis, you note later in this email that your implementation is done. That seems important to me. If we fix a sane interface now, the implementation can be changed later if we find something more efficient, right?

Louis Wasserman

10:02 p.m.

First off: I've finally gotten set up with code.haskell.org. A darcs repo of my binomial heap implementation is at http://code.haskell.org/containers-pqueue/. Also on that site is the haddock documentationhttp://code.haskell.org/containers-pqueue/containers-0.3.0.0/html/, which I'm sure many people will appreciate. Somebody else (Ross?) had requested a separate version of the priority queue to handle priorities and values separately, so I'm working on that. I've also deleted the Foldable instance of MinQueue, though I still offer a clearly documented unordered toList, which will stay in place. Well, I only tested one thing (heap sort), and QuickBinom is actually faster under different options (-prof -auto-all or without calling

...

performGC before every heapsort).

-prof -auto-all blocks a significant number of optimizations from actually happening. Essentially, if the compiler has to figure out how much time is taken by some particular function, then it's not allowed to inline or eliminate uses of that function. I don't consider it a fair comparison. Moreover, calling performGC makes sense -- it essentially wipes the slate, making each successive comparison unbiased by the previous one. Louis, you note later in this email that your implementation is done.

...

That seems important to me. If we fix a sane interface now, the implementation can be changed later if we find something more efficient, right?

Absolutely true. However, I finally assembled a benchmark that I think is a fair comparison -- a heapsort, essentially "length . Data.List.unfoldr extract . foldr insert empty". The resultshttp://code.haskell.org/containers-pqueue/bench-chart.pdfare pretty supportive of my implementation. (Original timing data, outputted by Progression, is herehttp://code.haskell.org/containers-pqueue/bench-final.csv. I think all of the original code for the benchmark is in the code.haskell.org folder, just not part of the darcs repo. However, I had to slightly modify my copy of Progression to force the GC, so YMMV.) I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed. In general, I think priority queues should be treated the same as Data.Map or Data.Set, like java.util.* or the C++ STL treat their respective versions of those structures. I don't think there's likely to be any agreement any time soon to split up containers, so my inclination is to put pqueues into containers, and maybe if we agree later to split containers, then priority queues will be part of that decision. On a somewhat different note: writing unit tests in the existing framework is awkward and hard! Writing QuickCheck tests is trivial and much more exhaustive than what the existing tests look like. The existing containers tests appear to check one particular example that was the source of a preexisting bug, rather than examining exhaustively that methods work correctly, to eliminate potentially new bugs. I mean, Data.IntSet's one existing test is main = print $ isProperSubsetOf (fromList [2,3]) $ fromList [2,3,4] which might have found some preexisting bug, but certainly doesn't convince me of Data.IntSet's correctness. Is this normal? Is it permissible to use QuickCheck inside unit tests? (A collection of QuickCheck tests -- all of which my implementation passes -- is in the code.haskell.org directory.) Louis Wasserman

Thomas Schilling

11:50 p.m.

On 18 March 2010 22:02, Louis Wasserman wrote:

...

I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed.

The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to) $ cabal install haskell-platform Nevertheless, the libraries bundled with GHC are those libraries that GHC itself needs and which therefore cannot be upgraded independently. The real standard libraries are the Haskell Platform and if your package is part of the platform, then your package *is* in status equivalent to things like java.util.*. This weekend's Hackathon in Zürich will partly be dedicated to getting the next release of the Platform release ready. If you can get your package into the following platform release (due 6 months after the current release), then this would surely make it the default package for anyone in need of a PQ. / Thomas -- Push the envelope. Watch it bend.

Louis Wasserman

19 Mar 19 Mar

12:50 a.m.

Okay, let me ask the following question: Would anybody besides me be heartbroken if priority queues *weren't* put into containers, but were instead put into the Platform? Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis On Thu, Mar 18, 2010 at 6:50 PM, Thomas Schilling wrote:

...

On 18 March 2010 22:02, Louis Wasserman wrote:

...
I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed.

The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to)

$ cabal install haskell-platform

Nevertheless, the libraries bundled with GHC are those libraries that GHC itself needs and which therefore cannot be upgraded independently. The real standard libraries are the Haskell Platform and if your package is part of the platform, then your package *is* in status equivalent to things like java.util.*.

This weekend's Hackathon in Zürich will partly be dedicated to getting the next release of the Platform release ready. If you can get your package into the following platform release (due 6 months after the current release), then this would surely make it the default package for anyone in need of a PQ.

/ Thomas -- Push the envelope. Watch it bend.

wren ng thornton

2:21 a.m.

Louis Wasserman wrote:

...

Thomas Schilling wrote:

...
Nevertheless, the libraries bundled with GHC are those libraries that GHC itself needs and which therefore cannot be upgraded independently. The real standard libraries are the Haskell Platform and if your package is part of the platform, then your package *is* in status equivalent to things like java.util.*.

Okay, let me ask the following question:

Would anybody besides me be heartbroken if priority queues *weren't* put into containers, but were instead put into the Platform?

So long as there's a pqueue implementation in the Platform, I wouldn't be heartbroken. I can see how it'd be nice to put it in containers (e.g., people just need to depend on containers in their .cabal files) but now that the Platform is here I don't see any big reasons to want to make the packages in it any bigger when a separate package would do the same job just as well. By putting the pqueue implementation into containers that means everyone using containers will have a larger library to link against (unless GHC is smarter than I suspect), regardless of whether they use pqueues. Unless I'm wrong about how smart GHC is, that implies that the containers package should[1] even be broken up to facilitate smaller binaries. [1] Ignoring backwards compatibility issues. -- Live well, ~wren

Bertram Felgenhauer

26 Mar 26 Mar

3:15 p.m.

Louis Wasserman wrote:

...

Would anybody besides me be heartbroken if priority queues *weren't* put into containers, but were instead put into the Platform?

I don't feel strongly either way. One definite advantage of putting it into containers would be that the module would be easy to find. One disadvantage is that improvements will take long to trickle through the next ghc or haskell platform release - updating the containers package in an existing installation would be tricky, I think, because a lot of packages depend on containers. As far as I can tell you did a great job at tuning the implementation. It beats my own home-ground binomial heap (which uses essentially the same data structure, except that it has a special case for leafs) hands down when it comes to running time. So if the consensus is to make this a separate package, please announce it prominently. I'm certainly interested in using it. Best regards, Bertram

Brandon S. Allbery KF8NH

19 Mar 19 Mar

5:39 a.m.

On Mar 18, 2010, at 19:50 , Thomas Schilling wrote:

...

The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to)

$ cabal install haskell-platform

Other way around: installing the Haskell Platorm gives you ghc, a set of libraries known to work with that version of ghc and with each other, and cabal-install. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Bas van Dijk

18 Mar 18 Mar

9:21 p.m.

On Thu, Mar 18, 2010 at 4:43 PM, Louis Wasserman wrote:

...

...
Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion.

This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in.

I don't like libraries getting bigger, I like them getting smaller. When they're smaller they're easier to understand and easier to upgrade. So I would also advice proposing your package for the HP (Haskell Platform). I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP. Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]). Finally we could deprecate containers and after some time remove it. (I'm also for splitting base even more... but one thing at a time) regards, Bas [1] http://hackage.haskell.org/platform/2009.2.0.2/haskell-platform.cabal

Milan Straka

9:39 p.m.

Hi,

...

I don't like libraries getting bigger, I like them getting smaller.

When they're smaller they're easier to understand and easier to upgrade.

So I would also advice proposing your package for the HP (Haskell Platform).

I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP.

Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]).

Finally we could deprecate containers and after some time remove it.

(I'm also for splitting base even more... but one thing at a time)

personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage. I am for adding the priority queues to the containers. Cheers, Milan Straka

Bas van Dijk

10:22 p.m.

On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka wrote:

...

personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage.

But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately. regards, Bas

Milan Straka

10:33 p.m.

Hi,

...

On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka wrote:

...
personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage.

But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately.

If the metapackage stays, then there is probably little difference. I was under impression you wanted to remove the metapackage and leave the individual structures as packages (that was my understanding of your mail). I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say "I want 'containers'." But independently on the form of the containers package, I would like it to contain a priority queue. Including it to the containers package seems as a good first step. Cheers, Milan

Brandon S. Allbery KF8NH

19 Mar 19 Mar

5:38 a.m.

On Mar 18, 2010, at 18:33 , Milan Straka wrote:

...

I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say "I want 'containers'."

This is what the Haskell Platform is for. No real need to add more stuff to containers; if anything, the libraries stored there should be decoupled so that e.g. the recent Data.Map proposals wouldn't disturb programs not using Data.Map. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

wren ng thornton

2:27 a.m.

Bas van Dijk wrote:

...

I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP.

Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]).

In general I'm all for this, but there are some big details to work out. The most pressing is the issue of backwards/forwards compatibility. I think that before we head down this path we should get a mechanism in place for making "metapackages" a reality ---that is, as something distinct from the compatibility wrappers we currently use to patch the gaps. But that's a topic for another thread, ne? -- Live well, ~wren

Ian Lynagh

17 Mar 17 Mar

3:56 p.m.

On Tue, Mar 16, 2010 at 11:54:35AM -0500, Louis Wasserman wrote:

...

On another note, Ian, it's really not very clear what the proper procedure for adding test suites is. Is it to just add a tester program separately from the patch? Is it to put the test suites somewhere within the containers folder, and have them be an integral part of the patch? There seem to be some sort-of-test-ish files in the containers package already, but I have no idea how to add new tests, and if the existing interface is extensible, it's not at all clear how to extend it.

The best way is probably to make tests using the GHC testsuite framework in tests/priority-queue/. The framework would benefit from some tweaking so that it is better able to test a single library in isolation, though. Thanks Ian

Simon Marlow

18 Mar 18 Mar

10:50 a.m.

On 17/03/2010 15:56, Ian Lynagh wrote:

...

On Tue, Mar 16, 2010 at 11:54:35AM -0500, Louis Wasserman wrote:

...
On another note, Ian, it's really not very clear what the proper procedure for adding test suites is. Is it to just add a tester program separately from the patch? Is it to put the test suites somewhere within the containers folder, and have them be an integral part of the patch? There seem to be some sort-of-test-ish files in the containers package already, but I have no idea how to add new tests, and if the existing interface is extensible, it's not at all clear how to extend it.

The best way is probably to make tests using the GHC testsuite framework in tests/priority-queue/. The framework would benefit from some tweaking so that it is better able to test a single library in isolation, though.

Shouldn't the tests go in the library repo itself? Also, if we want people to use the GHC testsuite framework for individual library testing, we ought to separate the framework from the huge test suite itself. Then I bet we'd get complaints that it isn't written in the right language :-) Cheers, Simon

Ian Lynagh

11:41 a.m.

On Thu, Mar 18, 2010 at 10:50:05AM +0000, Simon Marlow wrote:

...

On 17/03/2010 15:56, Ian Lynagh wrote:

...
On Tue, Mar 16, 2010 at 11:54:35AM -0500, Louis Wasserman wrote:

...
On another note, Ian, it's really not very clear what the proper procedure for adding test suites is. Is it to just add a tester program separately from the patch? Is it to put the test suites somewhere within the containers folder, and have them be an integral part of the patch? There seem to be some sort-of-test-ish files in the containers package already, but I have no idea how to add new tests, and if the existing interface is extensible, it's not at all clear how to extend it.

The best way is probably to make tests using the GHC testsuite framework in tests/priority-queue/. The framework would benefit from some tweaking so that it is better able to test a single library in isolation, though.

Shouldn't the tests go in the library repo itself?

Right, I mean tests/priority-queue/ in the containers repo (there are already a couple of tests in tests/). Thanks Ian

Jim Apple

16 Mar 16 Mar

4:56 p.m.

...

I do wonder how this implementation compares to packages on hackage, e.g.:

Also: http://hackage.haskell.org/package/heap http://hackage.haskell.org/package/TreeStructures

Stephan Friedrichs

19 Mar 19 Mar

9:59 a.m.

On 16/03/10 17:56, Jim Apple wrote:

...

...
I do wonder how this implementation compares to packages on hackage, e.g.:

Also:

http://hackage.haskell.org/package/heap

Let me speak for this one, as I happen to know it. Unfortunately I don't have much time for a thorough performance comparison (Maybe someone can come up with general testcases? It should be possible to quickly rewrite them for each package!), but I did two tests an heap was twice as fast as the proposition (containers-pqueue): *Data.PQueue.Min> takeWhile (<= 5) $ (fromList [1..100000]) `union` (fromList [100001..200000]) [1,2,3,4,5] and *Data.PQueue.Min> takeWhile (<= 5) $ (fromAscList [1..100000]) `union` (fromAscList [100001..200000]) [1,2,3,4,5] ===== vs. ===== *Data.Heap> takeWhile (<=5) $ (fromList [1..100000]) `Data.Heap.union` (fromList [100001..200000] :: MinHeap Int) [1,2,3,4,5] and *Data.Heap> takeWhile (<=5) $ (fromAscList [1..100000]) `Data.Heap.union` (fromAscList [100001..200000] :: MinHeap Int) [1,2,3,4,5] In both cases, heap was about twice as fast as containers-pqueue. I don't know if theses test cases are representative, please comment on that :) Regards, Stephan

Ross Paterson

17 Mar 17 Mar

3:47 p.m.

On Tue, Mar 16, 2010 at 08:54:15AM -0500, Louis Wasserman wrote:

...

A couple potentially contentious design decisions:

* There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow usage of distinct keys and priority values.

I disagree with this one. It requires an Ord instance that isn't really an ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package). Other comments: The Foldable instance breaks the abstraction. I think it should process elements in order. How does this implementation compare with using Map/Set as a priority queue? The documentation for union should mention that it's not stable. That's a pity, but may be justified if there's a big performance difference with the representations that support stable union. I think the containers package should offer general extractor functions with a parameter of type (a -> Maybe (b, a)), so we wouldn't need special cases like these ones.

Stephan Friedrichs

19 Mar 19 Mar

9:52 a.m.

On 17/03/10 16:47, Ross Paterson wrote:

...

On Tue, Mar 16, 2010 at 08:54:15AM -0500, Louis Wasserman wrote:

...
A couple potentially contentious design decisions:

* There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow usage of distinct keys and priority values.

I disagree with this one. It requires an Ord instance that isn't really an ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package).

I agree. The Functor instance is crucial (at least to me). Besides, I'm the author of [1] and writing that package, I made the experience that it absolutely makes sense to separate the part of a queue entry that determines it's position within the queue from the part that is user data. A clean Functor instance is just one of the benefits.

...

Other comments:

The Foldable instance breaks the abstraction. I think it should process elements in order.

Yes. I'd consider everything else a bug.

...

[...]

Other points are: * I'm not comfortable with having two redundant modules, one for Min- and one for MaxQueue. It is possible to implement both in the same data structure and still get type errors when e.g. 'union' on a Min- and a MaxQueue (I did that in [1], please have a look at the latest version). My solution needs type families, which would not be good for containers, but I'm pretty sure there's a containers-compatible solution. * Not so important, but IMHO 'extract' should be called 'view', because that's, how this sort of function is usually called, isn't it? ;) Regards, Stephan [1] http://hackage.haskell.org/package/heap

Ross Paterson

10:26 a.m.

On Fri, Mar 19, 2010 at 10:52:10AM +0100, Stephan Friedrichs wrote:

...

On 17/03/10 16:47, Ross Paterson wrote:

...
On Tue, Mar 16, 2010 at 08:54:15AM -0500, Louis Wasserman wrote:

...
A couple potentially contentious design decisions:

* There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow usage of distinct keys and priority values.

I disagree with this one. It requires an Ord instance that isn't really an ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package).

I agree. The Functor instance is crucial (at least to me). Besides, I'm the author of [1] and writing that package, I made the experience that it absolutely makes sense to separate the part of a queue entry that determines it's position within the queue from the part that is user data. A clean Functor instance is just one of the benefits.

For a containers-compatible interface (ignoring the implementation), see http://hackage.haskell.org/packages/archive/fingertree/0.0.1.0/doc/html/Data...

...

* I'm not comfortable with having two redundant modules, one for Min- and one for MaxQueue. It is possible to implement both in the same data structure and still get type errors when e.g. 'union' on a Min- and a MaxQueue (I did that in [1], please have a look at the latest version). My solution needs type families, which would not be good for containers, but I'm pretty sure there's a containers-compatible solution.

If keys are separate, the two versions could be easily achieved using an inversion adaptor on Ord, which has been proposed before and would have many other uses.

5589

Age (days ago)

5599

Last active (days ago)

List overview

Download

35 comments

19 participants

participants (19)

Bas van Dijk
Bertram Felgenhauer
Brandon S. Allbery KF8NH
Bryan O'Sullivan
Evan Laforge
Gwern Branwen
Ian Lynagh
Jed Brown
Jim Apple
Johan Tibell
Louis Wasserman
Malcolm Wallace
Milan Straka
Ross Paterson
Simon Marlow
Stephan Friedrichs
Thomas Schilling
wren ng thornton
Yitzchak Gale