Feedback request: priority queues in containers

Hey, I'd like to request some more feedback on the proposedhttp://hackage.haskell.org/trac/ghc/ticket/3909implementation for priority queues in containers. Mostly, I feel like adding a new module to containers should be contentious, and there hasn't been as much griping or contention as I expected. The silence is feeling kind of eerie! I'm inclined to set a deadline of next Wednesday, Mar 24, because the ticket was started two weeks ago and the current implementation has been essentially unchanged for a week. After that point, I'll consider the patch final. The proposed implementation benchmarked competitively with every alternative implementation that we tested, and offers good asymptotics in nearly every operation. Specifically, we use a binomial heap, which offers - O(log n) worst-case union - O(log n) worst-case extract (this in particular was a key objective of ours) - amortized O(1), worst-case O(log n) insertion. (In a persistent context, the amortized performance bound does not necessarily hold.) This implementation seems to offer the best balance between practical performance and asymptotic behavior. Moreover, this implementation is extremely memory-efficient, resulting in better performance on large data sets. (See the ticket for benchmarks. The most accurate benchmarks are towards the end.) A couple potentially contentious design decisions: - There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow usage of distinct keys and priority values. - Min-queues and max queues are separated the following way: - Data.PQueue.Min exports the type MinQueue. - Data.PQueue.Max exports the type MaxQueue. (This is implemented as a wrapper around MinQueue.) The method names are the same, but I think this is acceptable, because I can't think of any algorithms that use a min-queue and a max-queue separately. - Data.PQueue adds the alias type PQueue = MinQueue, so that the "default" behavior is a min-queue. These design decisions seem to be sufficient to handle most traditional uses for priority queues. In particular, this approach offers a superset of the functionality offered by Java's built-in priority queue implementationhttp://java.sun.com/javase/6/docs/api/java/util/PriorityQueue.html, which makes the same design decisions, but of course, is all imperative and yucky! (Also, it offers inferior asymptotics, heh.) I made a particular effort to offer the sort of utility functions that are found in the other modules of containers. In particular, it offers: - take, takeWhile, span, and that whole family of functions. take k q returns the *list* of the top k elements, and drop k q returns the *queue* with the first k elements deleted. The rest of these methods have analogous signatures. - q !! k is equivalent to toAscList q !! k. - filter and partition are offered in O(n) time. (It's actually not obvious that my implementation actually runs in O(n) time, but I managed to prove it.) - We offer Functor, Foldable, and Traversable instances that do not respect key ordering. All are linear time, but Functor and Traversable in particular assume the function is monotonic. The Foldable instance is a good way to access the elements of the priority queue in an unordered fashion. (We also export mapMonotonic and traverseMonotonic, and encourage the use of those functions instead of the Functor or Traversable instances.) - We offer foldrAsc, foldrDesc, foldlAsc, and foldlDesc. (Descending-order operations are just implemented as duals of the ascending-order operations, for MinQueue. For MaxQueue, it's the other way around.) - Correspondingly, we export toList, toAscList, toDescList, fromList, fromAscList, fromDescList. (toList returns an *unordered* traversal, and is *not* equivalent to toAscList.) I'm really satisfied with the patch as-is, modulo maybe tinkering with the code style a little. I'm also working on an article for TMR on priority queues in Haskell, some of the different structures we considered, and particularly the new type-safety implementation I came up with for binomial heaps in the writing of this implementation. In conclusion, I want to be sure people actually like this approach! So check it out. Complaints are appreciated, but even "I think your implementation is absolutely perfect" would reassure me. =) Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

On March 16, 2010 09:29:06 Louis Wasserman wrote:
I'd like to request some more feedback on the proposedhttp://hackage.haskell.org/trac/ghc/ticket/3909implementation for priority queues in containers. Mostly, I feel like adding a new module to containers should be contentious, and there hasn't been as much griping or contention as I expected. The silence is feeling kind of eerie!
Not sure if this is an appropriate question at all as I haven't looked at the code, but would it be possible to put any primary functionality into a class. I'm thinking something along the lines of how the vector code works. This allows you to use all the higher-order functions (i.e., those implemented using the primary functions) on a different underlying implementation. I've found this particularly useful in wrapping Perl data types. For the Perl array, all I had to do was write an class instance for the vector module, and I have all these higher-order functions I could use from existing code. It would be very nice to have had something similar to do for the hash tables. Even to just provide a "native haskell" immutable look into the data so Haskell code can extract the components it needs with standard functions. Cheers! -Tyson PS: I'm still working on the wrapping, so I might change my mind as to how useful this really is, but thought I should throw it out there.

I'm not willing to do this sort of typeclass wrapper thing, primarily
because nothing else in containers does -- even though we might have a
Mapping type class that handles both IntMap and Map, we don't.
I'm inclined to let that design choice stand, as far as containers is
concerned. It would make perfect sense to write a new package with such a
type class and offering instances for the containers priority queue
implementations, but I prefer to stick with the style that containers
already seems to use -- that is, exporting separate modules without a
unifying type class, but with nearly-identical method signatures.
Louis Wasserman
wasserman.louis@gmail.com
http://profiles.google.com/wasserman.louis
On Tue, Mar 16, 2010 at 11:10 AM, Tyson Whitehead
On March 16, 2010 09:29:06 Louis Wasserman wrote:
I'd like to request some more feedback on the proposedhttp://hackage.haskell.org/trac/ghc/ticket/3909implementation for priority queues in containers. Mostly, I feel like adding a new module to containers should be contentious, and there hasn't been as much griping or contention as I expected. The silence is feeling kind of eerie!
Not sure if this is an appropriate question at all as I haven't looked at the code, but would it be possible to put any primary functionality into a class.
I'm thinking something along the lines of how the vector code works. This allows you to use all the higher-order functions (i.e., those implemented using the primary functions) on a different underlying implementation.
I've found this particularly useful in wrapping Perl data types. For the Perl array, all I had to do was write an class instance for the vector module, and I have all these higher-order functions I could use from existing code.
It would be very nice to have had something similar to do for the hash tables. Even to just provide a "native haskell" immutable look into the data so Haskell code can extract the components it needs with standard functions.
Cheers! -Tyson
PS: I'm still working on the wrapping, so I might change my mind as to how useful this really is, but thought I should throw it out there.

Hi, I second that choice. There have been some attempts at using type classes, notably the Edison library, but it never got widespread. So I would follow the current containers design. Milan
I'm not willing to do this sort of typeclass wrapper thing, primarily because nothing else in containers does -- even though we might have a Mapping type class that handles both IntMap and Map, we don't.
I'm inclined to let that design choice stand, as far as containers is concerned. It would make perfect sense to write a new package with such a type class and offering instances for the containers priority queue implementations, but I prefer to stick with the style that containers already seems to use -- that is, exporting separate modules without a unifying type class, but with nearly-identical method signatures.
Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis
On Tue, Mar 16, 2010 at 11:10 AM, Tyson Whitehead
wrote: On March 16, 2010 09:29:06 Louis Wasserman wrote:
I'd like to request some more feedback on the proposedhttp://hackage.haskell.org/trac/ghc/ticket/3909implementation for priority queues in containers. Mostly, I feel like adding a new module to containers should be contentious, and there hasn't been as much griping or contention as I expected. The silence is feeling kind of eerie!
Not sure if this is an appropriate question at all as I haven't looked at the code, but would it be possible to put any primary functionality into a class.
I'm thinking something along the lines of how the vector code works. This allows you to use all the higher-order functions (i.e., those implemented using the primary functions) on a different underlying implementation.
I've found this particularly useful in wrapping Perl data types. For the Perl array, all I had to do was write an class instance for the vector module, and I have all these higher-order functions I could use from existing code.
It would be very nice to have had something similar to do for the hash tables. Even to just provide a "native haskell" immutable look into the data so Haskell code can extract the components it needs with standard functions.
Cheers! -Tyson
PS: I'm still working on the wrapping, so I might change my mind as to how useful this really is, but thought I should throw it out there.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Louis Wasserman wrote:
I'm not willing to do this sort of typeclass wrapper thing, primarily because nothing else in containers does -- even though we might have a Mapping type class that handles both IntMap and Map, we don't.
I'm inclined to let that design choice stand, as far as containers is concerned. It would make perfect sense to write a new package with such a type class and offering instances for the containers priority queue implementations, but I prefer to stick with the style that containers already seems to use -- that is, exporting separate modules without a unifying type class, but with nearly-identical method signatures.
Just an aside (and shameless plug ;-): Since the signatures overlap so much, it is in fact easy to wrap these modules into instances for many possible different type classes that one might consider using for containers --- I have a tool that mechanises this instance generation, available at: http://sqrl.mcmaster.ca/~kahl/Haskell/ModuleTools/ More about this in the forthcoming TFP 2009 proceedings paper: @InCollection{Kahl-2009_TFP, author = {Wolfram Kahl}, title = {Haskell Module Tools for Liberating Type Class Design}, crossref = {TFP2009}, pages = {129--144}, chapter = {9}, abstract = {Design of Haskell type class hierarchies for complex purposes, including for standard containers, is a non-trivial exercise, and evolution of such designs is additionally hampered by the large overhead of connecting to existing implementations. We systematically discuss this overhead, and propose a tool solution, implemented using the GHC API, to automate its generation.} } @Book{TFP2009, title = {Trends in Functional Programming, {TFP 2009}}, booktitle = {Trends in Functional Programming, {TFP 2009}}, year = 2010, editor = {Zolt\'an Horv{\'a}th and Vikt\'oia Zs{\'o}k and Peter Achten and Pieter Koopman}, address = {UK}, publisher = {Intellect}, note = {(In press)} } Wolfram

So, it is possible to break the invariants of your priority queue by
using fmap with a non-monotonic function, right? I see that it might be usefull to have such instances, sometimes. As it is not possible to hide instances, once they are definded, I'd propose to not offer those instances by default. Instead you could provide implementations of all the instance functions necessary to define this instances yourself. Or one could have a newtype wrapper around the priority queue for which instances of Function, Foldable, and Traversable are defined. This would allow to "activate" the nice instances on demand but to stay safe by default. Hmmmm. I suggest: - Functor and Traversable should be modified as you suggest, that is, we continue exporting mapMonotonic and traverseMonotonic, but don't export Functor or Traversable instances. - I'm still going to insist that we retain Foldable. The most important reason is that we don't lose any invariants as a result of a fold, and the second reason is that reexporting functions named "foldr" and "foldl" would be awkward. Making this change now. Louis Wasserman wasserman.louis@gmail.com http://profiles.google.com/wasserman.louis

On March 16, 2010 13:24:32 kahl@cas.mcmaster.ca wrote:
Just an aside (and shameless plug ;-): Since the signatures overlap so much, it is in fact easy to wrap these modules into instances for many possible different type classes
Yes. I can see the preferred way would be to put the classes on top of the implementation, instead of the implementations on top of the classes. Cheers! -Tyson

On 17/03/2010, at 03:16, Louis Wasserman wrote:
I'm not willing to do this sort of typeclass wrapper thing, primarily because nothing else in containers does -- even though we might have a Mapping type class that handles both IntMap and Map, we don't.
I'm inclined to let that design choice stand, as far as containers is concerned. It would make perfect sense to write a new package with such a type class and offering instances for the containers priority queue implementations, but I prefer to stick with the style that containers already seems to use -- that is, exporting separate modules without a unifying type class, but with nearly-identical method signatures.
FWIW, vector does both. It defines most vector operations generically and then exports appropriate specialisations for each concrete vector type. I think this is the most flexible and convenient approach. I just wish Haskell had some kind of support for it. Roman

Specifically, we use a binomial heap, which offers
O(log n) worst-case union O(log n) worst-case extract (this in particular was a key objective of ours) amortized O(1), worst-case O(log n) insertion. (In a persistent context, the amortized performance bound does not necessarily hold.)
Why not use Okasaki & Brodal's "Optimal Purely Functional Priority Queues"? They offer worst case: * O(1) union, findMin, and insert * O(lg n) deleteMin http://www.eecs.usma.edu/webs/people/okasaki/jfp96/index.html ftp://ftp.daimi.au.dk/pub/BRICS/Reports/RS/96/37/BRICS-RS-96-37.pdf

Hi, I think this is my first post to this list although I've been reading it for a long time. So, please, be patient with me and my post. On 03/16/2010 02:29 PM, Louis Wasserman wrote:
* We offer Functor, Foldable, and Traversable instances that do not respect key ordering. All are linear time, but Functor and Traversable in particular assume the function is monotonic. The Foldable instance is a good way to access the elements of the priority queue in an unordered fashion. (We also export mapMonotonic and traverseMonotonic, and encourage the use of those functions instead of the Functor or Traversable instances.)
So, it is possible to break the invariants of your priority queue by using fmap with a non-monotonic function, right? I see that it might be usefull to have such instances, sometimes. As it is not possible to hide instances, once they are definded, I'd propose to not offer those instances by default. Instead you could provide implementations of all the instance functions necessary to define this instances yourself. Or one could have a newtype wrapper around the priority queue for which instances of Function, Foldable, and Traversable are defined. This would allow to "activate" the nice instances on demand but to stay safe by default. Best regards, Jean-Marie
participants (7)
-
Jean-Marie Gaillourdet
-
Jim Apple
-
kahl@cas.mcmaster.ca
-
Louis Wasserman
-
Milan Straka
-
Roman Leshchinskiy
-
Tyson Whitehead