
Hello Folks, As some of you will be aware, I have been working on various Map implementations that currently live here.. http://code.haskell.org/collections/collections-ghc6.8 The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones. Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy. Of course I will be happy to provide any help or advise anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-) Thanks -- Adrian Hey

ahey:
Hello Folks,
As some of you will be aware, I have been working on various Map implementations that currently live here..
http://code.haskell.org/collections/collections-ghc6.8
The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones.
Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.
Of course I will be happy to provide any help or advise anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-)
What are the long term goals of the project? To merge key data structures into the containers library? -- Don

On Nov 25, 2007 9:23 PM, Don Stewart
What are the long term goals of the project? To merge key data structures into the containers library?
They are a more like a replacement, with (almost) the same api, and better performance (last time we checked). Besides, I maintain a common class framework API to use (and test, etc.) all those beasts uniformly. Cheers, -- JP

bernardy:
On Nov 25, 2007 9:23 PM, Don Stewart
wrote: What are the long term goals of the project? To merge key data structures into the containers library?
They are a more like a replacement, with (almost) the same api, and better performance (last time we checked).
Besides, I maintain a common class framework API to use (and test, etc.) all those beasts uniformly.
Can you give an overview of which data structures we now have better replacements for, and what is preventing them being replaced? Perhaps a good next step would be to break containers up into smaller pieces, so that individual parts of the containers library can be selectively replaced? Once we have experience with what breaks, and what is faster, then we can proceed to replace components? -- Don

bernardy:
On Nov 25, 2007 9:23 PM, Don Stewart
wrote: What are the long term goals of the project? To merge key data structures into the containers library?
They are a more like a replacement, with (almost) the same api, and better performance (last time we checked).
Besides, I maintain a common class framework API to use (and test, etc.) all those beasts uniformly.
I'm tentatively proposing that a couple of people at Galois could take over this, given some direction on what potential this library has. My view would be that we: * break the package up into isolated units (i.e. queues, trees, ..) * upload the pieces ot hackage * as we gain experience with it, replac/add things to the base and containers package -- Don

On Nov 25, 2007 10:35 PM, Don Stewart
I'm tentatively proposing that a couple of people at Galois could take over this, given some direction on what potential this library has. My view would be that we:
* break the package up into isolated units (i.e. queues, trees, ..) * upload the pieces ot hackage * as we gain experience with it, replac/add things to the base and containers package
This is exactly what I was planning (and indeed first step is partially done). Unfortunately I have next to zero time to invest in this at the moment. So if anyone wishes to proceed with that they are more than welcome :) --JP

Hello Don, Just a note to answer a few questions about what the plan is/was for these libs, as I see it at least. Of course anyone taking it over is free to set their own agenda :-) The collections API was intended to provide a way for different data structure implementations to look the same, so we could use common test and benchmarking infrastructure and allow users to switch from one to another without substantial code re-writes etc.. Data.COrdering is trivial, and only really exists for Data.Tree.AVL. It was organised as a separate package as it is not dependent of AVL trees and might possibly be useful elsewhere, despite its triviality. Data.Tree.AVL is just an "uncommited" AVL tree lib. It's reasonably mature and well tested and has a comprehensive but ugly and unsafe API. It was designed with performance in mind, rather than elegance. There are Data.Map and Data.Set (near) clones that put a safer gloss on the raw AVL API. Last time I tested them they seem to offer some performance gains of the "standard" Data.Map and Data.Set, but nothing particularly dramatic unless you fall foul of "pathological" situations in Data.Map/Set (the tree type used in these doesn't seem to perform well with sorted insertions). Also, the Hedge algorithm is rather expensive in terms of comparisons. It seems to have been designed on the assumption that comparison was cheap and tree manipulation was expensive. But with AVL trees tree manipulation is rather cheap and the cost of comparison is unknown, so this uses a symetric divide and conquer algorithm. This seems to perform rather well, even if comparison is cheap (Ints). But the real advantage of the clones is that they contain stuff that Data.Map/Set don't at present, like the "cheap zipper" OMap type. But... All of the above is essentially obsolete as far as I'm concerned. The real way ahead for efficient Maps for arbitrarily complex key structures has to be Generalised Tries, which is where I've been putting most of my effort. What's there at the moment is the Data.Trie.General package, which defines the GT class and a few hand written instances thereof for common prelude types (notable Lists,Ints and any Ord instance). This was really an experiment to see what a "complete" (and in reality useable) class API would look like and what efficiency problems real implementations might have to address. It's not really ready for prime time yet. The class API in probably still incomplete and none of it has even been tested or benchmarked, and (most importantly) there's not yet any way to produce instances automatically (doing it by hand is an awful lot of work). So what really needs doing AFAICS (and there's a lot for just me and Jean-Philippe) is (at least) .. * Write test and benchmark suites for various collections classes. * Try to establish some common ground between collections classes and Edison. * Redefine the Generalised Trie (GT) class to use those new fangled associated type thingies. * Try to stabilise the Generalised Trie class (or classes) and resolve issues like the IMO useful but contentious Ord constraint. * Provide some way of generating efficient instances automatically (using Template Haskell or DrIFT or Derive or whatever). * Other stuff that I can't think of right now. I would also like to add a note of caution for the benefit of anyone taking this on. In principle Generalised Tries are simple enough for sum and product types (e.g. the Hinze papers). But in reality things are not so straight forward or simple if you're concerned with efficiency. The hand written GT instance for Lists looks nothing like what you'd get from the simple sum and product rules, for example (for good reason). Also, the Data.Trie.General.CollectionsInstances package is probably complete garbage and should be ignored :-( BTW, I haven't completely lost interest in this myself, it's just that as with Jean-Philippe, I have other stuff I want to do and it also has to be a part time effort for me, depending on what other commitments I may have. So if galois wanted to put some effort into this that would be great, as decent performing collections (especially maps) are quite important I think. Regards -- Adrian Hey

Hello Folks, Adrian Hey wrote:
If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.
Sigh..no sooner than I go and write something like that than the IEE (or I should say IET) go and break my mail alias. So sorry if anyone did actually try to contact me and got a their mail bounced. It should be working again now. Regards -- Adrian Hey

ahey:
Hello Folks,
As some of you will be aware, I have been working on various Map implementations that currently live here..
http://code.haskell.org/collections/collections-ghc6.8
The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones.
Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.
Of course I will be happy to provide any help or advise anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-)
Thanks
How about we propose this work be done for Summer of Code. I've created a ticket here: http://hackage.haskell.org/trac/summer-of-code/ticket/1549 Add comments, or if you can mentor, add that information too! Let's get a new faster Data.Map and other containers ready to go by the end of the northern summer? -- Don

On Monday 24 March 2008, Don Stewart wrote:
Let's get a new faster Data.Map and other containers ready to go by the end of the northern summer?
And while we are visiting this, can I put in a vote for a seperation between the default Data.* container concrete implementations and associated classes? E.g. between the cannonical Map implementation and a Map class that could (if I felt mad) be backed by a list, or (if I was more sane) the cannonical Map datatype? This would go /at least/ for Map, Set, List. Matthew
-- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Don Stewart wrote:
ahey:
Hello Folks,
As some of you will be aware, I have been working on various Map implementations that currently live here..
http://code.haskell.org/collections/collections-ghc6.8
The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones.
Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.
Of course I will be happy to provide any help or advise anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-)
Thanks
How about we propose this work be done for Summer of Code.
I've created a ticket here:
http://hackage.haskell.org/trac/summer-of-code/ticket/1549
Add comments, or if you can mentor, add that information too!
Let's get a new faster Data.Map and other containers ready to go by the end of the northern summer?
Hello Don, I'm not sure what you're proposing as the SOC project, but I don't think getting AVL based Data.Map/Set clones in Hackage is particularly suitable or challenging. This work is 99% done and also quite boring. It could be finished by the end of today if I set my mind to it :-) There are 3 significant things that really need doing IMO. 1- Try to reconcile the apparent differences between the collections package and Edison class APIs. I don't really understand either myself, but having multiple classes for "the same" things makes both implementors and test suite writers lives harder. The generalised trie class "GT" should still be kept separate as it needs some weird class methods in order to make new instances from old and can't really be finalised until this type families stuff is available anyway. 2- Write a polymorphic test and benchmarking suite for sets, maps, sequences etc. This would be really useful and is something that could reasonably be done as SOC project. But it may also be little boring :-( This could be based on classes defined as a result of 1 (above), or failing that the author would have to define yet another set of class APIs for test/benchmarking only. This may be the simpler approach as it doesn't assume anything about Edison or collections abstractions (it's just a way of testing concrete data structure implementations). 3- Produce some way of automatically deriving (efficient) generalised trie (GT) instances for user defined types. The API is huge and complex (and likely to get bigger still), so hand writing instances really isn't realistic in the long run. But this may be a bit premature for SOC as the GT class API itself is not yet complete or stable, and probably won't be until type families are available (and tested and documented and someone takes the trouble to finish it). So maybe this is something for next years SOC? That said, I know that type families are provisionally available, so maybe doing something with generalised tries might be possible. I don't mind mentoring anyone who wants to do something with any of this. Regards -- Adrian Hey

ahey:
Don Stewart wrote:
ahey:
Hello Folks,
As some of you will be aware, I have been working on various Map implementations that currently live here..
http://code.haskell.org/collections/collections-ghc6.8
The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones.
Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.
Of course I will be happy to provide any help or advise anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-)
Thanks
How about we propose this work be done for Summer of Code.
I've created a ticket here:
http://hackage.haskell.org/trac/summer-of-code/ticket/1549
Add comments, or if you can mentor, add that information too!
Let's get a new faster Data.Map and other containers ready to go by the end of the northern summer?
Hello Don,
I'm not sure what you're proposing as the SOC project, but I don't think getting AVL based Data.Map/Set clones in Hackage is particularly suitable or challenging. This work is 99% done and also quite boring. It could be finished by the end of today if I set my mind to it :-)
I was more putting it in as a stub for you to round out to a full SoC proposal to work on data structure things.. :)
There are 3 significant things that really need doing IMO. 1- Try to reconcile the apparent differences between the collections package and Edison class APIs. I don't really understand either myself, but having multiple classes for "the same" things makes both implementors and test suite writers lives harder. The generalised trie class "GT" should still be kept separate as it needs some weird class methods in order to make new instances from old and can't really be finalised until this type families stuff is available anyway.
2- Write a polymorphic test and benchmarking suite for sets, maps, sequences etc. This would be really useful and is something that could reasonably be done as SOC project. But it may also be little boring :-( This could be based on classes defined as a result of 1 (above), or failing that the author would have to define yet another set of class APIs for test/benchmarking only. This may be the simpler approach as it doesn't assume anything about Edison or collections abstractions (it's just a way of testing concrete data structure implementations).
3- Produce some way of automatically deriving (efficient) generalised trie (GT) instances for user defined types. The API is huge and complex (and likely to get bigger still), so hand writing instances really isn't realistic in the long run. But this may be a bit premature for SOC as the GT class API itself is not yet complete or stable, and probably won't be until type families are available (and tested and documented and someone takes the trouble to finish it). So maybe this is something for next years SOC?
That said, I know that type families are provisionally available, so maybe doing something with generalised tries might be possible. I don't mind mentoring anyone who wants to do something with any of this.
Great! Would you like to revise the Soc ticket, with this information? Getting some usable generalised tries available would be a great result. -- Don

Don Stewart wrote:
That said, I know that type families are provisionally available, so maybe doing something with generalised tries might be possible. I don't mind mentoring anyone who wants to do something with any of this.
Great! Would you like to revise the Soc ticket, with this information?
Getting some usable generalised tries available would be a great result.
Hello Don, I created a new Generalised Tries specific ticket.. http://hackage.haskell.org/trac/summer-of-code/ticket/1560 Do I need to do more? (not really sure how the whole process works). Who decides if proposals are good/ok/bad? Not me I assume :-) Regards -- Adrian Hey
participants (4)
-
Adrian Hey
-
Don Stewart
-
Jean-Philippe Bernardy
-
Matthew Pocock