
Hello. I am hoping to take on the Data Structures project proposed two years ago by Don Stewart herehttp://hackage.haskell.org/trac/summer-of-code/ticket/1549, this summer. Before I write up my proposal to Google, I wanted to gauge the reaction of the Haskell community to this project. Particularly: -What Data Structures in the current libraries are in most dire need of improvement? -How necessary do you think a Containers Library revision is? -Should I attempt to build on the work Jamie Brandon did with Map as generalised tries, or is that beyond the scope of this project I am very excited to work with functional data structures, and I would greatly appreciate any input. -Nathan Hunter

Nathan Hunter
-What Data Structures in the current libraries are in most dire need of improvement? -How necessary do you think a Containers Library revision is? -Should I attempt to build on the work Jamie Brandon did with Map as generalised tries, or is that beyond the scope of this project
As I see it, the most dire need is a unified interface to everything, as well as instance selection, think type instance MapOf Int = IntMap type instance MapOf (a,b,c) =Tuple3Map(MapOf a)(MapOf b)(MapOf c) a b c (gmap is very, very handy in every other aspect) We have a lot of useful interfaces (e.g. ListLike, Edison), but they don't seem to enjoy wide-spread popularity. There's some lack when it comes to hashtables (I'd like to see a Haskell implementation of perfect hashing, as that's sometimes jolly useful) as well as cache-obliviousness (which is a can of worms: Ensuring data layout requires using mutable Arrays, which sucks, so what we actually need is a gc/allocator that can be told to allocate in van Emde Boas layout), but in general, the implementation side is fine. I would also like to be able to match on the left head of Data.Sequence in the same way I can match on Lists (see e.g. [1]), but I guess there'd have to me more commutity backing for that to become reality. [1] http://hackage.haskell.org/trac/ghc/ticket/3583 -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.

On 31/03/2010, at 18:14, Achim Schneider wrote:
We have a lot of useful interfaces (e.g. ListLike, Edison), but they don't seem to enjoy wide-spread popularity.
Perhaps that's an indication that we need different interfaces? IMO, huge classes which generalise every useful function we can think of just isn't the right approach. We need small interfaces between containers and algorithms. In fact, the situation is perhaps somewhat similar to C++ where by providing exactly that the STL has been able to replace OO-style collection libraries which never really worked all that well. Roman

Nathan, For what it is worth: I'd propose that Data.HashTable needs to be replaced; it appears to me having played around with it and found it wanting that its limitations are pretty common knowledge in the Haskell community. (I'm sure most people on this list would already know much more about the limitations of the existing HashTable library than do I, for am only new to Haskell, so sorry if my suggestion is an eyeball-roller). Darryn. On Wed, 2010-03-31 at 00:00 -0700, Nathan Hunter wrote:
Hello.
I am hoping to take on the Data Structures project proposed two years ago by Don Stewart here, this summer. Before I write up my proposal to Google, I wanted to gauge the reaction of the Haskell community to this project. Particularly: -What Data Structures in the current libraries are in most dire need of improvement? -How necessary do you think a Containers Library revision is? -Should I attempt to build on the work Jamie Brandon did with Map as generalised tries, or is that beyond the scope of this project I am very excited to work with functional data structures, and I would greatly appreciate any input.
-Nathan Hunter _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Nathan Hunter wrote:
Hello.
I am hoping to take on the Data Structures project proposed two years ago by Don Stewart herehttp://hackage.haskell.org/trac/summer-of-code/ticket/1549, this summer. Before I write up my proposal to Google, I wanted to gauge the reaction of the Haskell community to this project. Particularly:
-What Data Structures in the current libraries are in most dire need of improvement? -How necessary do you think a Containers Library revision is?
One thing I've seen come up repeatedly is the issue of presenting a unified and general interface for Data.Map, Data.IntMap, and related things like Data.Set, bytestring-trie, pqueue, etc which intended to mimic their interface. That alone isn't big enough for a GSoC, but it would be a very nice thing to have. Every few months there's a request on the libraries@ list to alter, generalize, or reunify the map interface in some way. ** Just to be clear, I do not mean coming up with a typeclass nor doing things like the generalized-trie tricks, I just mean a good old fashioned standard API. ** There are countervailing forces for making a good API. On the one hand we want functions to do whatever we need, on the other hand we want the API to be small enough to be usable/memorable. In the bytestring-trie library I attempted to resolve this conflict by offering a small set of highly efficient ueber-combinators in the internals module, a medium sized set of functions for standard use in the main module, and then pushed most everything else off into a convenience module. The containers library would do good to follow this sort of design. The Data.Map and Data.IntMap structures don't provide the necessary ueber-combinators, which has led to the proliferation of convenience functions which are more general than the standard use functions but not general enough to make the interface complete. Also, these generalized functions are implemented via code duplication rather than having a single implementation, which has been known to lead to cut&paste bugs and maintenance issues. Provided the correct ueber-combinators are chosen, there is no performance benefit for this code duplication either (so far as I've discovered with bytestring-trie). Additionally, it'd be nice if some of the guts were made available for public use (e.g., the bit-twiddling tricks of Data.IntMap so I don't have to duplicate them in bytestring-trie). Also it would be nice to develop a cohesive test and benchmarking suite, which would certainly be a large enough task for GSoC, though perhaps not fundable. I would be willing to co-mentor API and algorithm design for cleaning the cobwebs out of the containers library. I wouldn't have the time for mentoring the choice of datastructures or benchmarking however. -- Live well, ~wren

Well, one of my most important questions has been indirectly answered. It
seems like Map is still the main point of interest, and Jamie Brandon's list
of remaining objectiveshttp://www.haskell.org/pipermail/haskell-cafe/2009-March/058270.html
include
modifying the API to work with the edison or collections API, and
benchmarking. I haven't attempted a project this large before, so I'm not
sure what is feasible in the allotted time. A complete Data Structures
overhaul seems out of the question, but polishing up a single one seems far
too sparse. Then again, the benchmarking could represent a significant chunk
of work.
-Nathan
On Wed, Mar 31, 2010 at 11:28 AM, wren ng thornton
Nathan Hunter wrote:
Hello.
I am hoping to take on the Data Structures project proposed two years ago by Don Stewart here< http://hackage.haskell.org/trac/summer-of-code/ticket/1549>,
this summer. Before I write up my proposal to Google, I wanted to gauge the reaction of the Haskell community to this project. Particularly:
-What Data Structures in the current libraries are in most dire need of improvement? -How necessary do you think a Containers Library revision is?
One thing I've seen come up repeatedly is the issue of presenting a unified and general interface for Data.Map, Data.IntMap, and related things like Data.Set, bytestring-trie, pqueue, etc which intended to mimic their interface. That alone isn't big enough for a GSoC, but it would be a very nice thing to have. Every few months there's a request on the libraries@list to alter, generalize, or reunify the map interface in some way.
** Just to be clear, I do not mean coming up with a typeclass nor doing things like the generalized-trie tricks, I just mean a good old fashioned standard API. **
There are countervailing forces for making a good API. On the one hand we want functions to do whatever we need, on the other hand we want the API to be small enough to be usable/memorable. In the bytestring-trie library I attempted to resolve this conflict by offering a small set of highly efficient ueber-combinators in the internals module, a medium sized set of functions for standard use in the main module, and then pushed most everything else off into a convenience module.
The containers library would do good to follow this sort of design. The Data.Map and Data.IntMap structures don't provide the necessary ueber-combinators, which has led to the proliferation of convenience functions which are more general than the standard use functions but not general enough to make the interface complete. Also, these generalized functions are implemented via code duplication rather than having a single implementation, which has been known to lead to cut&paste bugs and maintenance issues. Provided the correct ueber-combinators are chosen, there is no performance benefit for this code duplication either (so far as I've discovered with bytestring-trie).
Additionally, it'd be nice if some of the guts were made available for public use (e.g., the bit-twiddling tricks of Data.IntMap so I don't have to duplicate them in bytestring-trie).
Also it would be nice to develop a cohesive test and benchmarking suite, which would certainly be a large enough task for GSoC, though perhaps not fundable.
I would be willing to co-mentor API and algorithm design for cleaning the cobwebs out of the containers library. I wouldn't have the time for mentoring the choice of datastructures or benchmarking however.
-- Live well, ~wren
participants (5)
-
Achim Schneider
-
Darryn Reid
-
Nathan Hunter
-
Roman Leshchinskiy
-
wren ng thornton