RE: [Haskell] RFC: DData in hierarchical libraries

I propose to add a modified version of DData to the hierachical libraries.
DData is a concrete library of collection types, by Daan Leijen. My modifications intend to make DData fit better in the hierarchical libraries.
The haddock-generated documentation can be found here:
http://users.skynet.be/jyp/DData/doc/index.html
while the source code is at
Regarding the Seq type, I like the one that we use in GHC. See: http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/utils/Ord List.lhs Both libraries can support the same interface, but the implementation in GHC's OrdList library is based on a concrete datatype: data OrdList a = Many [a] | Two (OrdList a) (OrdList a) | One a | None which is more flexible, and probably more efficient. Indeed for performance reasons you might want two more constructors: | Cons a (OrdList a) | Snoc (OrdList a) a I'd like to see many more operations defined over Seqs. At least: snoc :: Seq a -> a -> Seq a concat :: Seq (Seq a) -> Seq a and probably many other functions from Data.List. ------------- Looking briefly at the rest of the library, I have a few questions/comments: Why are there IntBags but no Bags? How do DData.Map and DData.IntMap compare in performance to the existing Data.FiniteMap? Similarly for Sets? Can IntMap and IntSet be implemented by specialisation or overloading the Set type (ala the overloaded arrays in Data.Array)? (The interfaces look very similar, but I didn't compare them in detail). Why is there no direct translation between Seqs and Sets/Maps, but there is for lists? I guess there's a consistency issue here - do we want to (a) use lists exclusively, (b) use Seq exclusively, or (c) have all operations available for both types? Internal functions like DData.Set.valid should not be visible. Cheers, Simon

Simon Marlow wrote:
Why are there IntBags but no Bags?
As I've said before, rename MultiSet to Bag.
Internal functions like DData.Set.valid should not be visible.
"valid" may be useful for debugging purposes if certain (for efficiency reasons unsafe) functions like "fromDistinctAscList" may produce invalid representations (if the argument is not strictly ascending). But I don't know if this is the case. Christian

Hi,
--- Simon Marlow
Regarding the Seq type, I like the one that we use in GHC. See: [URL]
Both libraries can support the same interface, but the implementation in GHC's OrdList library is based on a concrete datatype:
data OrdList a = Many [a] | Two (OrdList a) (OrdList a) | One a | None
which is more flexible, and probably more efficient. Indeed for performance reasons you might want two more constructors:
| Cons a (OrdList a) | Snoc (OrdList a) a
I'd like to see many more operations defined over Seqs. At least:
snoc :: Seq a -> a -> Seq a concat :: Seq (Seq a) -> Seq a
and probably many other functions from Data.List.
I've done this, mostly. I'll submit another revision soon.
-------------
Looking briefly at the rest of the library, I have a few questions/comments:
Why are there IntBags but no Bags?
How do DData.Map and DData.IntMap compare in performance to the existing Data.FiniteMap? Similarly for Sets?
I'd like to hear Daan's opinion about this...
Can IntMap and IntSet be implemented by specialisation or overloading the Set type (ala the overloaded arrays in Data.Array)?
It is intended to be so. As it is, a few functions are missing, but nothing that can't be fixed easily.
Why is there no direct translation between Seqs and Sets/Maps, but there is for lists? I guess there's a consistency issue here - do we want to (a) use lists exclusively, (b) use Seq exclusively, or (c) have all operations available for both types?
My view is that lists cannot be done away with; they are pervasive in the current "haskell coding style". I suspect there would be performance issues as well. I used to consider "Seqs" for the sole purpose of concatenation. This implies to use lists everywhere, leaving the conversion to sequences to the programmer, if ever that's needed. Thanks for your feedback, JP. __________________________________ Do you Yahoo!? Yahoo! Search - Find what you�re looking for faster http://search.yahoo.com

Hi all,
On Mon, 8 Mar 2004 16:58:13 -0000, Simon Marlow
Regarding the Seq type, I like the one that we use in GHC. See:
The "Seq" and "Queue" modules are less "mature" than the Map/Set/Bag modules -- it may be good if someone consistently extends them with the GHC modules as examples.
How do DData.Map and DData.IntMap compare in performance to the existing Data.FiniteMap? Similarly for Sets?
In terms of worst-case bounds, DData.Map equals Data.FiniteMap. DData.IntMap has an incomparable worst-case bound than Data.FiniteMap I *suspect* that the absolute efficiency of DData.Map will be close to Data.FiniteMap as they use essentially the same algorithms and representation. On some operations DData.Map will be better as it uses "hedge" variations. However, I have never tested this with proper benchmarks. (now that I think about it, I might have tested it with improper benchmarks... I'll try to find on my computer when I get home). I have measured different DData.IntMap against DData.Map instantiated with Int's and it strongly outperformed it on all operations. (I'll give numbers if I can find them again). This was my motivation for including a specialized implementation based on (big-endian) patricia trees.
Can IntMap and IntSet be implemented by specialisation or overloading the Set type (ala the overloaded arrays in Data.Array)? (The interfaces look very similar, but I didn't compare them in detail).
IntMap/IntSet have completely different algorithms than the corresponding Map/Set modules. The interfaces are very similar off course :-) I think that making them an instance of a common "Set" type should not be goal of DData but rather a goal of a framework like Edison, for which DData can provide concrete implementations.
Why is there no direct translation between Seqs and Sets/Maps, but there is for lists? I guess there's a consistency issue here - do we want to (a) use lists exclusively, (b) use Seq exclusively, or (c) have all operations available for both types?
I personally prefer to use lists exclusively, and to make the "seqFromList" function really efficient, in the sense that we guarantee that "listFromSeq . seqFromList" takes constant time. All the best, Daan.
participants (4)
-
Christian Maeder
-
Daan Leijen
-
JP Bernardy
-
Simon Marlow