
Hi all,

As we consider what we want to go into the Haskell Platform, I would like to seriously discuss the issue of container polymorphism, especially, but not exclusively, in dealing with strings. We currently have a large number of container types, many of which are going to co-exist long-term, i.e. there is no winner; they are useful in different situations. Here is an incomplete list of what we are dealing with:

Fully polymorphic sequence types:
  - []
  - Data.Sequence
  - Data.Edison sequence types
  - Data.Array.Array & friends

Restricted polymorphic sequence types:
  - uvector
  - vector
  - storable-vector

Monomorphic sequence types:
  - ByteString (Word8 and Char8)
  - Lazy ByteString (Word8 and Char8)
  - Text

Fully polymorphic associative array types:
  - Data.Map
  - Edison associative arrays
  - gmap (?)

Less-polymorphic associative array types:
  - bytestring-tries
  - list-tries
  - Data.IntMap

I'm sure there are more, of course, but those are the ones that a cursory search reveals.

The question, then, is how we ought to deal with all of these types in the Haskell Platform. The current popular libraries deal with the multitude of container types in a very ad hoc way. For instance, the Parsec parsing library lets you parse Strings and strict ByteStrings, and it also provides a class so that you can parse other things. Attoparsec only lets you parse lazy ByteStrings. I don't know of any library that parses Text yet. The binary library deals with lazy ByteStrings, while binary-strict lets you use strict ByteStrings. Libraries like zlib just pick a type (lazy ByteStrings) and use it, without accounting for other types you may want to use (granted, in zlib's case the choice of a single data structure seems fairly reasonable, given the likely use cases).

The point here is that the same functionality, and practically the same code, can be split over several different packages depending on which data structure you want to use. Worse, there is no standard way of dealing with this separation: Parsec uses a type class, while binary uses two separate packages. Interoperability can also become a problem: the split library, for instance, only supports lists, so you have to roll your own solutions for splitting ByteStrings or Text.

I think that with the introduction of the Haskell Platform, we ought to take the opportunity to clean up some of this chaos so that we have a clean set of standard libraries for people to build on. The way I see it, we have a few options.

We could introduce type classes covering the various container operations and ask all Platform packages to use only operations from these classes, where applicable. (The ListLike library would probably be a good place to start; a rough sketch of what such a class might look like is below.) The issues here are efficiency, massive class dictionaries, and type system issues: we need MPTCs or type families, and since not all sequence types are polymorphic, we probably need separate map and rigidMap functions as in ListLike, and even that doesn't work for types like uvector that can take some, but not all, concrete element types.

We could instead adopt a standard module-naming scheme: for every package that operates on strings, for instance, the list-based code would go in Foo.Bar.String and the Text code in Foo.Bar.Text (a small illustration of this layout is also below). Whenever we "blessed" a new string datatype (presumably not very often), every package would have to add a new module. The issue here is code duplication.

Finally, we could adopt a package-naming scheme: have foo-text and foo-string for the two different datatypes.
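To make the first option a bit more concrete, here is a very rough sketch of a ListLike-style class using MPTCs with a functional dependency. The names (SeqLike, rigidMap, and so on) are purely illustrative, not a concrete proposal; the point is only that monomorphic types like ByteString can be instances as long as the map operation is not allowed to change the element type:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

    module SeqLikeSketch where

    import qualified Data.List as L
    import qualified Data.ByteString as B
    import Data.Word (Word8)

    -- A sequence type 'full' with element type 'item'.
    class SeqLike full item | full -> item where
      empty    :: full
      cons     :: item -> full -> full
      fromList :: [item] -> full
      toList   :: full -> [item]
      -- rigidMap keeps the element type fixed, so monomorphic
      -- sequences like ByteString can still be instances.
      rigidMap :: (item -> item) -> full -> full

    -- Lists are fully polymorphic, so the instance is trivial.
    instance SeqLike [a] a where
      empty    = []
      cons     = (:)
      fromList = id
      toList   = id
      rigidMap = L.map

    -- Strict ByteStrings are fixed to Word8, but rigidMap still works.
    instance SeqLike B.ByteString Word8 where
      empty    = B.empty
      cons     = B.cons
      fromList = B.pack
      toList   = B.unpack
      rigidMap = B.map

A fully polymorphic map :: (item -> item') -> full -> full' would need a second class (or type families), and even that does not cover uvector-style types that admit only certain element types, which is exactly the weakness mentioned above.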
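And to illustrate the module-naming option (Foo.Bar and wordCount are invented names here), the same small API would be written once per blessed string type, in separate files:

    -- Foo/Bar/String.hs
    module Foo.Bar.String (wordCount) where

    wordCount :: String -> Int
    wordCount = length . words

    -- Foo/Bar/Text.hs: the same API, and essentially the same code, again.
    module Foo.Bar.Text (wordCount) where

    import qualified Data.Text as T

    wordCount :: T.Text -> Int
    wordCount = length . T.words

Callers import whichever module matches their string type; the cost is that every such pair of modules has to be written and kept in sync by hand.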
What does everyone think about the need for standardization here and how we ought to do it? I'm sorry that, having less experience with Haskell than many, I can't offer much in the way of concrete proposals, but I think it's important that we hash something out here so that we end up with a better-organized Platform.

Thanks for your time in reading this admittedly long message,
Alex