
Hi all,

As we consider what we want to go into the Haskell Platform, I would like to seriously discuss the issue of container polymorphism, especially, but not exclusively, in dealing with strings. We currently have a large number of container types, many of which are going to co-exist long-term, i.e. there is no winner; they are useful in different situations. Here is an incomplete list of what we are dealing with:

Fully polymorphic sequence types:
  - []
  - Data.Sequence
  - Data.Edison sequence types
  - Data.Array.Array & friends

Restricted polymorphic sequence types:
  - uvector
  - vector
  - storable-vector

Monomorphic sequence types:
  - ByteString (Word8 and Char8)
  - Lazy ByteString (Word8 and Char8)
  - Text

Fully polymorphic associative array types:
  - Data.Map
  - Edison associative arrays
  - gmap (?)

Less-polymorphic associative array types:
  - bytestring-tries
  - list-tries
  - Data.IntMap

I'm sure there are more, of course, but those are the ones that a cursory search reveals.

The question, then, is how we ought to deal with all of these types in the Haskell Platform. The current popular libraries deal with the multitude of container types in a very ad hoc way. For instance, the Parsec parsing library lets you parse Strings and strict ByteStrings, and it also provides a class so that you can parse other things. Attoparsec only lets you parse lazy ByteStrings. I don't know of any library that parses Text yet. The binary library deals with lazy ByteStrings, while binary-strict lets you use strict ByteStrings. Libraries like zlib just pick a type (lazy ByteStrings) and use it, without accounting for other types you may want to use (granted, in zlib's case the choice of a single data structure seems fairly reasonable, given the likely use cases).

The point here is that the same functionality, and practically the same code, can be split over several different packages depending on which data structure you want to use. Worse, there is no standard way of dealing with this separation: Parsec uses a type class, while binary uses two separate packages. Interoperability can also become a problem: the split library, for instance, only supports lists, so you have to roll your own solutions for splitting ByteStrings or Text.

I think that with the introduction of the Haskell Platform, we ought to take the opportunity to clean up some of this chaos so that we have a clean set of standard libraries for people to build on. The way I see it, we have a few options.

We could introduce type classes covering the various container operations and ask all Platform packages to use only operations from these classes, where applicable. (The ListLike library would probably be a good place to start; a rough sketch of what such a class might look like is below.) The issues here are efficiency, massive class dictionaries, and type system issues: we need MPTCs or type families, and since not all sequence types are polymorphic, we probably need separate map and rigidMap functions as in ListLike, and even that doesn't work for types like uvector that can take some, but not all, concrete element types.

We could instead adopt a standard module-naming scheme: for every package that operates on strings, for instance, the list-based code would go in Foo.Bar.String and the Text code in Foo.Bar.Text (a small illustration of this layout is also below). Whenever we "blessed" a new string datatype (presumably not very often), every package would have to add a new module. The issue here is code duplication.

Finally, we could adopt a package-naming scheme: have foo-text and foo-string for the two different datatypes.
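To make the first option a bit more concrete, here is a very rough sketch of a ListLike-style class using MPTCs with a functional dependency. The names (SeqLike, rigidMap, and so on) are purely illustrative, not a concrete proposal; the point is only that monomorphic types like ByteString can be instances as long as the map operation is not allowed to change the element type:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

    module SeqLikeSketch where

    import qualified Data.List as L
    import qualified Data.ByteString as B
    import Data.Word (Word8)

    -- A sequence type 'full' with element type 'item'.
    class SeqLike full item | full -> item where
      empty    :: full
      cons     :: item -> full -> full
      fromList :: [item] -> full
      toList   :: full -> [item]
      -- rigidMap keeps the element type fixed, so monomorphic
      -- sequences like ByteString can still be instances.
      rigidMap :: (item -> item) -> full -> full

    -- Lists are fully polymorphic, so the instance is trivial.
    instance SeqLike [a] a where
      empty    = []
      cons     = (:)
      fromList = id
      toList   = id
      rigidMap = L.map

    -- Strict ByteStrings are fixed to Word8, but rigidMap still works.
    instance SeqLike B.ByteString Word8 where
      empty    = B.empty
      cons     = B.cons
      fromList = B.pack
      toList   = B.unpack
      rigidMap = B.map

A fully polymorphic map :: (item -> item') -> full -> full' would need a second class (or type families), and even that does not cover uvector-style types that admit only certain element types, which is exactly the weakness mentioned above.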
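And to illustrate the module-naming option (Foo.Bar and wordCount are invented names here), the same small API would be written once per blessed string type, in separate files:

    -- Foo/Bar/String.hs
    module Foo.Bar.String (wordCount) where

    wordCount :: String -> Int
    wordCount = length . words

    -- Foo/Bar/Text.hs: the same API, and essentially the same code, again.
    module Foo.Bar.Text (wordCount) where

    import qualified Data.Text as T

    wordCount :: T.Text -> Int
    wordCount = length . T.words

Callers import whichever module matches their string type; the cost is that every such pair of modules has to be written and kept in sync by hand.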
What does everyone think about the need for standardization here and how we ought to do it? I'm sorry that, having less experience with Haskell than many, I can't offer much in the way of concrete proposals, but I think it's important that we hash something out here so that we end up with a better-organized Platform.

Thanks for your time in reading this admittedly long message,
Alex