
[sorry for the somewhat longer rant, you may want to skip to the more technical questions at the end of the post]

Twan van Laarhoven wrote:
I would like to announce version 0.3 of my Data.CompactString library. Data.CompactString is a wrapper around Data.ByteString that represents a Unicode string. This new version supports different encodings, as can be seen from the data type:
[...]
Homepage: http://twan.home.fmf.nl/compact-string/
Haddock: http://twan.home.fmf.nl/compact-string/doc/html/
Source: darcs get http://twan.home.fmf.nl/repos/compact-string
After taking a look at the Haddock docs, I was impressed by the amount of repetition in the APIs. Not only does Data.CompactString duplicate the whole Data.ByteString interface (~100 functions, adding some more for encoding and decoding), the whole interface is again repeated another four times, once for each supported encoding.

Now, this is /not/ meant as a criticism of the compact-string package in particular. To the contrary, duplicating a fat interface for almost identical functionality is apparently state-of-the-art in Haskell library design, viz. the celebrated Data.ByteString, whose API is similarly repetitive (see Data.List, Data.ByteString.Lazy, etc.), as well as Map/IntMap/Set/IntSet etc. I greatly appreciate the effort that went into these libraries, and admire the elegance of the implementation as well as the stunning results wrt. efficiency gains etc.

However, I fear that duplicating interfaces in this way will prove to be problematic in the long run. The problems I (fore-)see are for maintenance and usability, which are of course two sides of the same coin.

For the library implementer, maintenance will become more difficult, as ever more of such 'almost equal' interfaces must be maintained and kept in sync. One could use code generation or macro expansion to alleviate this, but IMO the necessity to use extra-language pre-processors points to a weakness in the language; it would be much less complicated and more satisfying to use a language feature that avoids the repetition instead of generating code to facilitate it.

On the other side of the coin, usability suffers as one has to look up the (almost) same function in more and more different (but 'almost equal') module interfaces, depending on whether the string in question is Char vs. Byte, strict vs. lazy, packed vs. unpacked, encoded in X or Y or Z..., especially since there is no guarantee that the function is /really/ spelled the same everywhere and also really does what the user expects. (*)

I am certain that most, if not all, people involved with these new libraries are well aware of these infelicities. Of course, type classes come to mind as a possible solution. However, something seems to prevent developers from using them to capture e.g. a common String or ListLike interface. Whatever this 'something' is, I think it should be discussed and addressed, before the number of 'almost equal' APIs becomes unmanageable for users and maintainers.

Here are some raw ideas:

One reason why I think type classes have not (yet) been used to reduce the amount of API repetition is that Haskell doesn't (directly) support abstraction over type constraints nor over the number of type parameters (polykinded types?). Often such 'almost equal' module APIs differ in exactly these aspects, i.e. one has an additional type parameter, while yet another one needs slightly different or additional constraints on certain types. Oleg K. has shown that some of these limitations can be overcome w/o changing or adding features to the language; however, these tricks are not easy to learn and apply.

Another problem is the engineering question of how much to put into the class proper: there is a tension between keeping the class as simple as possible (few methods, many parametric functions) for maximum usability vs. making it large (many methods, fewer parametric functions) for maximum efficiency via specialized implementations. It is often hard to decide this question up front, i.e. before enough instances are available. (This has been stated as a cause for deferring the decision for a common interface to list-like values or strings.)
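
To make this tension concrete, here is a minimal sketch. It is not taken from any existing library: the class StringLike, the type Packed and all function names are made up for illustration, and Packed is only a stand-in for a "real" packed string such as a ByteString. The class keeps a small core (empty, cons, uncons), offers toString and fromString as ordinary parametric functions, and puts len into the class with a default definition purely so that an instance can override it with something faster.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class: a small core plus one method that exists
-- only so that instances may specialise it.
class StringLike s where
  empty  :: s
  cons   :: Char -> s -> s
  uncons :: s -> Maybe (Char, s)

  -- Default definition in terms of 'uncons' (linear time).
  len :: s -> Int
  len = go 0
    where
      go n s = n `seq` case uncons s of
        Nothing      -> n
        Just (_, s') -> go (n + 1) s'

-- Ordinary parametric functions: written once, usable with every
-- instance, but not specialisable per instance.
fromString :: StringLike s => String -> s
fromString = foldr cons empty

toString :: StringLike s => s -> String
toString s = case uncons s of
  Nothing      -> []
  Just (c, s') -> c : toString s'

-- Plain String; 'len' falls back to the default.
instance StringLike [Char] where
  empty = []
  cons  = (:)
  uncons []       = Nothing
  uncons (c : cs) = Just (c, cs)

-- Hypothetical stand-in for a packed string that caches its length.
data Packed = Packed Int String

instance StringLike Packed where
  empty = Packed 0 []
  cons c (Packed n cs) = Packed (n + 1) (c : cs)
  uncons (Packed _ [])       = Nothing
  uncons (Packed n (c : cs)) = Just (c, Packed (n - 1) cs)
  -- Specialised override: constant time instead of a traversal.
  len (Packed n _) = n
```

Moving toString into the class later would let Packed specialise it as well, but only at the price of changing the class itself, which is exactly the decision that is hard to make before enough instances exist.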
Since the type of a function doesn't reveal whether it is a normal function with a class constraint or a real class method, I imagine a language feature that (somehow) enables me to specialize such a function for a particular instance even if it is not a proper class member.

Or maybe we have come to the point where Haskell's lack of a 'real' module system, like e.g. in SML, actually starts to hurt?

Can associated types come to the rescue?

Cheers
Ben
--
(*) I know that strictly speaking a class doesn't guarantee any semantic conformance either, but at least there is a common place to document the expected laws that all implementations should obey. With duplicated module APIs there is no such single place.