RE: [Haskell-cafe] Can we do better than duplicate APIs? [was: Data.CompactString 0.3]

26 Mar 2007

[Probably libraries@haskell.org is the right list for this message, so I'm forwarding your message below, and will reply there.]

| -----Original Message-----
| From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Benjamin
| Franksen
| Sent: 23 March 2007 22:56
| To: haskell-cafe@haskell.org
| Cc: haskell@haskell.org
| Subject: [Haskell-cafe] Can we do better than duplicate APIs? [was: Data.CompactString 0.3]
|
| [sorry for the somewhat longer rant, you may want to skip to the more
| technical questions at the end of the post]
|
| Twan van Laarhoven wrote:
| > I would like to announce version 0.3 of my Data.CompactString library.
| > Data.CompactString is a wrapper around Data.ByteString that represents a
| > Unicode string. This new version supports different encodings, as can be
| > seen from the data type:
| >
| > [...]
| >
| > Homepage:  http://twan.home.fmf.nl/compact-string/
| > Haddock:   http://twan.home.fmf.nl/compact-string/doc/html/
| > Source:    darcs get http://twan.home.fmf.nl/repos/compact-string
|
| After taking a look at the Haddock docs, I was impressed by the amount of
| repetition in the APIs. Not only does Data.CompactString duplicate the whole
| Data.ByteString interface (~100 functions, adding some more for encoding
| and decoding), the whole interface is again repeated another four times,
| once for each supported encoding.
|
| Now, this is /not/ meant as a criticism of the compact-string package in
| particular. To the contrary, duplicating a fat interface for almost
| identical functionality is apparently state-of-the-art in Haskell library
| design, viz. the celebrated Data.ByteString, whose API is similarly
| repetitive (see Data.List, Data.ByteString.Lazy, etc...), as well as
| Map/IntMap/Set/IntSet etc. I greatly appreciate the effort that went into
| these libraries, and admire the elegance of the implementation as well as
| the stunning results wrt. efficiency gains etc. However, I fear that
| duplicating interfaces in this way will prove to be problematic in the long
| run.
|
| The problems I (for-)see are for maintenance and usability, both of which
| are of course two sides of the same coin. For the library implementer,
| maintenance will become more difficult, as ever more of such 'almost equal'
| interfaces must be maintained and kept in sync. One could use code
| generation or macro expansion to alleviate this, but IMO the necessity to
| use extra-language pre-processors points to a weakness in the language; it
| would be much less complicated and more satisfying to use a language feature
| that avoids the repetition instead of generating code to facilitate it. On the
| other side of the coin, usability suffers as one has to look up the (almost)
| same function in more and more different (but 'almost equal') module
| interfaces, depending on whether the string in question is Char vs. Byte,
| strict vs. lazy, packed vs. unpacked, encoded in X or Y or Z..., especially
| since there is no guarantee that the function is /really/ spelled the same
| everywhere and also really does what the user expects.(*)
|
| I am certain that most, if not all, people involved with these new libraries
| are well aware of these infelicities. Of course, type classes come to mind
| as a possible solution. However, something seems to prevent developers from
| using them to capture e.g. a common String or ListLike interface. Whatever
| this 'something' is, I think it should be discussed and addressed, before
| the number of 'almost equal' APIs becomes unmanageable for users and
| maintainers.
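
To make the type-class idea concrete, here is a minimal sketch of what a shared string interface might look like. The class StringLike and its method names are hypothetical and exist in none of the libraries mentioned; the sketch assumes GHC with the FlexibleInstances extension and the bytestring package.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import qualified Data.ByteString.Char8 as B

-- Hypothetical interface; the names are illustrative only.
class StringLike s where
  slEmpty  :: s
  slCons   :: Char -> s -> s
  slUncons :: s -> Maybe (Char, s)

-- Ordinary Haskell strings.
instance StringLike [Char] where
  slEmpty           = []
  slCons            = (:)
  slUncons []       = Nothing
  slUncons (c : cs) = Just (c, cs)

-- Strict byte strings, viewed as 8-bit characters.
instance StringLike B.ByteString where
  slEmpty  = B.empty
  slCons   = B.cons
  slUncons = B.uncons

-- Written once against the class, usable at every instance.
countChar :: StringLike s => Char -> s -> Int
countChar c = go 0
  where
    go n s = case slUncons s of
      Nothing        -> n
      Just (x, rest) -> go (if x == c then n + 1 else n) rest

main :: IO ()
main = do
  print (countChar 'a' "banana")
  print (countChar 'a' (B.pack "banana"))
```

A function such as countChar then has a single name and a single place for its documentation, whichever representation the caller happens to use.
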
|
| Here are some raw ideas:
|
| One reason why I think type classes have not (yet) been used to reduce the
| amount of API repetition is that Haskell doesn't (directly) support
| abstraction over type constraints nor over the number of type parameters
| (polykinded types?). Often such 'almost equal' module APIs differ in
| exactly these aspects, i.e. one has an additional type parameter, while yet
| another one needs slightly different or additional constraints on certain
| types. Oleg K. has shown that some of these limitations can be overcome w/o
| changing or adding features to the language, however these tricks are not
| easy to learn and apply.
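
The mismatch in type parameters and constraints can be illustrated with a small sketch. A list takes an element parameter, ByteString takes none, and Set additionally needs Ord on its elements, so a constructor class in the style of Functor cannot cover all three. One known workaround (not necessarily one of the tricks alluded to above) is a two-parameter class with a functional dependency. The class Container and its methods are hypothetical, and the sketch assumes GHC with the extensions listed in the pragma.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.Set as S
import Data.Word (Word8)

-- Hypothetical class: c is the container type, e its element type.
-- The dependency c -> e says the container determines the element.
class Container c e | c -> e where
  cEmpty  :: c
  cInsert :: e -> c -> c
  cToList :: c -> [e]

-- One type parameter, no constraint.
instance Container [a] a where
  cEmpty  = []
  cInsert = (:)
  cToList = id

-- No type parameter at all; the element type is fixed.
instance Container B.ByteString Word8 where
  cEmpty  = B.empty
  cInsert = B.cons
  cToList = B.unpack

-- One type parameter plus an extra constraint, placed in the instance context.
instance Ord a => Container (S.Set a) a where
  cEmpty  = S.empty
  cInsert = S.insert
  cToList = S.toAscList

-- Generic code written once for all three shapes.
fromL :: Container c e => [e] -> c
fromL = foldr cInsert cEmpty

main :: IO ()
main = do
  print (cToList (fromL [3, 1, 2, 1] :: S.Set Int))
  print (cToList (fromL [104, 105] :: B.ByteString))
  print (cToList (fromL "abc" :: String))
```
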
|
| Another problem is the engineering question of how much to put into the
| class proper: there is a tension between keeping the class as simple as
| possible (few methods, many parametric functions) for maximum usability vs.
| making it large (many methods, fewer parametric functions) for maximum
| efficiency via specialized implementations. It is often hard to decide this
| question up front, i.e. before enough instances are available. (This has
| been stated as a cause for deferring the decision for a common interface to
| list-like values or strings). Since the type of a function doesn't reveal
| whether it is a normal function with a class constraint or a real class
| method, I imagine a language feature that (somehow) enables me to
| specialize such a function for a particular instance even if it is not a
| proper class member.
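
The tension can be seen in a small sketch, again with a hypothetical StringLike class (assuming GHC with FlexibleInstances and the bytestring package). Here slLength is a class method with a default definition, so an instance may replace it with something faster, whereas slNull is an ordinary parametric function that no instance can override.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import qualified Data.ByteString.Char8 as B

-- Hypothetical class; names are illustrative only.
class StringLike s where
  slUncons :: s -> Maybe (Char, s)

  -- A derived operation kept inside the class. The default walks the
  -- whole string; instances may supply a cheaper definition.
  slLength :: s -> Int
  slLength = go 0
    where
      go n s = case slUncons s of
        Nothing        -> n
        Just (_, rest) -> go (n + 1) rest

instance StringLike [Char] where
  slUncons []       = Nothing
  slUncons (c : cs) = Just (c, cs)
  -- slLength: the default is used.

instance StringLike B.ByteString where
  slUncons = B.uncons
  slLength = B.length  -- specialized: the length is stored, no traversal

-- Defined outside the class: one definition for all instances,
-- with no way for an instance to substitute its own version.
slNull :: StringLike s => s -> Bool
slNull s = case slUncons s of
  Nothing -> True
  Just _  -> False

main :: IO ()
main = do
  print (slLength "hello")
  print (slLength (B.pack "hello"))
  print (slNull "")
```

Moving slNull into the class later would be needed before any instance could specialize it. GHC's SPECIALIZE pragma and rewrite RULES are existing mechanisms that go some way towards the feature imagined above, though they are compiler-specific rather than part of the language.
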
|
| Or maybe we have come to the point where Haskell's lack of a 'real' module
| system, like e.g. in SML, actually starts to hurt? Can associated types
| come to the rescue?
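
As a rough indication of how associated types might apply to the Map/IntMap case, here is a sketch using GHC's TypeFamilies extension. The class MapLike and its methods are hypothetical; only the Data.Map and Data.IntMap functions on the right-hand sides are real library calls.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

import qualified Data.IntMap as IM
import qualified Data.Map as M

-- Hypothetical class: each map type names its own key type.
class MapLike m where
  type Key m
  mEmpty  :: m v
  mInsert :: Key m -> v -> m v -> m v
  mLookup :: Key m -> m v -> Maybe v

-- IntMap has a fixed key type and no key parameter.
instance MapLike IM.IntMap where
  type Key IM.IntMap = Int
  mEmpty  = IM.empty
  mInsert = IM.insert
  mLookup = IM.lookup

-- Map has a key parameter and needs Ord on it.
instance Ord k => MapLike (M.Map k) where
  type Key (M.Map k) = k
  mEmpty  = M.empty
  mInsert = M.insert
  mLookup = M.lookup

-- Generic construction, written once for both map types.
fromPairs :: MapLike m => [(Key m, v)] -> m v
fromPairs = foldr (\(k, v) -> mInsert k v) mEmpty

main :: IO ()
main = do
  let im = fromPairs [(1, "one"), (2, "two")] :: IM.IntMap String
      sm = fromPairs [("a", 1), ("b", 2)] :: M.Map String Int
  print (mLookup 2 im)
  print (mLookup "b" sm)
  print (mLookup "z" sm)
```
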
|
| Cheers
| Ben
| --
| (*) I know that strictly speaking a class doesn't guarantee any semantic
| conformance either, but at least there is a common place to document the
| expected laws that all implementations should obey. With duplicated module
| APIs there is no such single place.

Simon Peyton-Jones