Re: Haskell Platform Proposal: add the 'text' library

8 Sep 2010

On Tue, Sep 07, 2010 at 11:21:19PM +0100, Duncan Coutts wrote:
...
On 7 September 2010 22:50, Ian Lynagh  wrote:
...
I compared the API of Data.Text and Data.ByteString.Char8 and found a
number of differences:
Many of these are deliberate and sensible.
Some at least seem just gratuitously different, e.g.:

BS:   break :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
Text: break :: Text -> Text -> (Text, Text)
      breakBy :: (Char -> Bool) -> Text -> (Text, Text)
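The sketch below makes the naming mismatch concrete. It assumes a recent text package, where the substring-based function quoted above as `break` is spelled `breakOn` and `break` takes a predicate (the old `breakBy`). The names therefore differ from the 2010 signatures in this message. The top-level binding names are illustrative only.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T

-- Predicate-based break: split before the first Char matching the predicate.
bsBreakPred :: (BC.ByteString, BC.ByteString)
bsBreakPred = BC.break (== ' ') (BC.pack "hello world")

txtBreakPred :: (T.Text, T.Text)
txtBreakPred = T.break (== ' ') (T.pack "hello world")

-- Substring-based break: split before the first occurrence of a needle.
-- ByteString calls this breakSubstring; recent text calls it breakOn.
bsBreakSub :: (BC.ByteString, BC.ByteString)
bsBreakSub = BC.breakSubstring (BC.pack "END") (BC.pack "fooEND FOO")

txtBreakSub :: (T.Text, T.Text)
txtBreakSub = T.breakOn (T.pack "END") (T.pack "fooEND FOO")

main :: IO ()
main = do
  print bsBreakPred   -- ("hello"," world")
  print txtBreakPred  -- ("hello"," world")
  print bsBreakSub    -- ("foo","END FOO")
  print txtBreakSub   -- ("foo","END FOO")
```

The behaviour lines up pairwise. Only the names disagree, and that is the inconsistency under discussion.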
...
The thing with text as
opposed to lists/arrays is that almost all operations you want to do
are substring based and not element based. A Unicode code point (a
Char) is sadly only roughly related to the human concept of a
character. In particular there are combining characters. So even if
you want to search or split on a particular "character" that may mean
searching for a short sequence of Chars / code points.
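The combining-character point can be illustrated with a small sketch. It uses "ä", which Unicode can represent either as the single precomposed code point U+00E4 or as 'a' followed by U+0308 COMBINING DIAERESIS. It assumes a recent text package, where the substring search is named `breakOn`. The binding names are hypothetical.

```haskell
import qualified Data.Text as T

-- "ä" written two ways: one precomposed code point, or 'a' plus
-- U+0308 COMBINING DIAERESIS. Both render as the same character.
precomposed, decomposed :: T.Text
precomposed = T.pack "\x00E4"
decomposed  = T.pack "a\x0308"

-- Searching for the decomposed form means searching for two Chars,
-- which a single-Char predicate cannot express.
splitAtDecomposed :: (T.Text, T.Text)
splitAtDecomposed = T.breakOn decomposed (T.pack "xa\x0308y")

main :: IO ()
main = do
  print (T.length precomposed)       -- 1
  print (T.length decomposed)        -- 2
  print (precomposed == decomposed)  -- False: no normalisation is applied
  print splitAtDecomposed
```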
Hmm, wouldn't you want to be able to break on
    either
        <a-with-umlaut>
    or
        <a> <umlaut combining character>
in that case?
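One way to "break on either form" without a normalisation library is to search for every spelling and keep the earliest match. The helper below, `breakOnAny`, is hypothetical and not part of the text package. It assumes a recent text where the substring search is `breakOn`. A more thorough approach would Unicode-normalise (NFC or NFD) both the haystack and the needle first, which this sketch does not attempt.

```haskell
import qualified Data.Text as T
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical helper: split before the earliest occurrence of any needle.
-- Needles must be non-empty, because T.breakOn rejects an empty needle.
breakOnAny :: [T.Text] -> T.Text -> (T.Text, T.Text)
breakOnAny needles hay =
  case [ r | n <- needles, let r = T.breakOn n hay, not (T.null (snd r)) ] of
    []   -> (hay, T.empty)                                -- no needle occurs
    hits -> minimumBy (comparing (T.length . fst)) hits   -- earliest match wins

-- Both spellings of "ä": precomposed U+00E4, and 'a' + U+0308.
aUmlautForms :: [T.Text]
aUmlautForms = [T.pack "\x00E4", T.pack "a\x0308"]

main :: IO ()
main = do
  print (breakOnAny aUmlautForms (T.pack "x\x00E4y"))
  print (breakOnAny aUmlautForms (T.pack "xa\x0308y"))
  print (breakOnAny aUmlautForms (T.pack "plain"))
```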

Also, even if the intention is that you
    break [<a>, <umlaut combining character>]
people will still use it for other things, e.g.
    break "END FOO"
and wonder why they are not able to do likewise with bytestring.

Even if there is a case where you would want different behaviour in the
two packages, I think it would be better if the function names weren't
the same.
...
...
I think the two APIs ought to be brought into agreement.
Perhaps. If so, then it is the ByteString.Char8 that ought to be
brought into agreement with Text, not the other way around.
I don't have an opinion on what the APIs should look like; I'd just like
them to be consistent.
...
...
There are a number of other differences which probably want to be tidied
up (mostly functions which are in one package but not the other,
What are you thinking of specifically?
There are a number of them:

In Text only:
    center, chunksOf, dropAround, dropWhileEnd, justifyLeft,
    justifyRight, partitionBy, prefixed, replace, strip, stripEnd,
    stripStart, suffixed, compareLength, toCaseFold, toLower, toUpper
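As an example of closing one of these gaps, the Text-only `strip` family can be approximated for ByteString.Char8 from functions bytestring already exports (`dropWhile`, `spanEnd`). The names `bsStrip`, `bsStripStart` and `bsStripEnd` are hypothetical. They avoid clashing with any same-named functions a given bytestring version may provide.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Char (isSpace)

-- Hypothetical ByteString.Char8 counterparts of Text's stripStart,
-- stripEnd and strip.
bsStripStart :: BC.ByteString -> BC.ByteString
bsStripStart = BC.dropWhile isSpace

-- spanEnd splits off the longest whitespace suffix; keep the part before it.
bsStripEnd :: BC.ByteString -> BC.ByteString
bsStripEnd = fst . BC.spanEnd isSpace

bsStrip :: BC.ByteString -> BC.ByteString
bsStrip = bsStripEnd . bsStripStart

main :: IO ()
main = do
  print (bsStrip (BC.pack "  hello world \n"))  -- "hello world"
  print (T.strip (T.pack "  hello world \n"))   -- "hello world"
```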

In BS only:
    copy, elem, elemIndex, elemIndexEnd, elemIndices, findIndices,
    findSubstring, findSubstrings, foldr', foldr1', notElem, readInt,
    readInteger, sort, unzip
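In the other direction, Text has no `readInt`, but `Data.Text.Read` offers `signed` and `decimal`, which return an `Either` rather than a `Maybe`. The wrapper `textReadInt` below is hypothetical. It is a sketch of how a ByteString-shaped function could be layered on top, assuming a text version that ships `Data.Text.Read`.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- ByteString.Char8.readInt: Maybe (value, remaining input).
bsInt :: Maybe (Int, BC.ByteString)
bsInt = BC.readInt (BC.pack "-42 rest")

-- Hypothetical wrapper giving Text the same result shape.
-- TR.signed TR.decimal yields Either errorMessage (value, rest).
textReadInt :: T.Text -> Maybe (Int, T.Text)
textReadInt t = either (const Nothing) Just (TR.signed TR.decimal t)

main :: IO ()
main = do
  print bsInt                              -- Just (-42," rest")
  print (textReadInt (T.pack "-42 rest"))  -- Just (-42," rest")
  print (textReadInt (T.pack "abc"))       -- Nothing
```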
...
...
ByteString has IO functions mixed in with the non-IO functions,
which I don't think was a good idea. I would prefer to split them up.
Agreed, but I would like us to move towards consistency.

Thanks
Ian