
On Tue, Sep 07, 2010 at 11:21:19PM +0100, Duncan Coutts wrote:
On 7 September 2010 22:50, Ian Lynagh wrote:
I compared the API of Data.Text and Data.ByteString.Char8 and found a number of differences:
Many of these are deliberate and sensible.
Some at least seem just gratuitously different, e.g.:

    BS:
        break          :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
        breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
    Text:
        break   :: Text -> Text -> (Text, Text)
        breakBy :: (Char -> Bool) -> Text -> (Text, Text)
The thing with text as opposed to lists/arrays is that almost all operations you want to do are substring based and not element based. A Unicode code point (a Char) is sadly only roughly related to the human concept of a character. In particular there are combining characters. So even if you want to search or split on a particular "character" that may mean searching for a short sequence of Chars / code points.
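To make the combining-character point concrete, here is a minimal sketch using only base. The names precomposed and decomposed are made up for illustration. It shows the same visible "character" spelled as one code point and as two, which is why a single-Char search cannot find both.

```haskell
import Data.Char (isMark)

-- "a with umlaut" as a single precomposed code point (U+00E4).
precomposed :: String
precomposed = "\x00E4"

-- The same visible character as 'a' followed by
-- COMBINING DIAERESIS (U+0308).
decomposed :: String
decomposed = "a\x0308"

main :: IO ()
main = do
  print (length precomposed)          -- 1 code point
  print (length decomposed)           -- 2 code points
  print (precomposed == decomposed)   -- False: no normalisation is applied
  print (map isMark decomposed)       -- [False,True]: the second Char is a mark
```

A predicate of type Char -> Bool sees the code points one at a time, so it cannot match the two-code-point spelling as a unit.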
Hmm, wouldn't you want to be able to break on either <a-with-umlaut> or <a> <umlaut combining character> in that case? Also, even if the intention is that you break on [<a>, <umlaut combining character>], people will still use it for other things, e.g. break "END FOO", and wonder why they are not able to do likewise with bytestring. Even if there is a case where you would want different behaviour in the two packages, I think it would be better if the function names weren't the same.
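For reference, a small sketch of the two styles of split on the bytestring side, assuming the bytestring package is installed. The input string and the names splitOnSpace and splitOnMarker are made up for illustration. Going by the signatures quoted above, the corresponding Text functions have the same shapes but different names.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical input, for illustration only.
input :: B.ByteString
input = B.pack "BEGIN FOO body END FOO trailer"

-- Element-based split: stops at the first Char satisfying the predicate.
splitOnSpace :: (B.ByteString, B.ByteString)
splitOnSpace = B.break (== ' ') input

-- Substring-based split: stops at the first occurrence of the needle.
splitOnMarker :: (B.ByteString, B.ByteString)
splitOnMarker = B.breakSubstring (B.pack "END FOO") input

main :: IO ()
main = do
  print splitOnSpace   -- ("BEGIN"," FOO body END FOO trailer")
  print splitOnMarker  -- ("BEGIN FOO body ","END FOO trailer")
```

Someone who learns break "END FOO" from one package will reach for break in the other and get a type error, because the first argument there is a predicate.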
I think the two APIs ought to be brought into agreement.
Perhaps. If so, then it is the ByteString.Char8 that ought to be brought into agreement with Text, not the other way around.
I don't have an opinion on what the APIs should look like; I'd just like them to be consistent.
There are a number of other differences which probably want to be tidied up (mostly functions which are in one package but not the other,
What are you thinking of specifically?
There are a number of them:

In Text only:
    center, chunksOf, dropAround, dropWhileEnd, justifyLeft, justifyRight,
    partitionBy, prefixed, replace, strip, stripEnd, stripStart, suffixed,
    compareLength, toCaseFold, toLower, toUpper

In BS only:
    copy, elem, elemIndex, elemIndexEnd, elemIndices, findIndices,
    findSubstring, findSubstrings, foldr', foldr1', notElem, readInt,
    readInteger, sort, unzip
ByteString has IO functions mixed in with the non-IO functions,
Which I don't think was a good idea. I would prefer to split them up.
Agreed, but I would like us to move towards consistency.

Thanks
Ian