Re: Haskell Platform Proposal: add the 'text' library

8 Sep 2010

I'd first like to say that I'm very impressed with the thoroughness of
Ian's review.

On the API differences between Data.Text and Data.ByteString.Char8, I agree
with Duncan that the Data.Text API is more natural for text-oriented work,
although I'm slightly uncomfortable with the similarities between Data.Text
and Data.List.  Everything works the same, until it doesn't because of a
minor API change you didn't notice.

Would it be useful to list the API incompatibilities in the docs, either as
a list or at each relevant function?  Or would that just be extra noise?

John
...
...
I compared the API of Data.Text and Data.ByteString.Char8 and found a
number of differences:
Many of these are deliberate and sensible. The thing with text as
opposed to lists/arrays is that almost all operations you want to do
are substring based and not element based. A Unicode code point (a
Char) is sadly only roughly related to the human concept of a
character. In particular there are combining characters. So even if
you want to search or split on a particular "character" that may mean
searching for a short sequence of Chars / code points.
So where the ByteString API follows the List API in being element (byte)
oriented, the Text API is substring oriented.
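
To make the combining-character point concrete, here is a minimal sketch.
It assumes a recent release of the text package, where the substring
version of break is spelled breakOn rather than the name quoted in this
thread; the sample string and the names cafe and eAcute are made up for
illustration.

```haskell
module Main (main) where

import qualified Data.Text as T

-- "cafe" with a combining acute accent: 'e' followed by U+0301.
-- Hypothetical sample data, not taken from the thread.
cafe :: T.Text
cafe = T.pack "cafe\x0301 au lait"

-- The single perceived character "e with acute" is two code points.
eAcute :: T.Text
eAcute = T.pack "e\x0301"

main :: IO ()
main = do
  -- Four perceived characters, but five Chars (code points).
  print (T.length (T.pack "cafe\x0301"))   -- 5
  -- A substring search treats the two code points as one unit.
  print (T.count eAcute cafe)              -- 1
  -- breakOn splits just before the first occurrence of the substring.
  let (before, rest) = T.breakOn eAcute cafe
  print (T.length before, T.length rest)   -- (3,10)
```

A Char-based search for 'e' alone would match here too, but it would
split the accent from its base letter, which is the problem a
substring-oriented API avoids.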
...
BS:   break :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
Text: break :: Text -> Text -> (Text, Text)
      breakEnd :: Text -> Text -> (Text, Text)
      breakBy :: (Char -> Bool) -> Text -> (Text, Text)
BS:   count :: Char -> ByteString -> Int
Text: count :: Text -> Text -> Int
BS:   find :: (Char -> Bool) -> ByteString -> Maybe Char
Text: find :: Text -> Text -> [(Text, Text)]
      findBy :: (Char -> Bool) -> Text -> Maybe Char
BS:   replicate :: Int -> Char -> ByteString
Text: replicate :: Int -> Text -> Text
BS:   split :: Char -> ByteString -> [ByteString]
Text: split :: Text -> Text -> [Text]
BS:   span :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      spanEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
Text: spanBy :: (Char -> Bool) -> Text -> (Text, Text)
BS:   splitWith :: (Char -> Bool) -> ByteString -> [ByteString]
Text: splitBy :: (Char -> Bool) -> Text -> [Text]
BS:   unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> (ByteString, Maybe a)
Text: unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
BS:   zipWith :: (Char -> Char -> a) -> ByteString -> ByteString -> [a]
Text: zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
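
A few of the differences listed above can be tried side by side. This is
only a sketch, and it assumes recent releases of the bytestring and text
packages; in particular, the substring split is spelled splitOn there
rather than split as in the signatures quoted above. The input strings
are made up for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T

main :: IO ()
main = do
  -- count: ByteString.Char8 counts a single Char, Text counts a substring.
  print (B.count 'a' (B.pack "banana"))              -- 3
  print (T.count (T.pack "an") (T.pack "banana"))    -- 2

  -- replicate: ByteString.Char8 repeats a Char, Text repeats a Text.
  print (B.replicate 3 'x')                          -- "xxx"
  print (T.replicate 3 (T.pack "ab"))                -- "ababab"

  -- split: ByteString.Char8 splits on one Char; Text splits on a
  -- substring (splitOn in recent text releases, an assumption here).
  print (B.split ',' (B.pack "a,b,c"))               -- ["a","b","c"]
  print (T.splitOn (T.pack ", ") (T.pack "a, b, c")) -- ["a","b","c"]

  -- zipWith: ByteString.Char8 can produce a list of any type,
  -- while Text only combines Chars back into a Text.
  print (B.zipWith (==) (B.pack "abc") (B.pack "abd")) -- [True,True,False]
  print (T.zipWith max (T.pack "abc") (T.pack "abd"))  -- "abd"
```
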
I think the two APIs ought to be brought into agreement.
Perhaps. If so, then it is the ByteString.Char8 that ought to be
brought into agreement with Text, not the other way around. I think
Text is right in this area. On the other hand, perhaps it makes sense
for ByteString.Char8 to remain like the ByteString byte interface
which is byte oriented (and probably rightly so). I hope the
significance and use of ByteString.Char8 will decrease as Text becomes
more popular. ByteString.Char8 is really just for the cases where
you're handling ASCII-like protocols.

John Lato
