Re: Haskell Platform Proposal: add the 'text' library

I'd first like to say that I'm very impressed with the thoroughness of Ian's review. On the API differences between Data.Text and Data.ByteString.Char8, I agree with Duncan that the Data.Text API is more natural for text-oriented work, although I'm slightly uncomfortable with the similarities between Data.Text and Data.List. Everything works the same until it doesn't, because of a minor API difference you didn't notice. Would it be useful to list the API incompatibilities in the docs, either as a single list or at each relevant function? Or would that just be extra noise?
John
I compared the API of Data.Text and Data.ByteString.Char8 and found a number of differences:
Many of these are deliberate and sensible. The thing with text, as opposed to lists or arrays, is that almost all the operations you want to do are substring-based rather than element-based. A Unicode code point (a Char) is sadly only loosely related to the human concept of a character. In particular, there are combining characters: "é", for example, can be written either as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT. So even searching or splitting on a single "character" may mean searching for a short sequence of Chars (code points).
So whereas the ByteString API followed the List API in being byte-oriented, the Text API is substring-oriented.
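To make the combining-character point concrete, here is a small sketch. It assumes a current release of the text package, whose names differ from the ones used in this thread: later versions renamed the substring-based break to breakOn, and the predicate-based findBy to find.

```haskell
module Main (main) where

import qualified Data.Text as T

-- "cafe" with an acute accent, spelled two ways. Both render identically,
-- but the second uses 'e' followed by U+0301 COMBINING ACUTE ACCENT
-- instead of the precomposed U+00E9.
precomposed, decomposed :: T.Text
precomposed = T.pack "caf\x00E9"
decomposed  = T.pack "cafe\x0301"

-- The user-perceived character "e-acute" in decomposed form: two Chars.
eAcute :: T.Text
eAcute = T.pack "e\x0301"

main :: IO ()
main = do
  -- Element view: the decomposed spelling has one more Char.
  print (T.length precomposed, T.length decomposed)  -- (4,5)
  -- A Char-based search for U+00E9 misses the decomposed form entirely...
  print (T.find (== '\x00E9') decomposed)            -- Nothing
  -- ...while a substring search finds it.
  print (T.count eAcute decomposed)                  -- 1
  print (T.breakOn eAcute decomposed)
```

The thing being searched for is often more than one Char, which is the practical argument for making the Text API substring-oriented.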
BS:   break          :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakEnd       :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
Text: break          :: Text -> Text -> (Text, Text)
      breakEnd       :: Text -> Text -> (Text, Text)
      breakBy        :: (Char -> Bool) -> Text -> (Text, Text)
BS:   count :: Char -> ByteString -> Int
Text: count :: Text -> Text -> Int
BS:   find   :: (Char -> Bool) -> ByteString -> Maybe Char
Text: find   :: Text -> Text -> [(Text, Text)]
      findBy :: (Char -> Bool) -> Text -> Maybe Char
BS:   replicate :: Int -> Char -> ByteString
Text: replicate :: Int -> Text -> Text
BS:   split :: Char -> ByteString -> [ByteString]
Text: split :: Text -> Text -> [Text]
BS:   span    :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      spanEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
Text: spanBy  :: (Char -> Bool) -> Text -> (Text, Text)
BS:   splitWith :: (Char -> Bool) -> ByteString -> [ByteString]
Text: splitBy   :: (Char -> Bool) -> Text -> [Text]
BS:   unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> (ByteString, Maybe a)
Text: unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
BS:   zipWith :: (Char -> Char -> a) -> ByteString -> ByteString -> [a]
Text: zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
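Several of the differences listed above can be seen side by side in the sketch below. It assumes the current bytestring and text packages. As I understand it, later text releases renamed a number of these functions (the substring split became splitOn and the predicate-based splitBy became split), so the Text calls use the newer names rather than the ones in the list.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T

csvB :: BC.ByteString
csvB = BC.pack "a,,b,c"

csvT :: T.Text
csvT = T.pack "a,,b,c"

-- Seed function shared by both unfoldrN calls: yields 'a', 'b', 'c', then stops.
step :: Int -> Maybe (Char, Int)
step n
  | n >= 3    = Nothing
  | otherwise = Just (toEnum (fromEnum 'a' + n), n + 1)

main :: IO ()
main = do
  -- split: Char8 splits on one Char; Text's splitOn splits on a substring.
  print (BC.split ',' csvB)                          -- ["a","","b","c"]
  print (T.splitOn (T.pack ",,") csvT)               -- ["a","b,c"]
  -- Predicate splitting: splitWith in Char8, split in current text.
  print (BC.splitWith (== ',') csvB)                 -- ["a","","b","c"]
  print (T.split (== ',') csvT)                      -- ["a","","b","c"]
  -- count: a single Char versus a (possibly multi-Char) needle.
  print (BC.count 'a' (BC.pack "banana"))            -- 3
  print (T.count (T.pack "an") (T.pack "banana"))    -- 2
  -- replicate: repeat one Char versus repeat a whole Text.
  print (BC.replicate 3 'x')                         -- "xxx"
  print (T.replicate 3 (T.pack "ab"))                -- "ababab"
  -- zipWith: Char8 returns a list; Text returns Text, so f must yield a Char.
  print (BC.zipWith (\a b -> [a, b]) (BC.pack "abc") (BC.pack "xyz")) -- ["ax","by","cz"]
  print (T.zipWith max (T.pack "abc") (T.pack "bab"))                 -- "bbc"
  -- unfoldrN: Char8 also hands back the leftover seed when it hits the limit;
  -- Text returns only the Text.
  print (BC.unfoldrN 2 step 0)                       -- ("ab",Just 2)
  print (T.unfoldrN 10 step 0)                       -- "abc"
```

The split, count and zipWith pairs are exactly the "works the same until it doesn't" cases: the calls look alike, but the argument or result types differ.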
I think the two APIs ought to be brought into agreement.
Perhaps. If so, then it is ByteString.Char8 that ought to be brought into agreement with Text, not the other way around; I think Text is right in this area. On the other hand, perhaps it makes sense for ByteString.Char8 to stay consistent with the main ByteString interface, which is byte-oriented (and probably rightly so). I hope the significance and use of ByteString.Char8 will decrease as Text becomes more popular. ByteString.Char8 is really just for cases where you're handling ASCII-like protocols.
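Here is a sketch of the kind of ASCII-like protocol handling ByteString.Char8 suits, along with the reason it is a poor fit for general text. The header line and the parseHeader helper are hypothetical, chosen only for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper for an ASCII "Name: value" header line, the kind of
-- byte-oriented protocol work Char8 is suited to.
parseHeader :: BC.ByteString -> Maybe (BC.ByteString, BC.ByteString)
parseHeader line =
  case BC.breakSubstring (BC.pack ": ") line of
    (name, rest)
      | BC.null rest -> Nothing               -- separator not found
      | otherwise    -> Just (name, BC.drop 2 rest)

main :: IO ()
main = do
  let hdr = BC.pack "Content-Length: 42"
  print (parseHeader hdr)                       -- Just ("Content-Length","42")
  -- readInt parses a leading decimal Int and returns the unconsumed rest.
  print (parseHeader hdr >>= BC.readInt . snd)  -- Just (42,"")
  -- Outside 8-bit data Char8 is lossy: pack keeps only the low 8 bits of
  -- each Char, so U+03BB (Greek small lambda) becomes the byte 0xBB.
  print (BC.unpack (BC.pack "\x03BB"))          -- "\187"
```

The silent truncation in the last line is why Char8 should be reserved for data that really is ASCII (or some other 8-bit encoding), with Text used for anything that may contain arbitrary Unicode.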
John Lato