
On Tue, Sep 07, 2010 at 08:26:36AM -0700, Donald Bruce Stewart wrote:
> = Proposal: Add Data.Text to the Haskell Platform =
I feel silly saying this, but as this will probably serve as an example of the policy I'll say it anyway: I think this should be:

= Proposal: Add 'text' to the Haskell Platform =
> Proposal Author: Don Stewart
> Maintainer: Bryan O'Sullivan (submitted with his approval)
>
> Credits
> Proposal author and package maintainer: Bryan O'Sullivan, originally by Tom Harper, based on ByteString and Vector (fusion) packages.
> The following individuals contributed to the review process: Don Stewart, Johan Tibell
These two sections appear to contradict each other.
Also, the hackage page says
Maintainer Bryan O'Sullivan
> This is a proposal for the 'text' package
Should mention the version number, and link to the hackage page.
> This package provides text processing capabilities that are optimized for performance critical use, both in terms of large data quantities and high speed.
Are there other uses it is less suitable for, or are you just saying that the code has been optimised? If performance is important for the proposal, do you have evidence that it performs well, or a way to check that performance has not regressed in future releases?
> using several standard encodings
Just ASCII and UTF*, right?

Incidentally, I've just noticed some broken haddock markup:

    I/O libraries /do not support locale-sensitive I\O

in http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-...
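The UTF-8 part of the encoding support can be exercised with a short sketch. It assumes only the public Data.Text.Encoding and Data.Text.Encoding.Error modules (decodeUtf8With, encodeUtf8, lenientDecode) and shows lenientDecode replacing an invalid byte with U+FFFD instead of throwing an exception.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- 0xFF can never appear in well-formed UTF-8.
badBytes :: B.ByteString
badBytes = B.pack [0x61, 0xFF, 0x62]

-- lenientDecode substitutes U+FFFD for the undecodable byte.
decoded :: T.Text
decoded = decodeUtf8With lenientDecode badBytes

main :: IO ()
main = do
  print decoded                                  -- prints "a\65533b"
  -- U+00E9 (e acute) encodes to two bytes in UTF-8.
  print (B.unpack (encodeUtf8 (T.pack "\xE9")))  -- prints [195,169]
```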
> see the 'text-icu' package
Would be nice for this to link to the hackage page.
> a much larger variety of encoding functions
Why not bundle these in the text package, or also put this package in the platform? hackage doesn't have the haddocks as I write this, but I assume they are text-specific.
Should link to the version-specific page.

This item of "Proposal content" on AddingPackages doesn't seem to be covered:

> For library packages, an example of how the API is intended to be used should be given.

This is really a comment on the process rather than your proposal, but

> After a proposal is accepted (or conditionally accepted) the proposal must remain on the wiki.

and

> An explicit checklist of the package requirements below is not required. The proposal should state however that all the requirements are met

seem incompatible to me, as your

> All package requirements are met.

comment will become out of date as the requirement list evolves.

On http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.... a number of haddocks say "Subject to fusion." but I can't see an explanation for the new user of what this means or why they should care. Also, would it not be better to say "Warning: Not subject to fusion." for the handful that aren't? Currently it's hard to notice.

In http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-... I would expect lenientDecode etc. to use the On{En,De}codeError type synonyms defined above.

In http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-... the choice of 'B' seems odd:

    import qualified Data.Text.Lazy as B

I would have expected http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.... to mention the existence of .Lazy in its description, and to explain when I should use it. Are there cases when Data.Text is significantly faster than Data.Text.Lazy? Do we need both? (Presumably .Lazy is built on top of Data.Text, but do we need the user to have a complete interface for both?)

In http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.... isInfixOf's docs say:

> O(n+m) The isInfixOf function takes two Texts and returns True iff the first is contained, wholly and intact, anywhere within the second.
>
> In (unlikely) bad cases, this function's time complexity degrades towards O(n*m).

I think the complexity at the start, in the same place as all the other complexities, ought to be O(n*m), with the common case given afterwards.

And replace's docs just say

> O(m+n) Replace every occurrence of one substring with another.

but should presumably be O(n*m). It's also not necessarily clear what m and n refer to.
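To make the O(n*m) bound concrete, here is a sketch of naive substring search, taking n to be the length of the haystack and m the length of the needle (an assumption about what the documentation means). This is not how Data.Text implements isInfixOf; naiveIsInfixOf, badNeedle and badHaystack are hypothetical names used only to show the kind of input that drives a search towards O(n*m).

```haskell
import qualified Data.Text as T

-- Naive substring search, for illustration only: try the needle (length m)
-- at every suffix of the haystack (length n). Each isPrefixOf test may
-- examine up to m characters, so the worst case is O(n*m).
naiveIsInfixOf :: T.Text -> T.Text -> Bool
naiveIsInfixOf needle haystack = any (needle `T.isPrefixOf`) (T.tails haystack)

-- A needle that almost matches at every position (50 'a's then a 'b')
-- against a long run of 'a's forces ~m comparisons at nearly every offset.
badNeedle, badHaystack :: T.Text
badNeedle   = T.replicate 50 (T.pack "a") `T.append` T.pack "b"
badHaystack = T.replicate 10000 (T.pack "a")

main :: IO ()
main = do
  print (naiveIsInfixOf badNeedle badHaystack)                  -- False
  print (naiveIsInfixOf (T.pack "lo w") (T.pack "hello world"))  -- True
  -- The library function agrees on these inputs.
  print (T.isInfixOf badNeedle badHaystack)                     -- False
```

The near-miss needle is the textbook bad case for this family of algorithms; smarter searches reduce how often it bites but do not change the worst-case bound.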
> length :: Text -> Int
> O(n) Returns the number of characters in a Text. Subject to fusion.
Did you consider keeping the number of characters in the Text directly? Is there a reason it couldn't be done?
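The trade-off behind this question can be sketched with a wrapper type. CountedText, fromText, countedLength, appendCounted and filterCounted are hypothetical names, not part of the text package: caching the count makes length O(1) and keeps append cheap, but any operation whose result length is not known in advance, such as filter, still has to count.

```haskell
import qualified Data.Text as T

-- Hypothetical wrapper (not in the text package): a Text paired with its
-- character count, so that asking for the length is O(1).
data CountedText = CountedText !Int !T.Text
  deriving (Eq, Show)

-- One O(n) scan when the value is built.
fromText :: T.Text -> CountedText
fromText t = CountedText (T.length t) t

-- O(1): just read the cached field.
countedLength :: CountedText -> Int
countedLength (CountedText n _) = n

-- Appending can add the cached counts without rescanning either input.
appendCounted :: CountedText -> CountedText -> CountedText
appendCounted (CountedText m a) (CountedText n b) = CountedText (m + n) (T.append a b)

-- filter cannot predict the new count, so the result must be rescanned.
filterCounted :: (Char -> Bool) -> CountedText -> CountedText
filterCounted p (CountedText _ t) = fromText (T.filter p t)

main :: IO ()
main = do
  let c = appendCounted (fromText (T.pack "hello, ")) (fromText (T.pack "world"))
  print (countedLength c)                          -- 12
  print (countedLength (filterCounted (/= 'l') c)) -- 9
```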
> prevent is general use
"prevent its general use"
> a number of way:
"a number of ways:"
> unicode-unaware case conversion (map toUpper is an unsafe case conversion)
Surely this is something that should be added to Data.Char, irrespective of whether text is added to the HP?
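The difference between per-character and whole-string case conversion shows up with the German ß (U+00DF), whose uppercase form is the two-letter sequence SS. Data.Char.toUpper has type Char -> Char, so it cannot produce two characters and leaves ß unchanged, whereas Data.Text.toUpper may change the length of the string. A minimal sketch (perChar and wholeString are illustrative names):

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- Per-character mapping: one Char in, one Char out, so ß stays as ß.
perChar :: String
perChar = map toUpper "straße"

-- Whole-string conversion: ß becomes "SS" and the result grows by one.
wholeString :: T.Text
wholeString = T.toUpper (T.pack "straße")

main :: IO ()
main = do
  print perChar      -- prints "STRA\223E"
  print wholeString  -- prints "STRASSE"
```

So adding this to Data.Char would need a function returning a String (or similar) per character, not another Char -> Char.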
> the data structure is element-level lazy, whereas a number of applications require either some level of additional strictness
This sentence looks like it has been mis-edited? And by "a number of applications" I think you mean "high performance applications"?
> support whole-string case conversion (thus, type correct unicode transformations)
I don't really get what you mean by "type correct" here.
> based on unboxed Word16 arrays
Why Word16?
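For context on the Word16 question: in UTF-16, characters in the Basic Multilingual Plane take one 16-bit code unit, while characters outside it take two (a surrogate pair), so the character count and the array length can differ. This is one reason length has to scan the whole Text. The sketch counts code units through the public encodeUtf16LE function rather than the internal array, so it does not depend on the package's internal representation (later releases of text switched that to UTF-8); utf16Units is a hypothetical helper.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf16LE)

-- Number of UTF-16 code units (Word16s) needed to store a Text:
-- each code unit occupies two bytes in the UTF-16LE encoding.
utf16Units :: T.Text -> Int
utf16Units t = B.length (encodeUtf16LE t) `div` 2

main :: IO ()
main = do
  let bmp    = T.pack "abc"       -- all characters in the BMP
      astral = T.pack "a\x1D11E"  -- U+1D11E MUSICAL SYMBOL G CLEF
  print (T.length bmp, utf16Units bmp)       -- (3,3)
  print (T.length astral, utf16Units astral) -- (2,3)
```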
> As of Q2 2010, 'text' is ranked 27/2200 libraries (top 1% most popular), in particular, in web programming.
I can't work out what you mean here. Ranked 27 by what metric? Why web programming in particular?
> A large testsuite, with coverage data, is provided.
It would be nice if this was on the text package's page, rather than in ~dons.
> RecordWildCards
I'm not a fan, but I fear I may be in the minority.
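For readers unfamiliar with the extension, RecordWildCards lets a pattern such as Config{..} bind every field of a record as a local variable. Config, describe and describe' below are made-up names for illustration. A common objection is that the bound names do not appear at the use site, which the explicit version avoids.

```haskell
{-# LANGUAGE RecordWildCards #-}

-- A made-up record type, used only to show what the extension does.
data Config = Config { host :: String, port :: Int }

-- With RecordWildCards, Config{..} brings host and port into scope
-- as local variables holding the field values.
describe :: Config -> String
describe Config{..} = host ++ ":" ++ show port

-- The same function without the extension names each field explicitly.
describe' :: Config -> String
describe' Config{ host = h, port = p } = h ++ ":" ++ show p

main :: IO ()
main = do
  let cfg = Config { host = "localhost", port = 8080 }
  putStrLn (describe cfg)               -- localhost:8080
  print (describe cfg == describe' cfg) -- True
```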
> propposal
"proposal"
> to expose only 5 modules
9, no?
> The public modules expose none of these (?).
None of what?

I compared the API of Data.Text and Data.ByteString.Char8 and found a number of differences:

    BS:   break          :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
          breakEnd       :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
          breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
    Text: break          :: Text -> Text -> (Text, Text)
          breakEnd       :: Text -> Text -> (Text, Text)
          breakBy        :: (Char -> Bool) -> Text -> (Text, Text)

    BS:   count :: Char -> ByteString -> Int
    Text: count :: Text -> Text -> Int

    BS:   find   :: (Char -> Bool) -> ByteString -> Maybe Char
    Text: find   :: Text -> Text -> [(Text, Text)]
          findBy :: (Char -> Bool) -> Text -> Maybe Char

    BS:   replicate :: Int -> Char -> ByteString
    Text: replicate :: Int -> Text -> Text

    BS:   split :: Char -> ByteString -> [ByteString]
    Text: split :: Text -> Text -> [Text]

    BS:   span    :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
          spanEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
    Text: spanBy  :: (Char -> Bool) -> Text -> (Text, Text)

    BS:   splitWith :: (Char -> Bool) -> ByteString -> [ByteString]
    Text: splitBy   :: (Char -> Bool) -> Text -> [Text]

    BS:   unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> (ByteString, Maybe a)
    Text: unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text

    BS:   zipWith :: (Char -> Char -> a) -> ByteString -> ByteString -> [a]
    Text: zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text

I think the two APIs ought to be brought into agreement. There are a number of other differences which probably want to be tidied up (mostly functions which are in one package but not the other, and ByteString has IO functions mixed in with the non-IO functions), but those seemed to be the most significant ones.

Also,

    prefixed :: Text -> Text -> Maybe Text

is analogous to

    stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]

in Data.List.

This also made me notice that Text haddocks tend to use 'b' as a type variable rather than 'a', e.g.

    foldl :: (b -> Char -> b) -> b -> Text -> b

Thanks
Ian