Thanks for your comments, Ian. I appreciate your time and care in looking this over!
Incidentally, I've just noticed some broken haddock markup for:
I/O libraries /do not support locale-sensitive I\O
in
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-IO.html
Thanks for spotting that. It appears to be due to a Haddock bug, unfortunately.
> a much larger variety of encoding functions
Why not bundle these in the text package, or also put this package in
the platform?
Either one would induce a dependency on text-icu, which is not as mature as text, and which would imply a dependency on the rather large ICU library. I do believe that text-icu should be submitted, but not until it's ready.
On
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
a number of haddocks say
Subject to fusion.
but I can't see an explanation for the new user of what this means or
why they should care.
However, that description is skimpy, and I've replaced it:
-- Most of the functions in this module are subject to /fusion/,
-- meaning that a pipeline of such functions will usually allocate at
-- most one 'Text' value.
--
-- As an example, consider the following pipeline:
--
-- > import Data.Text as T
-- > import Data.Text.Encoding as E
-- >
-- > countChars :: ByteString -> Int
-- > countChars = T.length . T.toUpper . E.decodeUtf8
--
-- From the type signatures involved, this looks like it should
-- allocate one 'ByteString' value, and two 'Text' values. However,
-- when a module is compiled with optimisation enabled under GHC, the
-- two intermediate 'Text' values will be optimised away, and the
-- function will be compiled down to a single loop over the source
-- 'ByteString'.
--
-- Functions that can be fused by the compiler are marked with the
-- phrase \"Subject to fusion\".
In
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Encoding-Error.html
I would expect lenientDecode etc to use the On{En,De}codeError type
synonyms defined above.
Good point. I've fixed that up.
In
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Lazy.html
the choice 'B' seems odd:
import qualified Data.Text.Lazy as B
Yep. Fixed :-)
I would have expected
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
to mention the existence of .Lazy in its description, and an explanation
of when I should use it.
I've expanded that discussion.
Are there cases when Data.Text is significantly faster than
Data.Text.Lazy?
It's often about twice as fast, but that depends on the nature of the code and data involved.
In
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
isInfixOf's docs day:
O(n+m) The isInfixOf function takes two Texts and returns True iff the
first is contained, wholly and intact, anywhere within the second.
In (unlikely) bad cases, this function's time complexity degrades
towards O(n*m).
I think the complexity at the start, in the same place as all the other
complexities, ought to be O(n*m), with the common case given afterwards.
I'd prefer to keep this as is.
And replace's docs just say
O(m+n) Replace every occurrence of one substring with another.
but should presumably be O(n*m). It's also not necessarily clear what m
and n refer to.
The two parameters to the function?
> unicode-unaware case conversion (map toUpper is an unsafe case
> conversion)
Surely this is something that should be added to Data.Char, irrespective
of whether text is added to the HP?
Yes, but that's a not-this-problem problem.
> A large testsuite, with coverage data, is provided.
It would be nice if this was on the text package's page, rather than in
~dons.
I don't know how to do that.
> RecordWildCards
I'm not a fan, but I fear I may be in the minority.
It's just used internally, so why do you mind?
There are a number of other differences which probably want to be tidied
up (mostly functions which are in one package but not the other, and
ByteString has IO functions mixed in with the non-IO functions), but
those seemed to be the most significant ones. Also,
prefixed :: Text -> Text -> Maybe Text
is analogous to
stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
in Data.List
I hadn't seen that. Hmm. For use with view patterns, I prefer the name I'm using right now.
This also made me notice that Text haddocks tend to use 'b' as a type
variable rather than 'a', e.g.
foldl :: (b -> Char -> b) -> b -> Text -> b
Historical artifact :-)