Summary and call for discussion on text proposal

Thomas and I would like to summarise the current point of contention in the text library proposal with the aim of resolving the issue and getting the package accepted.

It seems clear that we all want the package accepted, the disagreement is over details of the API. The problem here is not the amount of work to make the changes some people have been suggesting, the problem is disagreement over whether change is necessary and if so what change.

There is essentially just one point of contention, over about 10 out of the 80+ functions in the Data.Text module. The issue is about which functions should get the nice names and about consistency between modules. (There is one other minor issue that Ross raised but we will deal with the most substantive issue first)

There are two axes in which Text functions are generalised:

 * character predicate (e.g. searching for first char matching a predicate)
 * substring (e.g. searching for a substring)

These are orthogonal directions of generalisation. There is no simple way to encompass both (regular expressions are not simple, naive generalisations cannot be implemented efficiently).

The fact that there are these two forms of most functions is different from the List library which only has the element predicate direction, not the sub-sequence direction. This is the prelude to the problem, because the List library has already taken the common names for the character predicate versions.

The design of the Text library encourages the use of substring operations because these are expected to be more commonly used and because correct handling of Unicode often requires substring operations (due to issues with combining characters).

There are a number of options. To illustrate them let us pick an example function that breaks a text into two.
There are two versions:

 * break based on a character predicate
 * break based on a substring

Option 1 (current Text lib design)
----------------------------------

  break   :: Text -> Text -> (Text, Text)
  breakBy :: (Char -> Bool) -> Text -> (Text, Text)

This gives the short name 'break' to the substring version, and the longer name 'breakBy' to the character predicate version.

The argument for doing this is that the substring version should be the common encouraged one and so it should get the nice name.

The argument against is that this is inconsistent with the List library which gives the name 'break' to the element predicate version:

  break :: (a -> Bool) -> [a] -> ([a], [a])

Option 2
--------

  breakSubstring :: Text -> Text -> (Text, Text)
  break          :: (Char -> Bool) -> Text -> (Text, Text)

This gives the short name 'break' to the character predicate version and the longer 'breakSubstring' to the substring version.

The argument for doing this is that it is consistent with the List library in its use of the name 'break'.

The argument against is that the short name is now given to the version that is discouraged, and the version that is encouraged now has a very long and ugly name: this API is encouraging users to make the wrong choices.

Decisions
---------

There appears to be no consensus over which of these two options to pick. If this situation persists then the default position is for the package not to be accepted at all. We think there is consensus that the package should go into the platform in some form -- that the worst of all the options is for the package not to go in at all.

We are now at the third stage of the consensus protocol. At this stage discussion should be limited to resolving one concern at a time. Anyone may contribute to this discussion. The people required to take part are the proposal author (Don) and anyone who has concerns with the package going in as is.
People with concerns should restate those concerns and if necessary questions should be asked to clarify the concerns. In particular, if the summary above is not an accurate expression of people's concerns then they should say so.

Don will update the wiki proposal page with the details of the remaining concerns (or simply the summary above if this is accurate).

The steering committee (in this case Thomas and I) will follow the discussion. If we are still stuck in one week (14th Nov) then the steering committee will re-evaluate the situation.

To kick off the discussion focused on this narrow issue, Thomas and I would like to suggest a 3rd alternative option:

Option 3
--------

  breakStr :: Text -> Text -> (Text, Text)
  breakChr :: (Char -> Bool) -> Text -> (Text, Text)

This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.

This addresses the complaint that a name from the List library is being used but with an inconsistent type (because the name is not being used at all).

It removes the problem that the character predicate versions are being promoted over the substring versions by the use of the shorter names.

It makes explicit the fact that all the functions come in two forms, whereas with List there is just one form.

There is still a strong connection with the List library by using the same root names. So people can still carry over their experience of the List library API to help them find the right Text functions. They will have to make one choice between the character predicate and substring versions, which is reasonable given that the substring versions are preferred.

Duncan & Thomas
(with their platform steering committee hats on)
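For readers who want to try the Option 3 names, here is a minimal sketch written against a recent release of the text package. It assumes that release exposes the substring split as T.breakOn and the character predicate split as T.break; those two names are an assumption about the installed library, not part of the proposal above. breakStr and breakChr are the names suggested in Option 3, defined here only as aliases.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Substring version under the Option 3 name.
-- Assumption: the installed text package provides T.breakOn.
breakStr :: Text -> Text -> (Text, Text)
breakStr = T.breakOn

-- Character predicate version under the Option 3 name.
-- Assumption: the installed text package provides T.break.
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
breakChr = T.break

main :: IO ()
main = do
  let input = T.pack "key = value"
  -- Split at the first occurrence of the substring " = ".
  print (breakStr (T.pack " = ") input)  -- ("key"," = value")
  -- Split at the first character satisfying the predicate.
  print (breakChr (== '=') input)        -- ("key ","= value")
```

Under Option 1 the first definition would be called break and the second breakBy; under Option 2 the first would be breakSubstring and the second break. Only the names change, not the types.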

On November 7, 2010 09:36:35 Duncan Coutts wrote:
To kick off the discussion focused on this narrow issue, Thomas and I would like to suggest a 3rd alternative option:
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
I think this is a good solution. With this option you can also include sub modules so users could choose between 1 and 2. That is, something like

  Data.Text        -- expresses no preference (i.e., only the Str and Chr suffixed ones)
  Data.Text.String -- reexports the *Str ones without the Str suffixes
  Data.Text.Char   -- reexports the *Chr ones without the Chr suffixes

Cheers! -Tyson

On Sun, Nov 07, 2010 at 10:07:32AM -0500, Tyson Whitehead wrote:
Data.Text.String -- reexports the *Str ones without the Str suffixes
Data.Text.Char -- reexports the *Chr ones without the Chr suffixes
I think this would be a bad idea. In order to understand some code you'd have to check what imports are in scope. Thanks Ian

On Sun, Nov 07, 2010 at 02:36:35PM +0000, Duncan Coutts wrote:
It seems clear that we all want the package accepted, the disagreement is over details of the API. The problem here is not the amount of work to make the changes some people have been suggesting, the problem is disagreement over whether change is necessary and if so what change.
Right.
There are two axes in which Text functions are generalised:
 * character predicate (e.g. searching for first char matching a predicate)
 * substring (e.g. searching for a substring)
Not necessarily character /predicate/, e.g.:
  count :: Char -> Text -> Int
vs
  count :: Text -> Text -> Int
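The two shapes of count mentioned here can be sketched side by side. This is only an illustration, assuming the installed text package provides T.count with the substring type; countText and countChar are hypothetical names chosen for this sketch so that both versions can live in one module.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Substring form: number of non-overlapping occurrences of a
-- non-empty needle. Assumption: T.count has this type.
countText :: Text -> Text -> Int
countText = T.count

-- Single-character form (hypothetical name), expressed through
-- the substring form by wrapping the character in a Text.
countChar :: Char -> Text -> Int
countChar c = T.count (T.singleton c)

main :: IO ()
main = do
  let t = T.pack "banana"
  print (countText (T.pack "an") t)  -- 2
  print (countChar 'a' t)            -- 3
```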
The fact that there are these two forms of most functions is different from the List library which only has the element predicate direction, not the sub-sequence direction.
This is true at the moment, but there is no reason one couldn't want or have the sub-sequence versions for lists. I think I've occasionally wanted this, but I can't recall concrete examples OTTOMH.
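As an illustration of what a sub-sequence version for lists could look like, here is a small sketch using only base. The name breakSubList is hypothetical, and the implementation is a naive scan (it tests the needle at every position), not a tuned search.

```haskell
import Data.List (isPrefixOf)

-- Split a list at the first occurrence of a sub-sequence.
-- The needle, if found, stays at the front of the second component,
-- mirroring how 'break' keeps the first matching element there.
breakSubList :: Eq a => [a] -> [a] -> ([a], [a])
breakSubList needle = go
  where
    go [] = ([], [])
    go hay@(x : xs)
      | needle `isPrefixOf` hay = ([], hay)
      | otherwise =
          let (before, after) = go xs
          in (x : before, after)

main :: IO ()
main = do
  -- Sub-sequence direction: split where [3,4] starts.
  print (breakSubList [3, 4] [1, 2, 3, 4, 5 :: Int])  -- ([1,2],[3,4,5])
  -- Element predicate direction, as Data.List already provides it.
  print (break (> 2) [1, 2, 3, 4, 5 :: Int])          -- ([1,2],[3,4,5])
```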
The design of the Text library encourages the use of substring operations because these are expected to be more commonly used and because correct handling of Unicode often requires substring operations (due to issues with combining characters).
Can you give an example of such an operation please, which doesn't go wrong when the argument is "c", the input contains "cx" and 'x' is a combining character such that there is no composed codepoint for "cx"?
People with concerns should restate those concerns
I think you've covered my concerns (consistency with list/bytestring).
and if necessary questions should be asked to clarify the concerns.
Asked above :-)
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
I think this is better than option 1 (break doesn't do something unexpected), but worse than option 2 (break doesn't do what is expected, i.e. you need to actually go and look for the name of the function you want) (from the context of someone familiar with the list and bytestring APIs). I think it's closer to option 1 than option 2, though. Thanks Ian

On Sun, Nov 7, 2010 at 7:16 AM, Ian Lynagh
Can you give an example of such an operation please, which doesn't go wrong when the argument is "c", the input contains "cx" and 'x' is a combining character such that there is no composed codepoint for "cx"?
I don't think there's been any contention that searching purely on Text values is enough to handle that. However, a Char→Bool predicate clearly isn't enough either :-)

On Sun, Nov 07, 2010 at 08:19:00AM -0800, Bryan O'Sullivan wrote:
On Sun, Nov 7, 2010 at 7:16 AM, Ian Lynagh wrote:
Can you give an example of such an operation please, which doesn't go wrong when the argument is "c", the input contains "cx" and 'x' is a combining character such that there is no composed codepoint for "cx"?
I don't think there's been any contention that searching purely on Text values is enough to handle that. However, a Char→Bool predicate clearly isn't enough either :-)
Quite so, but it doesn't claim to be able to do so either.

Maybe I'm misunderstanding the issue, so my question was too specific. AIUI the motivation for the current text API is:

  The design of the Text library encourages the use of substring operations because these are expected to be more commonly used and because correct handling of Unicode often requires substring operations (due to issues with combining characters).

so can someone please give an example of a function that correctly handles Unicode by using a substring operation?

Thanks
Ian
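The case being asked about can be reproduced with a short sketch. It assumes the installed text package provides T.breakOn, and it uses U+0332 COMBINING LOW LINE as the combining character, which to my knowledge has no precomposed form with 'c'. The names input and result are just local to this example.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- "ab", then 'c' followed by a combining mark, then "d".
-- The pieces are concatenated so that 'd' is not read as part of
-- the hexadecimal escape.
input :: Text
input = T.pack ("ab" ++ "c\x0332" ++ "d")

-- A plain substring search for "c" matches the bare code point,
-- even though it is followed by a combining mark.
result :: (Text, Text)
result = T.breakOn (T.pack "c") input

main :: IO ()
main = do
  print (T.length (fst result))            -- 2
  print (T.unpack (T.take 1 (snd result))) -- "c"
  print (T.length (snd result))            -- 3
```

So a substring search on code points alone splits in front of the 'c' and leaves the combining mark attached to it in the second component; it does not notice that "c" plus the mark is one user-perceived character.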

On Sun, Nov 7, 2010 at 12:49 PM, Ian Lynagh
Maybe I'm misunderstanding the issue, so my question was too specific. AIUI the motivation for the current text API is:
The design of the Text library encourages the use of substring operations because these are expected to be more commonly used and because correct handling of Unicode often requires substring operations (due to issues with combining characters).
Actually, my motivation in using the types I did was due to ease and frequency of use and ease of providing good performance. I didn't look to existing Haskell libraries for the ease-of-use perspective, but instead to Python and Perl. Whether or not this helped or hindered handling of Unicode wasn't a factor; it helped *programmability*.

Option 1 (current Text lib design)
----------------------------------
break :: Text -> Text -> (Text, Text)
breakBy :: (Char -> Bool) -> Text -> (Text, Text)
This gives the short name 'break' to the substring version, and the longer name 'breakBy' to the character predicate version.
I very much dislike this option, for the reasons already rehearsed. And Ian presented some evidence that there are no (or very few) usages of the short name versions in client libraries on hackage. But this raw data does not give us enough context to know how to interpret it. Are there simply very few clients of the Text package at all so far? Are there many clients but they use other parts of the API? How many usages of the long name versions appear? And that does not address the idea that perhaps the clients indeed ought to be using the substring versions, but for whatever reason have not done so.
Option 2
--------
breakSubstring :: Text -> Text -> (Text, Text)
break :: (Char -> Bool) -> Text -> (Text, Text)
This gives the short name 'break' to the character predicate version and the longer 'breakSubstring' to the substring version.
This option would be preferable to option 1, although I understand it is not likely to win the author/maintainer's vote. Of course, 'breakSubstring' is a slightly extreme version of the name, to make a point. It could just as easily be 'breakStr'.
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
As a compromise between options 1 & 2, this option has merit. It leaves open the possibility that the signatures of the short names might yet be decided at a later date. If Bryan were willing to go with this option, I would certainly support it. Regards, Malcolm

On Sun, Nov 7, 2010 at 11:56 AM, Malcolm Wallace
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
As a compromise between options 1 & 2, this option has merit. It leaves open the possibility that the signatures of the short names might yet be decided at a later date. If Bryan were willing to go with this option, I would certainly support it.
+1. I too think Option 3 has merit, if only because it resolves the current logjam, and still leaves open the possibility for consensus to be reached on the short names at some point in the future without either side feeling disadvantaged -- but do we really really have to randomly abbreviate Char and String?

-Edward

On 11/7/10 12:51 PM, Edward Kmett wrote:
On Sun, Nov 7, 2010 at 11:56 AM, Malcolm Wallace wrote:
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
As a compromise between options 1 & 2, this option has merit. It leaves open the possibility that the signatures of the short names might yet be decided at a later date. If Bryan were willing to go with this option, I would certainly support it.
+1. I too think Option 3 has merit, if only because it resolves the current logjam, and still leaves open the possibility for consensus to be reached on the short names at some point in the future without either side feeling disadvantaged -- but do we really really have to randomly abbreviate Char and String?
+1 to resolving the logjam if the author is willing. But also -1 for the random abbreviation. At the very least *Chr should be *Char. Making an abbreviation for a single character is unnecessary, unhelpful, and confusing. For *Str, at least the abbreviation has a meaningful effect in shortening things, but given that we're talking about Text and not String, why not go for *Text which is short, unabbreviated, and matches the type in question. -- Live well, ~wren

On Nov 7, 2010, at 6:16 PM, wren ng thornton wrote:
On 11/7/10 12:51 PM, Edward Kmett wrote:
On Sun, Nov 7, 2010 at 11:56 AM, Malcolm Wallace wrote:
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
But also -1 for the random abbreviation. At the very least *Chr should be *Char. Making an abbreviation for a single character is unnecessary, unhelpful, and confusing. For *Str, at least the abbreviation has a meaningful effect in shortening things, but given that we're talking about Text and not String, why not go for *Text which is short, unabbreviated, and matches the type in question.
I would vote for breakText over breakStr because this is a useful variant:
breakStr :: String -> Text -> (Text, Text)
I generally do not use OverloadedStrings and might define that locally to avoid having to call Text.pack a lot. Also, if breakText has the type
breakText :: Text -> Text -> (Text, Text)
Then I would probably expect breakChar to have the type:
breakChar :: Char -> Text -> (Text, Text)
So I don't think that naming is really all that consistent. If Data.List was not in the picture, option 1 certainly seems quite sensible. With Data.List in the picture, option 1 is not going to result in bugs in your code, because the type checker will figure out that you are doing it wrong. So, the only downside of option 1, IMO, is that you have to remember that Data.Text has a different meaning for break than Data.List. Or put differently, you are going to have to look at the documentation to figure out what you want. But with option 3, you have to look at the documentation as well, because the names it uses do not come from Data.List either. So the gain is not really that significant. So I vote +0 on option 3. But if it is option 3, I think I would rather see:
breakText :: Text -> Text -> (Text, Text)
breakBy :: (Char -> Bool) -> Text -> (Text, Text)
I would also like to vote -100 for:

Option 4
------------

  class Break a where
    break :: a -> Text -> (Text, Text)

  instance Break Text where
    break = breakStr

  instance Break (Char -> Bool) where
    break = breakChar

Though it does have its charm..

- jeremy
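For anyone curious how Option 4 would behave, here is a compilable sketch of the class. It is an illustration only: it assumes the installed text package provides T.breakOn and T.break as the two underlying implementations, it hides the Prelude's break to avoid a name clash, and it needs the FlexibleInstances extension for the instance on a function type.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Prelude hiding (break)
import Data.Text (Text)
import qualified Data.Text as T

-- One overloaded name for both directions of generalisation.
class Break a where
  break :: a -> Text -> (Text, Text)

-- Substring version. Assumption: T.breakOn is available.
instance Break Text where
  break = T.breakOn

-- Character predicate version. An instance head of the form
-- (Char -> Bool) is what requires FlexibleInstances.
instance Break (Char -> Bool) where
  break = T.break

main :: IO ()
main = do
  let t = T.pack "a, b"
  print (break (T.pack ", ") t)  -- ("a",", b")
  print (break (== ',') t)       -- ("a",", b")
```

One cost of this design is that the first argument must have a type the compiler can determine on its own, so an overloaded string literal passed directly to break would need a type annotation.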

On Nov 7, 2010, at 12:51 PM, Edward Kmett wrote:
On Sun, Nov 7, 2010 at 11:56 AM, Malcolm Wallace wrote:
Option 3
--------
breakStr :: Text -> Text -> (Text, Text)
breakChr :: (Char -> Bool) -> Text -> (Text, Text)
This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate the character predicate vs substring.
As a compromise between options 1 & 2, this option has merit. It leaves open the possibility that the signatures of the short names might yet be decided at a later date. If Bryan were willing to go with this option, I would certainly support it.
+1. I too think Option 3 has merit, if only because it resolves the current logjam, and still leaves open the possibility for consensus to be reached on the short names at some point in the future without either side feeling disadvantaged -- but do we really really have to randomly abbreviate Char and String?
"A good compromise is when both parties are dissatisfied, and I think that's what we have here." -- Larry David. +1. If Bryan finds this an acceptable compromise, then we should proceed. Preferably breakChar and breakText, but anything along those lines is good for me. Future usage can then teach us what we can only speculate over now -- which functions do indeed turn out to be more common, useful, and necessary, and the degree to which relationship to the list API does or does not provide a point of utility or confusion. I also want to make the case that this hasn't been a pointless bikeshedding discussion, although it has been slow. There's a valid case for uniformity, and a valid case for package-specific APIs. Uniformity is a good thing to strive for across the Haskell Platform, and is a key part of providing a set of basic libraries. We've done a better job with uniformity thus far than e.g., the OCaml community, and thus, unlike OCaml Batteries (http://batteries.forge.ocamlcore.org/doc.preview:batteries-alpha3/html/about...), we don't need a uniformization layer. But that's because there's a culture of very careful attention to detail, with significant respect for history and convention. The folks that have weighed in have long lists of serious credentials, and I tend to feel that their input, whether or not I agree on any specific point, should be treated as worth its weight in gold. Frankly, if one of my libraries was up for consideration, I would give their concerns serious weight on their history and contributions alone, even if at first they struck me as written by martians. So concerns have been raised. We've dallied with them perhaps too long and the process stalled out. But now the committee has stepped in, and in the process we're trying to iron out the consensus process. Maybe we need a point at which people who don't object step in, or some process for "voting" on agreement with objections that the package author is not amenable to. 
But a serious and arduous review, which has found undeniable problems as well (e.g. corner case problems with large operations) is far from a "trackless mire." If Bryan doesn't agree with the proposal and wants to keep the Text API fundamentally as is, I vote for inclusion in the platform nonetheless. But, for what it's worth, my (significant) respect for his taste and ability to play well with others will be somewhat diminished. Cheers, Sterl.

On Sun, Nov 07, 2010 at 02:36:35PM +0000, Duncan Coutts wrote:
There are a number of options. To illustrate them let us pick an example function that breaks a text into two. There are two versions: * break based on a character predicate * break based on a substring
In fact break is the only example of two such versions conflicting with Data.List. For partitionBy and spanBy there are no substring variants, and find is not related to findBy; it is actually repeated application of break. So a possibility is

 - rename break (and ensure that breakEnd matches)
 - rename find as the plural of whatever break is renamed as
 - rename breakBy -> break, findBy -> find, partitionBy -> partition and spanBy -> span
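The relationship between the substring break and its "plural" can be seen in a short sketch. It assumes a recent release of the text package in which the substring split is exposed as T.breakOn and the all-occurrences variant as T.breakOnAll; those names are an assumption about the installed library and are not taken from this thread. allBreaks is a hypothetical alias used only here.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Every way of splitting the input at an occurrence of a
-- non-empty needle. Assumption: T.breakOnAll is available.
allBreaks :: Text -> Text -> [(Text, Text)]
allBreaks = T.breakOnAll

main :: IO ()
main = do
  let t = T.pack "a/b/c"
      sep = T.pack "/"
  -- Single split at the first occurrence.
  print (T.breakOn sep t)  -- ("a","/b/c")
  -- One result per occurrence; the first agrees with breakOn.
  print (allBreaks sep t)  -- [("a","/b/c"),("a/b","/c")]
```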

On Sun, Nov 7, 2010 at 3:36 PM, Duncan Coutts
Thomas and I would like to summarise the current point of contention in the text library proposal with the aim of resolving the issue and getting the package accepted.
It seems clear that we all want the package accepted, the disagreement is over details of the API. The problem here is not the amount of work to make the changes some people have been suggesting, the problem is disagreement over whether change is necessary and if so what change.
By the way,
Myself and several other people have been following this discussion
with increasing levels of annoyance and frustration. My understanding
was that the HP process was intended to help with the overall design
of libraries and to head off serious problems before too much time is
wasted on discussion, NOT to devolve into extended megathreads over
which colour to paint the bike shed.
Another point I would like to make is that unless I'm mistaken, even
if text is accepted into the platform, that doesn't mean that
maintainership of the library is assigned to libraries@haskell.org: it
stays with Bryan. Given that he's repeatedly stated that the API is
the way that it is because that's the way he *wants* it to be, and he
has a plausible rationale for this, this entire discussion is MOOT and
we should immediately stop wasting time and move to a vote on
accepting text as-is.
If the fact that the names of a couple of functions aren't absolutely
consistent with their analogues in Data.List is enough to cause you to
vote no, then so be it --- but given how far above the bar text is on
a quality basis compared to some of the libraries we grandfathered in,
IMHO that would be an indication that something about this process is
completely broken and it should be abandoned forthwith.
Cheers,
G
--
Gregory Collins

On Sun, Nov 7, 2010 at 7:54 PM, Gregory Collins
On Sun, Nov 7, 2010 at 3:36 PM, Duncan Coutts wrote:
Thomas and I would like to summarise the current point of contention in the text library proposal with the aim of resolving the issue and getting the package accepted.
It seems clear that we all want the package accepted, the disagreement is over details of the API. The problem here is not the amount of work to make the changes some people have been suggesting, the problem is disagreement over whether change is necessary and if so what change.
By the way,
Myself and several other people have been following this discussion with increasing levels of annoyance and frustration. My understanding was that the HP process was intended to help with the overall design of libraries and to head off serious problems before too much time is wasted on discussion, NOT to devolve into extended megathreads over which colour to paint the bike shed.
Another point I would like to make is that unless I'm mistaken, even if text is accepted into the platform, that doesn't mean that maintainership of the library is assigned to libraries@haskell.org: it stays with Bryan. Given that he's repeatedly stated that the API is the way that it is because that's the way he *wants* it to be, and he has a plausible rationale for this, this entire discussion is MOOT and we should immediately stop wasting time and move to a vote on accepting text as-is.
If the fact that the names of a couple of functions aren't absolutely consistent with their analogues in Data.List is enough to cause you to vote no, then so be it --- but given how far above the bar text is on a quality basis compared to some of the libraries we grandfathered in, IMHO that would be an indication that something about this process is completely broken and it should be abandoned forthwith.
+1. I think this process is only scaring people from writing quality libraries, lest they be subjected to this endless bikeshedding. Bryan has addressed all of the substantive issues that I'm aware of that have ever been brought up about text. Let's just accept that the library is acceptable- and quite extraordinary- as is.

And sorry to hijack this thread, but I wanted to bring up a slightly more important (IMHO) topic: should the Haskell Platform be endorsing a single, canonical approach to solving problems? Right now, both text and utf8-string support UTF8 encoding/decoding. Would it be appropriate to try and deprecate usage of the latter in favor of using the more well-maintained, more performant text package?

Michael

On 07/11/2010 20:05, Michael Snoyman wrote:
On Sun, Nov 7, 2010 at 7:54 PM, Gregory Collins wrote:
By the way,
Myself and several other people have been following this discussion with increasing levels of annoyance and frustration. My understanding was that the HP process was intended to help with the overall design of libraries and to head off serious problems before too much time is wasted on discussion, NOT to devolve into extended megathreads over which colour to paint the bike shed.
+1. I think this process is only scaring people from writing quality libraries, lest they be subjected to this endless bikeshedding. Bryan has addressed all of the substantive issues that I'm aware of that have ever been brought up about text. Let's just accept that the library is acceptable- and quite extraordinary- as is.
Another point: Is anyone responsible for bringing more libraries and tools into HP? The current setup sometimes looks like a lynch-mob of naysayers ganging up on any suggestion as audacious as adding a new package. A much more proactive attitude is needed if HP is going to grow.

On Sun, Nov 07, 2010 at 08:58:24PM +0200, John Smith wrote:
Another point: Is anyone responsible for bringing more libraries and tools into HP?
I'm not sure I understand the question. Anyone can propose a package for addition, using the process described here: http://trac.haskell.org/haskell-platform/wiki/AddingPackages Thanks Ian

On 07/11/2010 23:58, Ian Lynagh wrote:
On Sun, Nov 07, 2010 at 08:58:24PM +0200, John Smith wrote:
Another point: Is anyone responsible for bringing more libraries and tools into HP?
I'm not sure I understand the question. Anyone can propose a package for addition, using the process described here: http://trac.haskell.org/haskell-platform/wiki/AddingPackages
*That* is the problem. Perhaps the intention was that the community would volunteer to do this, but it doesn't seem to be working. There are plenty of volunteer QA gatekeepers, but little effort to introduce new packages. If I understand correctly, a goal of HP is to incorporate new (useful and high-quality) libraries. This won't happen unless someone makes it their business to ensure that this happens.

On Sun, Nov 07, 2010 at 08:05:22PM +0200, Michael Snoyman wrote:
has addressed all of the substantive issues that I'm aware of that have ever been brought up about text.
Can you please give a few examples of substantive issues that have been brought up, so I can understand what sort of thing you mean? Thanks Ian

On Sun, Nov 7, 2010 at 11:50 PM, Ian Lynagh
On Sun, Nov 07, 2010 at 08:05:22PM +0200, Michael Snoyman wrote:
has addressed all of the substantive issues that I'm aware of that have ever been brought up about text.
Can you please give a few examples of substantive issues that have been brought up, so I can understand what sort of thing you mean?
I'm not talking about this discussion. In the past (going back about a year now I think) I've reported a few bugs and performance issues to Bryan, and they've all been dealt with. As far as I can tell, *every* instance of such an issue has been addressed promptly and thoroughly. To me, those are the more important issues to be raised about acceptance of a package into the HP.

Michael

Another point I would like to make is that unless I'm mistaken, even if text is accepted into the platform, that doesn't mean that maintainership of the library is assigned to libraries@haskell.org: it stays with Bryan. Given that he's repeatedly stated that the API is the way that it is because that's the way he *wants* it to be, and he has a plausible rationale for this, this entire discussion is MOOT and we should immediately stop wasting time and move to a vote on accepting text as-is.
To me, this sounds very reasonable. For me consistency within one package is much more important than striving for ultimate consistency within the hp. And I absolutely agree with the point, that the author of a library is in charge to ensure that consistency. Just my two cents. Cheers, Simon

On Sun, Nov 7, 2010 at 7:54 PM, Gregory Collins
Myself and several other people have been following this discussion with increasing levels of annoyance and frustration. My understanding was that the HP process was intended to help with the overall design of libraries and to head off serious problems before too much time is wasted on discussion, NOT to devolve into extended megathreads over which colour to paint the bike shed.
This was at least my intention when I co-authored the package addition process document. I wanted to have a venue where experienced Haskell hackers could help improve the quality of libraries that went into the HP. I tried to model the process based on Python's PEP [1] process, which seems to have worked for them for the most part.

In my head, the process would help prevent bad design from getting into the HP, where it might be harder to fix due to a wider user base and higher stability expectations. To give an example: the creators of the Python WSGI spec decided that the WSGI [2] implementation should decode any URLs before passing them to the user. This turned out to be a bad decision as URLs are now ambiguous. For example, these two URLs

  http://www.foo.com/hi%2Fthere
  http://www.foo.com/hi/there

decode to the same URL and thus some information is lost, making the WSGI API slightly less useful (i.e. for people who need to distinguish between the two).

I wanted a design review process that would help catch, among other things, such corner cases that the original library author might not have considered.

1. http://www.python.org/dev/peps/
2. http://wsgi.org/wsgi/

Johan
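The loss of information can be shown with a tiny percent-decoder. This is a minimal sketch using only base, not the WSGI implementation; percentDecode is a hypothetical helper, and the example uses %2F, the percent-encoding of '/'.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Replace every "%XY" (two hex digits) with the character it encodes.
-- Anything that is not a well-formed escape is copied through unchanged.
percentDecode :: String -> String
percentDecode ('%' : hi : lo : rest)
  | isHexDigit hi && isHexDigit lo =
      chr (16 * digitToInt hi + digitToInt lo) : percentDecode rest
percentDecode (c : rest) = c : percentDecode rest
percentDecode [] = []

main :: IO ()
main = do
  let encoded = "/hi%2Fthere"  -- one path segment containing a slash
      plain   = "/hi/there"    -- two path segments
  print (percentDecode encoded)                         -- "/hi/there"
  -- After decoding, the two paths can no longer be told apart.
  print (percentDecode encoded == percentDecode plain)  -- True
```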

On Sun, Nov 07, 2010 at 06:54:38PM +0100, Gregory Collins wrote:
of libraries and to head off serious problems before too much time is wasted on discussion, NOT to devolve into extended megathreads over
I think "extended megathreads" is something of an overexaggeration, but I have had to repeat myself (or at least reconfirm my opinion) a couple of times. First in response to "I believe that #3 is actually resolved, but I haven't deleted it pending confirmation from Ian or others.". Then, in the call for consensus, "Say nothing" means "you're prepared to accept the current proposal" so I needed to "Raise objection. Objections need accompanying reasoning". There were also a few messages from other people which made similar arguments, but I don't see how we can establish what the consensus is without several people giving their opinion.
My understanding was that the HP process was intended to help with the overall design [not] which colour to paint the bike shed.
I (perhaps not surprisingly) disagree with your characterisation of the naming discussion. We are not talking about whether a function should be named breakString, breakSubstring, breakStr, breakList, ... Rather, we are discussing a fundamental design decision for the platform: whether it is more important to have global consistency of HP packages, or to have each package have a locally optimal API. In this case, whether 2 functions in different packages should have the same name or not. (and also some side discussion about whether global consistency applies in this case, and why the current API is locally optimal).
Another point I would like to make is that unless I'm mistaken, even if text is accepted into the platform, that doesn't mean that maintainership of the library is assigned to libraries@haskell.org: it
That is true, but I would hope that a change in a package's philosophy would be raised for discussion on the list by the maintainer or by the person bumping HP library versions before it is incorporated into the platform.
a quality basis compared to some of the libraries we grandfathered in,
I think that is an argument for improving the other libraries, not for opening the platform floodgates. Thanks Ian

Hi Ian,
On Sun, Nov 7, 2010 at 10:43 PM, Ian Lynagh
Then, in the call for consensus, "Say nothing" means "you're prepared to accept the current proposal" so I needed to "Raise objection. Objections need accompanying reasoning".
You would vote no if this minor naming issue (again, to which both sides have reasonable arguments IMO) wasn't resolved to your liking? That's the question on the table as far as I'm concerned. Because I have a feeling that Bryan is close to taking his ball and going home, and if I were in his shoes I'm not sure I'd feel differently.
which colour to paint the bike shed.
I (perhaps not surprisingly) disagree with your characterisation of the naming discussion.
:)
We are not talking about whether a function should be named breakString, breakSubstring, breakStr, breakList, ...
Rather, we are discussing a fundamental design decision for the platform: whether it is more important to have global consistency of HP packages, or to have each package have a locally optimal API. In this case, whether 2 functions in different packages should have the same name or not.
I'm not saying I don't see your point -- I'm saying that Bryan clearly doesn't want to make this change, he has fairly good reasons for wanting to keep his library the way it is, nobody seems to be respecting his wishes in the matter, and ultimately it's my feeling that the issue is not important enough to warrant this much extended discussion. He's spent umpteen hours on this project and this stylistic question is something which clearly falls under "maintainer's prerogative."
Another point I would like to make is that unless I'm mistaken, even if text is accepted into the platform, that doesn't mean that maintainership of the library is assigned to libraries@haskell.org: it
That is true, but I would hope that a change in a package's philosophy would be raised for discussion on the list by the maintainer or by the person bumping HP library versions before it is incorporated into the platform.
Again: there's been more than enough discussion in my opinion, Bryan has explained his rationale and made his feelings on the matter clear. Are we going to vote no over this?
a quality basis compared to some of the libraries we grandfathered in,
I think that is an argument for improving the other libraries, not for opening the platform floodgates.
You could make a similar argument about how you want a pony: the fact
is that the labour pool is not there to improve the other libraries.
Bryan has dropped a shining diamond in our laps and we're quibbling
that the jeweller has given us a Mazarin cut instead of a Peruzzi.
Given how well-written it is, especially relative to our baseline,
characterizing accepting text as-is as "opening the floodgates" isn't
fair. In my opinion. :)
G
--
Gregory Collins

On November 7, 2010 12:54:38 Gregory Collins wrote:
Another point I would like to make is that unless I'm mistaken, even if text is accepted into the platform, that doesn't mean that maintainership of the library is assigned to libraries@haskell.org: it stays with Bryan. Given that he's repeatedly stated that the API is the way that it is because that's the way he *wants* it to be, and he has a plausible rationale for this, this entire discussion is MOOT and we should immediately stop wasting time and move to a vote on accepting text as-is.
+1

As Greg pointed out, it really sounds like Bryan likes his names and is not offering to create/maintain an alternatively named version, so the outcome of this name discussion is not prerequisite for voting on accepting his package.

Not saying that this whole discussion about the pros and cons of various naming schemes is not potentially valuable, just that it doesn't require a resolution to get on with voting on the proposed Data.Text addition.

Further, even if he was offering to change the names, it seems that the community is about evenly split, and, in that case, it seems only fair to go with the author's preference out of respect to the fact that he did the work.

Cheers! -Tyson

PS: Am I even supposed to be voting? The more people that get to vote, the more we can be assured to never get better than average (yay committees). I believe one of Linux's strong advantages is that having Linus helps avoid this.

On 07/11/2010, at 14:36, Duncan Coutts wrote:
There is essentially just one point of contention, over about 10 out of the 80+ functions in the Data.Text module. The issue is about which functions should get the nice names and about consistency between modules.
The issue is whether we expect packages in the platform to follow established conventions. This is a rather fundamental design question and it would be nice if it could be resolved one way or another. The text package has merely highlighted the problem; I'm sure it will come up with other libraries as well. Accepting text as is does resolve this question by default which perhaps explains why some insist on discussing it. The text package is not really the issue - the fundamental design question is. Dismissing it as a naming issue isn't really helpful. It's really something that should have been decided independently of any particular package.

To show what I mean by established conventions, here is the result of a quick search on hackage for occurrences of break with similar semantics to text.

Packages where the first argument is a predicate (in the order that Hayoo shows them):

haskell98/base
bytestring
containers (as breakl and breakr in Data.Sequence)
vector
utf8-string
enumerator
Stream
utility-ht
stream-fusion
storablevector
ListLike
iteratee
heap
compact-string/compact-string-fix
heaps
container-classes

Packages where the first argument is something else:

text

I apologise in advance if I missed any.
Option 3 --------
breakStr :: Text -> Text -> (Text, Text) breakChr :: (Char -> Bool) -> Text -> (Text, Text)
There are several *By functions in text. Are you proposing to rename them all to *Chr? Disclaimer: This post does not express any preference one way or another on the author's part, in the hope of avoiding any bikeshedding, lynching or slinging arrows. Roman
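For readers following the naming debate, the two operations being named can be sketched on plain String using only base. This is an illustration of the two axes (character predicate versus substring), not the text package's implementation; the names `breakBy` and `breakSubstring` are borrowed from Options 1 and 2, and the behaviour on a match (the needle starts the second component) is an assumption made for the sketch.

```haskell
import Data.List (isPrefixOf)

-- | Character-predicate version: the same shape as Data.List.break.
breakBy :: (Char -> Bool) -> String -> (String, String)
breakBy = break

-- | Substring version: split at the first occurrence of the needle.
-- If the needle is found, it starts the second component (an assumption
-- of this sketch); if not, the second component is empty.
breakSubstring :: String -> String -> (String, String)
breakSubstring needle = go
  where
    go [] = ([], [])
    go hay@(c : rest)
      | needle `isPrefixOf` hay = ([], hay)
      | otherwise =
          let (before, after) = go rest
          in (c : before, after)

main :: IO ()
main = do
  print (breakBy (== ':') "key::value")      -- ("key","::value")
  print (breakSubstring "::" "key::value")   -- ("key","::value")
  print (breakSubstring "::" "key:value")    -- ("key:value","")
```

The two functions agree on the first input and differ on the last one, which is why neither can simply be defined in terms of the other without a search over substrings.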

On Mon, Nov 8, 2010 at 12:09 AM, Roman Leshchinskiy
Packages where the first argument is a predicate (in the order that Hayoo shows them):
[...super-long list...]
Packages where the first argument is something else:
text
Bryan is obviously not unaware of the prior art and has explained his
rationale. There is a crucial distinction between those other examples
and text, namely that those other containers are intended to work
element-wise and text isn't.
I fear I am contributing to the bureaucratic paper-shuffling I've been
complaining about -- I think my vote is clear and I'll be bowing out
of the discussion now.
G
--
Gregory Collins

As a platform user and library developer, I'd rather that high-quality libraries like text be included in the platform than not, even if the naming conventions are slightly different. It isn't that hard to learn the naming quirks of various libraries, and it means I can rely on the library being present in a much wider collection of haskell installations. Given the distributed nature of the development of the Haskell libraries that go into the platform, they are never going to be as coherent as they might be if a small team of people wrote all of them. But then, the scope would be much smaller and the usefulness correspondingly less. Just my two cents. Kathleen On Nov 7, 2010, at 3:24 PM, Gregory Collins wrote:
On Mon, Nov 8, 2010 at 12:09 AM, Roman Leshchinskiy
wrote: Packages where the first argument is a predicate (in the order that Hayoo shows them):
[...super-long list...]
Packages where the first argument is something else:
text
Bryan is obviously not unaware of the prior art and has explained his rationale. There is a crucial distinction between those other examples and text, namely that those other containers are intended to work element-wise and text isn't.
I fear I am contributing to the bureaucratic paper-shuffling I've been complaining about -- I think my vote is clear and I'll be bowing out of the discussion now.
G -- Gregory Collins

On 11/7/10 6:34 PM, Kathleen Fisher wrote:
As a platform user and library developer, I'd rather that high-quality libraries like text be included in the platform
than not, even if the naming conventions are slightly different. It isn't that hard to learn the naming quirks of various libraries, and it means I can rely on the library being present in a much wider collection of haskell installations. Given the distributed nature of the development of the Haskell libraries that go into the platform, they are never going to be as coherent as they might be if a small team of people wrote all of them. But then, the scope would be much smaller and the usefulness correspondingly less.
For the record, I agree with this position regarding the meta-issue of naming conventions in the HP. While consistency is certainly a desirable goal, it is not the be-all and end-all of goals nor of naming issues.

The problem of handling textual information with String is notorious, both within the community and outside of it. The lack of a clear default(!) alternative to String ends up supporting the lingering rumors that functional programming can never be as efficient as $ALGOL_BASED_LANG, which ends up harming the community as a whole. The work on ByteString was an immense step forward and has been widely embraced and blessed, but its Word8-based organization means that it is not a complete solution to the problem of properly handling textual information. The text library offers such a solution and a high-quality solution at that; it is certainly on par with ByteString, IMO. I am of the opinion that adding text to the HP and encouraging it to be widely adopted is the only sensible solution to this larger issue of correct and efficient handling of textual information.

Could a different internal representation have been used? Sure. Would it be more performant? Unclear.
Could the functions have been named differently? Sure. Would that be an unequivocal improvement? Unclear.
Is it a high-quality library? Yes.
Is it widely used? Yes.
Does it fill a gap in the core libraries offered by HP? Yes.

With trac.haskell.org being down I can't check what other questions are listed for what should be asked of new packages, but I think the answer to whether text should be included in HP is clear, and the uncertain possibility of coming up with more parsimonious names is not enough to dislodge the rest of the reasoning for its inclusion.

--
Live well,
~wren

On 11/07/10 20:00, wren ng thornton wrote:
...The work on ByteString was an immense step forward and has been widely embraced and blessed, but its Word8-based organization means that it is not a complete solution to the problem of properly handling textual information. The text library offers such a solution and a high-quality solution at that; it is certainly on par with ByteString, IMO. I am of the opinion that adding text to the HP and encouraging it to be widely adopted is the only sensible solution to this larger issue of correct and efficient handling of textual information.
I'll use this as a jumping-off point for a doubt I have (Outside of my steering-committee role. Duncan and Thomas did a fine job of steering less than 24 hours ago.)

==========Intro=========

Perhaps the API in Data.Text http://hackage.haskell.org/packages/archive/text/0.10.0.0/doc/html/Data-Text... is actually still too list-like and un-Unicode-ish. Functions like justifyRight/justifyLeft/center really only make sense for strings where one Char = one grapheme, and monospace fonts. (At least given the current implementation of those functions). And some more (see below "Opinions" section, which is the most important section of this email -- or at least the section I need responses to).

My feeling is that these functions should be included, but not in the base 'Data.Text' module -- perhaps in 'Data.Text.Char' or such. So that we don't regret it later. Sort of like how we might regret 'lines'/'words' standing out as the only element-type-specific functions in Data.List. Or how people use String-based and unsafeInterleaveIO-ish System.IO.readFile because it's in a standard place, without thinking of why they might not want those properties. With 'text', people will try character based things like 'Data.Text.map toUpper' or 'Data.Text.length' from parallelism with their List experience, without even stopping to look at the Text documentation*, and never know it might be a bad choice for Unicode text... Yes, it's no worse than doing those things on a String a.k.a. [Char], but we should do better than that.

*go look at the Data.Text "Case conversion" haddocks right now, and choose Data.Text.toUpper!

(After having explored this doubt for myself, I decided I didn't have any of the other doubts about including 'text' in the Platform anymore, see section "I approve the rest of 'text'".)

=========Unicode musing===========

As someone (Ian?) pointed out, even some substring-ing operations can produce peculiar results when they split in the middle of a logical unit (including the case of combining characters, a` vs. à, but keep in mind: not all languages have a NFC combined version for everything they use, and some languages have logical groupings in ways beyond just combining-characters). The Unicode technical report on regexes is pretty representative of how complex it is. http://www.unicode.org/reports/tr18/

Ideally we would document each function with references to relevant Unicode reports, documenting the behavioural tradeoffs we made between implementation and interface simplicity, and textual meaningfulness. That regexes document, for example, has three levels of conformance; a couple of the Level 2 principles, 2.1 Canonical Equivalents and 2.2 Extended Grapheme Clusters, are relevant even for literal string search ("break"/"find"). I think, from a cursory inspection of the code (Data.Text.Search.indices), that we don't meet 2.1 or 2.2. 2.1 could be met e.g. by the search function itself doing normalizing, or by clearly documenting the need to normalize beforehand. 2.2 would require a separate search mode/function to say you only want to split at complete "extended grapheme clusters" and a more complicated implementation (it is quite plausible that core 'text' library would not provide this, but ideally the docs would point to a library that would -- perhaps that's 'text-icu', also maintained by Bryan O'Sullivan, which is bindings to the C library of that name: http://hackage.haskell.org/package/text-icu ).

It's also worth noting in the docs that the Ord instance is purely lexicographical on the codepoints, and is not an algorithm to collate for layman human consumption. (It may be obvious if you think about it. We just ought to remind people to think about it.)

==========Opinions on the Data.Text functions=========

"Yes", "no", and "maybe" sections in that order. Disagree if you disagree.
===="Yes" -- Text-based functions====

In my opinion the following definitely makes sense for Data.Text:

pack, unpack, empty, append, null, intercalate, replace, toCaseFold, toLower, toUpper, concat, strip, stripStart, stripEnd, break(aka breakSubstring), breakEnd, group(?), split, lines, words, unlines, unwords, isPrefixOf, isSuffixOf, isInfixOf, stripPrefix, stripSuffix, find(aka breaks/breakSubstrings), count

===="No" -- Splitting-by-codepoint functions====

In my opinion, because they take apart a piece of text code-point by code-point (a.k.a. Char by Char) or similar, the following should go in their own module:

uncons, (unsnoc (except it doesn't exist)), head, last, tail, init, length, compareLength, map, intersperse, transpose, reverse, justifyRight, justifyLeft, center, fold*, concatMap, maximum, minimum, scan*, mapAccumL/R, take, drop, splitAt, inits, tails, chunksOf, zip, zipWith; (possibly) index and findIndex.

(in fact some piece-by-piece code even ought to encode to UTF-something and analyze byte-by-byte. And on the flip side, of course some will need to analyze by larger logical units than Chars, for which these functions also are not suitable. I'm guessing usually these code-point-based functions are mainly only useful either for implementing higher-level Text functions, or when you know something that limits the possible text you could be dealing with, e.g. if you're writing an ASCII game like Angband - though beware still of future developments - some modern Angbanders are probably coding 'á' for ant-with-a-fedora already, etc.!)

===="Maybe" -- Somewhat codepoint based functions====

And I'm not sure about these:

These create a new Text from Chars, so they're structurally sound.
singleton, cons, snoc, unfoldr, unfoldrN (results = Text)

These merely search in a Char-based way, (somewhat subjectively separated from splitting-by-codepoint functions -- I guess I think these are more likely to be useful / less likely to be abused)
any, all (results = Bool)
takeWhile, dropWhile, dropWhileEnd, dropAround, spanBy(aka span), breakBy, splitBy, findBy, partitionBy (results = Texts)
groupBy(??), filter(?) (hmm)

So perhaps (assuming they're commonly used enough to warrant remaining), segregate them in a different section in the Data.Text documentation. Or find some way to mark them. Or more likely we don't even need to do that -- just mention it (a small caution) in the Data.Text module header, since you can easily see by the presence of Char in the type -- in fact whether the 'Char' is in a contravariant position (f::Char->x) or covariant position (f::(Char->x)->y) tells you whether it is creation, or searching, respectively (If the splitting-by-codepoint functions remain, it's a bit more complicated).

====================I approve the rest of 'text'===============

generally: Everything else I see about the API (meaning all modules 'text' exports) looks like the state of the art. (i.e. we might have something better 3-5 years down the road with improved compiler/language and a billion other things, but that's life. For example, the type-level difference between the strict-Text world and the lazy-Text world might not be the most fun thing in the world sometimes, but anything else we Haskellers have come up with in the past few years has more problems / less wondrousness.).

documentation: I don't think we need to perfect the documentation before it is accepted into the Platform, because we can do that later (heck, I volunteer to do the work if no one else wants to). And by 'perfect', I do mean largely 'warn the user how it can go wrong, and give corresponding advice' (perhaps using example). The docs are already quite high quality.
list/bytestring/text parity: I've been convinced by now not to let the List/ByteString/Text parity issues hold us up. Having a text lib is more important than achieving perfection now, and here are the reasons I think this particular perfection should be put off for a while (and if this means indefinitely then alright):

* There are already lots of users of Text - it's not cost-free to break API now either.
* We're not ready to do a super-thought-out renaming. First I would wish the subsequence-based (as contrasted with element-based) functions to be proposed and accepted into Data.List / Data.ByteString as appropriate, or at least have very concrete proposals. If we were very much in the mood to do it, we could, but relating to both Bryan and the community at this juncture, we're (IMHO) not. (Neither the beautiful path forwards, nor the choice to choose it if it fully materializes, is crystal clear.)
* Incidentally, I suspect that separating functions into two modules, as I advocate in this email, is actually a less difficult API breakage for clients to fix than a renaming of several functions is.

================ Conclusion ==============

So please comment on my hare-brained idea of separating some of the Data.Text functions into a separate module.

-Isaac
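Several of the code-point pitfalls Isaac mentions can be reproduced with plain String and base alone, which is the baseline he compares against. The following is a small sketch under that assumption; it does not use the text package, and the example strings are chosen only for illustration.

```haskell
import Data.Char (toUpper)
import Data.List (isInfixOf)

-- Two canonically equivalent spellings of the same user-visible letter.
precomposed, decomposed :: String
precomposed = "\x00E0"    -- U+00E0 LATIN SMALL LETTER A WITH GRAVE
decomposed  = "a\x0300"   -- 'a' followed by U+0300 COMBINING GRAVE ACCENT

main :: IO ()
main = do
  -- length counts code points, not graphemes.
  print (length precomposed, length decomposed)              -- (1,2)
  -- Equality and literal search compare code points, so canonical
  -- equivalents do not match unless both sides are normalized first.
  print (precomposed == decomposed)                          -- False
  print (precomposed `isInfixOf` ("voil" ++ decomposed))     -- False
  -- Char-by-Char upper-casing cannot map one Char to two, so U+00DF
  -- (sharp s) is left unchanged rather than becoming "SS".
  print (map toUpper "stra\xDF\&e" == "STRA\xDF\&E")         -- True
  -- Ord on strings is lexicographic on code points, not a collation:
  -- every ASCII capital sorts before every ASCII lower-case letter.
  print (compare "Zebra" "apple")                            -- LT
```

The `\&` in the string literals is Haskell's empty escape; it only stops the hexadecimal escape from swallowing the following letter.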

On Sun, Nov 7, 2010 at 06:36, Duncan Coutts
Option 1 (current Text lib design) ----------------------------------
break :: Text -> Text -> (Text, Text) breakBy :: (Char -> Bool) -> Text -> (Text, Text)
This gives the short name 'break' to the substring version, and the longer name 'breakBy' to the character predicate version.
The argument for doing this is that the substring version should be the common encouraged one and so it should get the nice name.
The argument against is that this is inconsistent with the List library which gives the name 'break' to the element predicate version:
break :: (a -> Bool) -> [a] -> ([a], [a])
+1
IMO, the signatures/names of 'break' and 'breakBy' for text ought to
remain as they are, because they are short and accurate
descriptions of the functions' behavior. Data.List.break should have
been named 'breakBy'; merely because an older API is poorly designed is
no reason to mangle new libraries out of misplaced nostalgia.
On Sun, Nov 7, 2010 at 08:56, Malcolm Wallace
Option 1 (current Text lib design) ----------------------------------
break :: Text -> Text -> (Text, Text) breakBy :: (Char -> Bool) -> Text -> (Text, Text)
This gives the short name 'break' to the substring version, and the longer name 'breakBy' to the character predicate version.
I very much dislike this option, for the reasons already rehearsed. And Ian presented some evidence that there are no (or very few) usages of the short name versions in client libraries on hackage. But this raw data does not give us enough context to know how to interpret it. Are there simply very few clients of the Text package at all so far? Are there many clients but they use other parts of the API? How many usages of the long name versions appear? And that does not address the idea that perhaps the clients indeed ought to be using the substring versions, but for whatever reason have not done so.
If a count of client packages is of value, my packages make up over a quarter of text's clients[1]. Besides the packages uploaded to Hackage, I have also written various scripts and special-purpose libs which use a broader set of text's API (including both 'break' and 'breakBy'). At no point have I felt requiring an extra two characters for the more special-purpose case to be an imposition or annoyance. 'breakStr' and 'breakChr' are the worst of all choices -- longer than *both* current names, with gratuitous abbreviation to boot. [1] According to the reverse-dependency-enabled Hackage, http://bifunctor.homelinux.net/~roel/cgi-bin/hackage-scripts/package/text
participants (19)
- Bryan O'Sullivan
- Duncan Coutts
- Edward Kmett
- Gregory Collins
- Ian Lynagh
- Isaac Dupree
- Jeremy Shaw
- Johan Tibell
- John Millikin
- John Smith
- Kathleen Fisher
- Malcolm Wallace
- Michael Snoyman
- Roman Leshchinskiy
- Ross Paterson
- Simon Hengel
- Sterling Clover
- Tyson Whitehead
- wren ng thornton