
Thomas and I would like to summarise the current point of contention in the text library proposal with the aim of resolving the issue and getting the package accepted.

It seems clear that we all want the package accepted; the disagreement is over details of the API. The problem here is not the amount of work needed to make the changes some people have been suggesting. The problem is disagreement over whether change is necessary and, if so, what change.

There is essentially just one point of contention, over about 10 out of the 80+ functions in the Data.Text module. The issue is about which functions should get the nice names and about consistency between modules. (There is one other minor issue that Ross raised, but we will deal with the most substantive issue first.)

There are two axes along which Text functions are generalised:

 * character predicate (e.g. searching for the first char matching a predicate)
 * substring (e.g. searching for a substring)

These are orthogonal directions of generalisation. There is no simple way to encompass both (regular expressions are not simple, and naive generalisations cannot be implemented efficiently).

The fact that most functions come in these two forms is a difference from the List library, which only has the element predicate direction, not the sub-sequence direction. This is the prelude to the problem, because the List library has already taken the common names for the character predicate versions.

The design of the Text library encourages the use of substring operations because these are expected to be more commonly used and because correct handling of Unicode often requires substring operations (due to issues with combining characters).

There are a number of options. To illustrate them let us pick an example function that breaks a text into two.
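As a rough illustration of the two directions, here is a minimal sketch over plain String using only the base library. The names breakPred and breakSub are hypothetical and chosen only for this sketch; they are not part of Data.List or of the proposed Text API, and the naive substring search makes no claim about how the Text library implements its version.

```haskell
import Data.List (isPrefixOf)

-- Character-predicate direction: split at the first character that
-- satisfies the predicate. This is exactly Data.List's 'break'.
breakPred :: (Char -> Bool) -> String -> (String, String)
breakPred = break

-- Substring direction: split at the first occurrence of the needle.
-- Hypothetical helper for illustration only (naive, quadratic search).
breakSub :: String -> String -> (String, String)
breakSub needle = go []
  where
    go acc rest
      | needle `isPrefixOf` rest = (reverse acc, rest)
      | otherwise = case rest of
          []     -> (reverse acc, [])
          (c:cs) -> go (c : acc) cs
```

With these definitions, breakPred (== ':') "a:b::c" gives ("a", ":b::c"), while breakSub "::" "a:b::c" gives ("a:b", "::c"). Neither function can be expressed simply in terms of the other, which is the sense in which the two directions are orthogonal.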
There are two versions:

 * break based on a character predicate
 * break based on a substring

Option 1 (current Text lib design)
----------------------------------

  break   :: Text -> Text -> (Text, Text)
  breakBy :: (Char -> Bool) -> Text -> (Text, Text)

This gives the short name 'break' to the substring version, and the longer name 'breakBy' to the character predicate version.

The argument for doing this is that the substring version should be the common, encouraged one and so it should get the nice name.

The argument against is that this is inconsistent with the List library, which gives the name 'break' to the element predicate version:

  break :: (a -> Bool) -> [a] -> ([a], [a])

Option 2
--------

  breakSubstring :: Text -> Text -> (Text, Text)
  break          :: (Char -> Bool) -> Text -> (Text, Text)

This gives the short name 'break' to the character predicate version and the longer 'breakSubstring' to the substring version.

The argument for doing this is that it is consistent with the List library in its use of the name 'break'.

The argument against is that the short name is now given to the version that is discouraged, and the version that is encouraged now has a very long and ugly name: this API encourages users to make the wrong choice.

Decisions
---------

There appears to be no consensus over which of these two options to pick. If this situation persists then the default position is for the package not to be accepted at all. We think there is consensus that the package should go into the platform in some form -- that the worst of all the options is for the package not to go in at all.

We are now at the third stage of the consensus protocol. At this stage discussion should be limited to resolving one concern at a time. Anyone may contribute to this discussion. The people required to take part are the proposal author (Don) and anyone who has concerns with the package going in as is.
People with concerns should restate those concerns and, if necessary, questions should be asked to clarify them. In particular, if the summary above is not an accurate expression of people's concerns then they should say so.

Don will update the wiki proposal page with the details of the remaining concerns (or simply the summary above if this is accurate).

The steering committee (in this case Thomas and I) will follow the discussion. If we are still stuck in one week (14th Nov) then the steering committee will re-evaluate the situation.

To kick off the discussion focused on this narrow issue, Thomas and I would like to suggest a third option:

Option 3
--------

  breakStr :: Text -> Text -> (Text, Text)
  breakChr :: (Char -> Bool) -> Text -> (Text, Text)

This gives neither version the short name 'break', but gives both reasonably short names with a suffix to indicate character predicate vs substring.

This addresses the complaint that a name from the List library is being used with an inconsistent type (because the name is not being used at all).

It removes the problem that the character predicate versions are being promoted over the substring versions by the use of the shorter names.

It makes explicit the fact that all the functions come in two forms, whereas with List there is just one form.

There is still a strong connection with the List library through the use of the same root names, so people can still carry over their experience of the List library API to help them find the right Text functions. They will have to make one choice between the character predicate and substring versions, which is reasonable given that the substring versions are preferred.

Duncan & Thomas
(with their platform steering committee hats on)