
On 10/19/10 7:59 PM, Ross Paterson wrote:
On Wed, Oct 20, 2010 at 12:35:33AM +0100, Duncan Coutts wrote:
On 19 October 2010 22:08, Roman Leshchinskiy wrote:
On 19/10/2010, at 15:22, John Lato wrote:
I think there's a significant difference between vector and text, namely a Vector is conceptually the same as a list/1D array, while a Text is not. I think this difference is enough to warrant a break from the list API.
Are you sure? From its interface Text looks exactly like a list of Chars to me.
Right, that's a very common misunderstanding of Unicode. A Unicode code point (type Char) does not correspond 1:1 with the human notion of a character. It would be nice if it did, but unfortunately it is not something we can ignore. Because of this it is better not to think of operations on individual Chars but on short sequences of Chars. In any case, when processing text (even ASCII, where Chars do match characters), many of the most common operations that you want are substring-based, not element-based.
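To make the code point versus character distinction concrete, here is a minimal sketch using only `base`. The names `decomposed`, `precomposed`, and `approxCharacters` are made up for illustration, and skipping combining marks is only a crude stand-in for real grapheme-cluster segmentation as specified by Unicode (UAX #29).

```haskell
import Data.Char (isMark)

-- "é" written as a base letter followed by U+0301 COMBINING ACUTE ACCENT:
-- one user-perceived character, two Chars.
decomposed :: String
decomposed = "e\x0301"

-- The same "é" as the single precomposed code point U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- Hypothetical helper: approximate the number of user-perceived characters
-- by counting only the Chars that are not combining marks.
-- This is an assumption-laden shortcut, not a UAX #29 implementation.
approxCharacters :: String -> Int
approxCharacters = length . filter (not . isMark)
```

Under these definitions `length` disagrees on the two spellings of the same character (2 versus 1), while `approxCharacters` gives 1 for both, which is the sense in which an element-wise view of Chars can mislead.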
I believe Roman is referring to the Text API, which does indeed look a lot like the list API specialized to Char, with relatively few exceptions. The above would be an argument against including any of the functions with Char parameters, but a high proportion of the API's functions have them.
<musing>
I almost wonder if it would be worth it to define a new type, Character, which does correspond 1:1 to the human notion of a "character" (being intentionally vague about what exactly that means). Then we could have that Text is a vector/list/sequence of Characters, and give it the appropriate interface for being thought of that way. Of course, under the covers, Character would just be a newtype of Text[1] and so the bulk of text/text-icu implementation would need no changes.

At least, it seems like that might make it possible for us to get out of this impasse about the text library matching vector/list/sequence APIs when Text is not a vector/list/array of Char. Also, it helps to codify what we mean by "a short sequence of Chars", which could possibly allow for some simplifying assumptions for the algorithms being used (since often there are better (X,X)->Y algos available when we know one of the X is much smaller than the other).
</musing>

[1] Using a type alias seems like it'd be too easy to break the API idealization.

--
Live well,
~wren
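
The Character idea in the musing could be sketched roughly as follows. This is only an illustration under stated assumptions: `Character`, `toCharacters`, `fromCharacters`, and `lengthC` are hypothetical names that exist in neither text nor text-icu, and the segmentation rule (a Char plus any combining marks that follow it) is a deliberately crude placeholder for proper grapheme-cluster boundaries.

```haskell
import Data.Char (isMark)
import qualified Data.Text as T

-- Hypothetical type: one user-perceived character, stored as a short Text.
-- A newtype (rather than a type alias) keeps it distinct from Text in the API.
newtype Character = Character T.Text
  deriving (Eq, Ord, Show)

-- Crude segmentation (an assumption, not UAX #29): each group starts at a
-- Char and absorbs every combining mark that immediately follows it.
toCharacters :: T.Text -> [Character]
toCharacters = map Character . T.groupBy (\_ c -> isMark c)

-- Reassemble the original Text from its Characters.
fromCharacters :: [Character] -> T.Text
fromCharacters cs = T.concat [t | Character t <- cs]

-- A list-like length that counts Characters rather than Chars.
lengthC :: T.Text -> Int
lengthC = length . toCharacters
```

With this view, `T.pack "e\x0301z"` has three Chars but two Characters, and `fromCharacters . toCharacters` returns the input unchanged, so the underlying Text representation is left alone.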