
On 10/19/10 22:36, wren ng thornton wrote:
> <musing> I almost wonder if it would be worth it to define a new type, Character, which does correspond 1:1 to the human notion of a "character" (being intentionally vague about what exactly that means). Then we could have that Text is a vector/list/sequence of Characters, and give it the appropriate interface for being thought of that way.
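
For concreteness, here is a minimal sketch of that idea in Haskell, using only base. The names Character, characters and utf8Width are hypothetical, made up for illustration. Grouping a base code point with its trailing combining marks is only a rough stand-in for real grapheme segmentation; the Unicode rules (UAX #29) cover a good deal more, such as Hangul syllables and emoji sequences. The point is just that one string gives three different lengths depending on whether you count bytes, code points, or (approximate) graphemes.

```haskell
module Main where

import Data.Char (isMark, ord)

-- Illustrative helper (not from any library): number of bytes a single
-- code point occupies when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Hypothetical Character type: one code point plus any combining marks
-- that immediately follow it.
newtype Character = Character String
  deriving (Eq, Show)

-- Assumption: "a code point followed by its combining marks" is a crude
-- approximation of a grapheme cluster, not the full UAX #29 algorithm.
characters :: String -> [Character]
characters []     = []
characters (c:cs) = Character (c : marks) : characters rest
  where (marks, rest) = span isMark cs

main :: IO ()
main = do
  let s = "e\x301"  -- 'e' followed by U+0301 COMBINING ACUTE ACCENT
  putStrLn ("bytes (UTF-8): " ++ show (sum (map utf8Width s)))
  putStrLn ("code points:   " ++ show (length s))
  putStrLn ("characters:    " ++ show (length (characters s)))
```

With this input the three counts differ (3 bytes, 2 code points, 1 approximate character), which is why an interface over Characters would behave differently from one over Chars.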

I believe Perl 6 is going this way: while there is a single base type Str and role String, there are three different things it can "mean" (call them subtypes): bytes, Unicode code points, and graphemes (the last corresponding to the proposed Character). Or possibly only two of those; IIRC it was recently proposed that the byte version be moved to the already existing Buf type/Buffer role intended for binary data, which is roughly equivalent to ByteString.

If a given string is accessed as code points, it can't then be treated as graphemes unless it is re-assigned, and vice versa, but assigning it to another Str allows that Str to be accessed as graphemes instead.

(I think. The Perl 6 spec is still a moving target, as evidenced by the thing about byte access; it's entirely possible that this changed again and I missed it. But there was definitely thought put into the distinction between bytes, code points, and graphemes.)

-- 
brandon s. allbery     [linux,solaris,freebsd,perl]      allbery@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university      KF8NH