
19 Apr 2006, 6:47 p.m.
John Meacham
I'd recommend just always using utf8 under the hood
Or have two cases of the representation: an array of bytes if every character is U+00FF or below, and an array of 32-bit words otherwise. My Kogut compiler does that internally, and makes use of it when interfacing with C. The default encoding is assumed to preserve ASCII, so if the narrow case contains only ASCII characters, a pointer to its internals is passed directly to the C function; for this reason the narrow case carries an otherwise unused '\0' after the last character. AFAIK CLISP has 16-bit words as a third case.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
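The two-case representation described in the message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the type `Str` and the functions `str_from_codepoints` and `str_as_c_string` are hypothetical names and are not taken from Kogut or CLISP, and allocation failure is handled with a bare `assert` for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-case string; names are illustrative, not Kogut's. */
typedef struct {
    size_t len;     /* number of characters */
    bool   narrow;  /* true: one byte per char; false: 32 bits per char */
    union {
        uint8_t  *bytes;  /* narrow case: len + 1 bytes, the last is '\0' */
        uint32_t *words;  /* wide case: len 32-bit code points */
    } u;
} Str;

/* Build a string, choosing the narrow case when every code point
 * is U+00FF or below, and the wide case otherwise. */
Str str_from_codepoints(const uint32_t *cps, size_t len) {
    Str s = { len, true, { NULL } };
    for (size_t i = 0; i < len; i++) {
        if (cps[i] > 0xFF) { s.narrow = false; break; }
    }
    if (s.narrow) {
        s.u.bytes = malloc(len + 1);
        assert(s.u.bytes != NULL);  /* sketch: no real error handling */
        for (size_t i = 0; i < len; i++)
            s.u.bytes[i] = (uint8_t)cps[i];
        s.u.bytes[len] = '\0';      /* the otherwise unused terminator */
    } else {
        s.u.words = malloc(len * sizeof(uint32_t));
        assert(s.u.words != NULL);
        memcpy(s.u.words, cps, len * sizeof(uint32_t));
    }
    return s;
}

/* Return a pointer to the internals if they can be handed to C as-is,
 * i.e. the string is narrow and pure ASCII; otherwise return NULL and
 * the caller would have to encode a copy. */
const char *str_as_c_string(const Str *s) {
    if (!s->narrow) return NULL;
    for (size_t i = 0; i < s->len; i++) {
        if (s->u.bytes[i] >= 0x80) return NULL;
    }
    return (const char *)s->u.bytes;
}
```

With this layout, a string such as "hi" can be passed to a C function without copying, while a narrow string containing U+00E9 or a wide string makes `str_as_c_string` return NULL, signalling that a conversion to the default encoding is needed first.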