
On Tue, 2007-10-02 at 08:02 -0700, Deborah Goldsmith wrote:
On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
Deborah Goldsmith wrote:
UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and is what appears in the APIs for all of them. UTF-16 is also what's stored in the volume catalog on Mac disks. UTF-8 is only used in BSD APIs for backward compatibility. It's also used in plain text files (or XML or HTML), again for compatibility.
Deborah
On OS X, Cocoa and Carbon use Core Foundation, whose API does not have a one-true-encoding internally. Follow the rather long URL for details:
http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
I would vote for an API that not just hides the internal store, but allows different internal stores to be used in a mostly compatible way.
However, there is a UniChar typedef on OS X, which is the same unsigned 16-bit integer that Java's JNI would use.
UTF-16 is the type used in all the APIs. Everything else is considered an encoding conversion.
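To make the 16-bit code unit concrete, here is a minimal Haskell sketch of how one code point maps to UTF-16 code units. The name encodeUtf16 is hypothetical and is not taken from any library discussed in this thread. Word16 stands in for UniChar. No validation is done, so a lone surrogate code point passes through unchanged.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one code point as UTF-16 code units (hypothetical helper).
-- Code points below U+10000 take one unit; everything above takes a
-- surrogate pair (high surrogate first).
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    n' = n - 0x10000                              -- 20 significant bits
    hi = fromIntegral (0xD800 + (n' `shiftR` 10)) -- top 10 bits
    lo = fromIntegral (0xDC00 + (n' .&. 0x3FF))   -- bottom 10 bits
```

For example, encodeUtf16 'A' gives [0x0041], while encodeUtf16 '\x1D11E' (the musical G clef) gives the pair [0xD834, 0xDD1E].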
CoreFoundation uses UTF-16 internally except when the string fits entirely in a single-byte legacy encoding like MacRoman or MacCyrillic. If any kind of Unicode processing needs to be done to the string, it is first coerced to UTF-16. If it weren't for backward-compatibility issues, I think we'd use UTF-16 all the time, since the machinery for switching encodings adds complexity. I wouldn't advise it for a new library.
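The two-store scheme described above can be sketched in Haskell roughly as follows. This only illustrates the idea and is not CoreFoundation's actual implementation. The names Str, mkStr and toUtf16 are hypothetical, lists are used instead of packed arrays for brevity, and Latin-1 stands in for the single-byte legacy encoding because its bytes map directly onto U+0000..U+00FF (MacRoman would need a lookup table).

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8, Word16)

-- | Hypothetical string type with two internal stores.
data Str
  = SingleByte [Word8]  -- every character fits in one byte (Latin-1 here)
  | Utf16 [Word16]      -- general case: UTF-16 code units
  deriving (Eq, Show)

-- | Pick the compact store when possible, otherwise fall back to UTF-16.
mkStr :: String -> Str
mkStr s
  | all ((< 0x100) . ord) s = SingleByte (map (fromIntegral . ord) s)
  | otherwise               = Utf16 (concatMap encodeUtf16 s)

-- | Coerce to UTF-16 before any Unicode processing.
toUtf16 :: Str -> [Word16]
toUtf16 (SingleByte bs) = map fromIntegral bs  -- Latin-1 byte = code point
toUtf16 (Utf16 us)      = us

-- | One code point to UTF-16 code units (surrogate pair above U+FFFF).
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (n' `shiftR` 10))
                  , fromIntegral (0xDC00 + (n' .&. 0x3FF)) ]
  where
    n  = ord c
    n' = n - 0x10000
```

Every operation that inspects the contents has to handle both constructors or call toUtf16 first, which is the extra machinery the paragraph above warns about.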
I would like to, again, strongly argue against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. FFI bindings have to convert data formats in any case; Haskell shouldn't gratuitously break Linux support (or make life harder on Linux) just to support proprietary operating systems better. Now, if /independent of the details of MacOS X/, UTF-16 is better (objectively), it can be converted to anything by the FFI. But doing it the way Java or MacOS X or Win32 or anyone else does it, at the expense of Linux, I am strongly opposed to.
jcc