
On Sat, Mar 24, 2012 at 5:33 PM, Freddie Manners wrote:
> To add my tuppence-worth on this, addressed to no-one in particular:
> (1) I think getting hung up on UTF-8 correctness is a distraction here. I can't imagine anyone suggesting that the C/C++ standards removed support for (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a different type when it matters, but the language standard itself shouldn't be driven by technical issues that don't affect most people most of the time. I'm sure it's good engineering practice to worry about these things, but the standard isn't there to encourage good engineering practice.
C++ does not consider 'char*' to be the type of a string. It has a standard template, std::basic_string, that can be instantiated on char (giving std::string) or on the character types char16_t, char32_t, and wchar_t (intended for encoded Unicode characters), giving std::u16string, std::u32string, and std::wstring respectively. It has a large number of functions for manipulating a string as a sequence (Haskell's status quo) or as text, thanks to an elaborate localization machinery.

-- Gaby, back to lurking mode
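
For readers who don't have the C++11 library in front of them, here is a minimal sketch of the std::basic_string instantiations mentioned above. The helper name code_units is hypothetical and exists only for illustration; the typedefs themselves are standard. It shows only the "string as a sequence" view, where size() counts code units rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// The familiar string types are typedefs of one template, std::basic_string.
static_assert(std::is_same<std::string,    std::basic_string<char>>::value,     "string");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value, "u16string");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value, "u32string");
static_assert(std::is_same<std::wstring,   std::basic_string<wchar_t>>::value,  "wstring");

// Hypothetical helper (not part of the standard library): the length of a
// string viewed as a sequence of code units of type CharT.
template <typename CharT>
std::size_t code_units(const std::basic_string<CharT>& s) {
    return s.size();
}
```

Under these definitions, code_units(std::string("\xC3\xA9")) is 2 (the two UTF-8 bytes of U+00E9), while code_units(std::u16string(u"\u00E9")) is 1. A character outside the Basic Multilingual Plane such as U+1D11E takes two code units in a std::u16string and one in a std::u32string.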