Re: [Haskell-cafe] Re: String vs ByteString

14 Aug 2010

Quoth Brandon S Allbery KF8NH,
...
On 8/14/10 01:29, Kevin Jardine wrote:
...
> I think that this kind of programming detail should be handled
> internally (even if necessary by switching automatically from UTF-8 to
> UTF-16 depending upon the language).

It seems like the right thing, described in the wrong words - wouldn't
it be a more sensible ideal to simply `switch' depending on the
character encoding of the data, rather than on the language?

I mean, to start with, you'd surely wish for some standardization,
so that the difference between UTF-8 and UTF-16 is essentially internal,
while you use the same API indifferently.

Second, a key requirement to effectively work with external data is
support for multiple character encodings.  E.g., if Text is internally
UTF-16, it still must be able to input and output UTF-8, and presumably
also UTF-16 where appropriate.
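
To make that concrete, here is a minimal sketch using the decode and
encode functions from Data.Text.Encoding. The `Encoding` tag and the
`decodeAs`/`encodeAs` helpers are names I made up for illustration, not
part of the text package, and the decoders used here throw on malformed
input, so real code would want the error-handling variants.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical tag naming the encoding of some external data.
data Encoding = UTF8 | UTF16LE
  deriving (Eq, Show)

-- Decode external bytes into Text, whatever Text happens to use internally.
-- These decoders throw an exception on malformed input.
decodeAs :: Encoding -> B.ByteString -> T.Text
decodeAs UTF8    = TE.decodeUtf8
decodeAs UTF16LE = TE.decodeUtf16LE

-- Encode Text back out in the requested external encoding.
encodeAs :: Encoding -> T.Text -> B.ByteString
encodeAs UTF8    = TE.encodeUtf8
encodeAs UTF16LE = TE.encodeUtf16LE

main :: IO ()
main = do
  let t = T.pack "d\233j\224 vu"  -- "deja vu" with accented e and a
  -- Round-tripping through either encoding gives back the same Text.
  print (decodeAs UTF8 (encodeAs UTF8 t) == t)
  print (decodeAs UTF16LE (encodeAs UTF16LE t) == t)
  -- Encoded sizes in bytes: UTF-8 first, then UTF-16.
  print (B.length (encodeAs UTF8 t), B.length (encodeAs UTF16LE t))
```

The caller only ever sees Text; which encoding is on the wire is a
parameter, not a different API.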

So given full support for _both_ encodings (for example, a Text
implementation that is `native' UTF-8 alongside the UTF-16 one), and
support for input data of
_either_ encoding as encountered at run time ... then the internal
implementation choice should simply follow the external data.  For
Chinese inputs you'd be running UTF-16 functions, for French UTF-8.
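
A rough sketch of what I mean, purely as an illustration: the `Str`
type and its functions below are hypothetical, not anything in the text
package. The payload stays in whatever encoding it arrived in, and the
API is the same either way. For brevity `strLength` goes through Text;
a real implementation would presumably have native functions for each
representation.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical string type: the bytes keep the encoding of the input.
data Str
  = Utf8Str  B.ByteString  -- assumed to hold valid UTF-8
  | Utf16Str B.ByteString  -- assumed to hold valid UTF-16LE

-- One API, regardless of the representation underneath.
toText :: Str -> T.Text
toText (Utf8Str  b) = TE.decodeUtf8 b
toText (Utf16Str b) = TE.decodeUtf16LE b

-- Length in characters (shortcut via Text, for the sketch only).
strLength :: Str -> Int
strLength = T.length . toText

-- Bytes actually stored for this value.
storedBytes :: Str -> Int
storedBytes (Utf8Str  b) = B.length b
storedBytes (Utf16Str b) = B.length b

main :: IO ()
main = do
  let french  = T.pack "d\233j\224 vu"               -- 7 characters
      chinese = T.pack "\20013\25991\25991\26412"    -- 4 CJK characters
      frU8    = Utf8Str  (TE.encodeUtf8 french)      -- arrived as UTF-8
      zhU16   = Utf16Str (TE.encodeUtf16LE chinese)  -- arrived as UTF-16
  -- Characters and stored bytes for each input as it arrived.
  print (strLength frU8,  storedBytes frU8)
  print (strLength zhU16, storedBytes zhU16)
  -- For comparison: sizes had each been held in the other encoding.
  print (B.length (TE.encodeUtf16LE french), B.length (TE.encodeUtf8 chinese))
```

In this example the French text is smaller as UTF-8 and the Chinese
text is smaller as UTF-16, which is the case where following the
external data costs nothing in space and saves a conversion.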

	Donn Cave, donn@avvanta.com