Re: CWString API

30 Nov 2004

      On Tue, Nov 30, 2004 at 02:40:20AM -0800, Krasimir Angelov wrote:
...
--- John Meacham  wrote:
...
The problem is that these operations are very
unsafe, there is no
guaranteed isomorphism or even injection between
wchar_ts and Chars. If
people really know what they are doing, they can do
the conversion
themselves via fromIntegral/ord/chr, but I don't
think we should
encourage such unsafe usage with functions when it
is simple for the
user to work around it themselves.
As I understand castCWcharToChar is unsafe if the
language doesn't support unicode /* Char type is too
small */ and castCharToCWchar is unsafe if in the
target OS wchar_t has 16 bits while the language
supports unicode. In both cases String<->CWString
translation is safe. When I have wchar_t in C then I
have two opportunities:
- map the type in Haskell to CWchar without any
conversion
  - use chr.fromIntegral or fromIntegral.ord
The first variant is more portable. Please correct me
if I am wrong.
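
For reference, the two hand-written conversions mentioned above can be sketched as follows. The helper names are hypothetical and not part of any library. Both functions only reinterpret the numeric value: they assume wchar_t holds a Unicode code point and never consult the locale.

```haskell
import Data.Char (chr, ord)
import Foreign.C.Types (CWchar)

-- Hypothetical helper: reinterpret a wchar_t value as a code point.
-- 'chr' raises an error for values outside 0..0x10FFFF.
wcharToCharUnchecked :: CWchar -> Char
wcharToCharUnchecked = chr . fromIntegral

-- Hypothetical helper: reinterpret a code point as a wchar_t value.
-- 'fromIntegral' wraps silently if CWchar is narrower than the code point.
charToWcharUnchecked :: Char -> CWchar
charToWcharUnchecked = fromIntegral . ord
```
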
The problem is that even if the language supports the full unicode
range, there is no guarantee that a single wchar_t maps (simply and in a
pure functional fashion) to a Haskell Char. Just because wchar_t is 16
bits, it does not mean it represents a 16-bit subset of Unicode;
regional systems may have specialized wchar_t's for their language
which are not Unicode. The encoding of wchar_t is pretty much completely
unspecified, unless __STDC_ISO10646__ is defined, in which case it is
straight Unicode and the casting routines could be defined simply (my
CWString library detects and optimizes this case). The only common
systems where this is the case are Linux glibc-based systems.
...
Are castCCharToChar and castCharToCChar deprecated? I
think castCharToCChar is unsafe when the language
supports Unicode.
These have never really been safe to use. char may have a completely
different encoding than Char, which these won't honor. Deprecated may not
be the proper word, but whenever possible one should use the higher
level conversion routines which behave properly in the current locale.
These should only be used when you have system or application specific
knowledge that CChar is always ASCII and not dependent on the current
locale.  
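
A minimal sketch of that distinction, assuming the Foreign.C.String module as shipped in current GHC base, where withCString and peekCString go through the locale encoding. The names roundTripLocale and asciiToCChar are made up for illustration.

```haskell
import Foreign.C.String (castCharToCChar, peekCString, withCString)
import Foreign.C.Types (CChar)

-- Marshal a String to a C string and back using the string-level
-- routines rather than casting character by character.
roundTripLocale :: String -> IO String
roundTripLocale s = withCString s peekCString

-- Hypothetical guard: only use the raw cast when the character is
-- plain ASCII, which is the one case the cast is described as fit for.
asciiToCChar :: Char -> Maybe CChar
asciiToCChar c
  | c < '\x80' = Just (castCharToCChar c)
  | otherwise  = Nothing
```

The guard makes the ASCII assumption explicit at the call site instead of leaving it implicit in the cast.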

Note that in general, there will never be a guaranteed one-to-one
mapping between chars, wchar_ts and Haskell Chars, so higher level
routines must work on strings rather than individual chars. 
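
As an illustration of working at the string level, here is a small sketch using withCWStringLen and peekCWStringLen from Foreign.C.String in current GHC base. The function names are hypothetical. Note that base itself assumes wchar_t carries Unicode, which is exactly the assumption that is not guaranteed in general.

```haskell
import Foreign.C.String (peekCWStringLen, withCWStringLen)

-- Number of wchar_t units used to marshal the string. This need not
-- equal the number of Chars: where wchar_t is 16 bits, a Char outside
-- the Basic Multilingual Plane is marshalled as two units.
wcharUnits :: String -> IO Int
wcharUnits s = withCWStringLen s (return . snd)

-- Convert a whole string to wide characters and back.
roundTripWide :: String -> IO String
roundTripWide s = withCWStringLen s peekCWStringLen
```

Because the conversion sees the whole string, it can map one Char to several units or the reverse, which a per-character cast cannot do.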

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈