
On Tue, Nov 30, 2004 at 02:40:20AM -0800, Krasimir Angelov wrote:
--- John Meacham wrote:
The problem is that these operations are very unsafe: there is no guaranteed isomorphism, or even injection, between wchar_ts and Chars. If people really know what they are doing, they can do the conversion themselves via fromIntegral/ord/chr, but I don't think we should encourage such unsafe usage with functions when it is simple for the user to work around it themselves.
As I understand it, castCWcharToChar is unsafe if the language implementation doesn't support Unicode (the Char type is too small), and castCharToCWchar is unsafe if wchar_t has 16 bits on the target OS while the language supports Unicode. In both cases String<->CWString translation is safe. When I have a wchar_t in C, I have two options:
- map the type in Haskell to CWchar without any conversion
- use chr.fromIntegral or fromIntegral.ord
The first variant is more portable. Please correct me if I am wrong.
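For reference, the second option (chr.fromIntegral and fromIntegral.ord) can be sketched as below. The names unsafeWcharToChar and unsafeCharToWchar are hypothetical and not part of any library. The sketch assumes wchar_t holds plain Unicode code points, which is exactly the assumption under discussion in this thread.

```haskell
import Data.Char (chr, ord)
import Foreign.C.Types (CWchar)

-- Hypothetical helper: reinterpret a wchar_t value as a Unicode code point.
-- Only meaningful if the platform's wchar_t really is Unicode.
-- chr raises an error for values outside the valid Char range.
unsafeWcharToChar :: CWchar -> Char
unsafeWcharToChar = chr . fromIntegral

-- Hypothetical helper: the reverse direction.
-- fromIntegral wraps silently if the code point does not fit in wchar_t
-- (for example, a 16-bit wchar_t and a Char above U+FFFF).
unsafeCharToWchar :: Char -> CWchar
unsafeCharToWchar = fromIntegral . ord
```

Neither function consults the locale, so both are only as correct as the assumption about what wchar_t contains.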
The problem is that even if the language supports the full Unicode range, there is no guarantee that a single wchar_t maps (simply and in a pure functional fashion) to a Haskell Char. Just because wchar_t is 16 bits does not mean it represents a 16-bit subset of Unicode; regional systems may have specialized wchar_ts for their language which are not Unicode. The encoding of wchar_t is pretty much completely unspecified unless __STDC_ISO10646__ is defined, in which case it is straight Unicode and the casting routines could be defined simply (my CWString library detects and optimizes this case). The only common systems where this is the case are Linux glibc-based systems.
Are castCCharToChar and castCharToCChar deprecated? I think castCharToCChar is unsafe when the language supports Unicode.
These have never really been safe to use. char may have a completely different encoding than Char, which these functions won't honor. "Deprecated" may not be the proper word, but whenever possible one should use the higher-level conversion routines, which behave properly in the current locale. The casts should only be used when you have system- or application-specific knowledge that CChar is always ASCII and not dependent on the current locale. Note that in general there will never be a guaranteed one-to-one mapping between chars, wchar_ts and Haskell Chars, so higher-level routines must work on strings rather than individual chars.

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈
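As an illustration of the string-level routines recommended above, here is a minimal sketch using the marshalling functions in Foreign.C.String. The names roundTripWide and roundTripNarrow are hypothetical. How the conversion is performed depends on the implementation and platform; the sketch only shows that the caller works with whole strings and never casts individual characters.

```haskell
import Foreign.C.String (peekCString, peekCWString, withCString, withCWString)

-- Hypothetical helper: marshal a String to a temporary wchar_t* buffer
-- and read it back, leaving the encoding details to the library.
roundTripWide :: String -> IO String
roundTripWide s = withCWString s peekCWString

-- Hypothetical helper: the same round trip through a char* buffer.
-- Characters that the narrow encoding cannot represent may not survive.
roundTripNarrow :: String -> IO String
roundTripNarrow s = withCString s peekCString
```

For plain ASCII input both round trips should return the original string; anything beyond that depends on the locale and on what wchar_t means on the target system.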