
Dimitry Golubovsky wrote:
I have tried to send a string of Unicode characters over a socket (or to write it into a file handle). The result is strange: it looks like characters are truncated down to their least significant bytes.
Yep.
Honestly, I expected that 20 bytes would be sent (or fewer if they were sent in UTF), and "Received" to be identical to "Source was". The last line of output is just there to check that those really are the lower bytes being shown, not some garbage.
I am using a binary distribution of GHC 6.0 on Linux - are there any special options I would have to enable when building from the source distribution to be able to send/receive Unicode characters?
No, it just isn't supported. All of the Haskell I/O functions take the bottom octet and discard the top bits.
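To make that concrete, here is a minimal sketch of what the truncation amounts to. This is just a model in pure code of the behaviour described above, not GHC's actual I/O implementation, and the sample string is only illustrative:

import Data.Char (ord, chr)
import Data.Word (Word8)

-- The octet the byte-oriented I/O effectively emits for a Char:
-- its code point reduced modulo 256.
truncateToOctet :: Char -> Word8
truncateToOctet = fromIntegral . ord

main :: IO ()
main = do
    let source = "\x0414\x0438\x043C\x0430"   -- some code points above 255
        sent   = map truncateToOctet source
    putStrLn $ "Source code points: " ++ show (map ord source)
    putStrLn $ "Octets on the wire: " ++ show sent
    putStrLn $ "Received as Chars:  " ++ show (map (chr . fromIntegral) sent)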
To be more general: how would I send arbitrary binary data (a stream of octets) over a socket or a file handle? Should I always assume that only the lower bytes will be sent, and that this will be the case in GHC forever?
Yes. Well, maybe not forever, but for the foreseeable future.
Or is it a bug?
No. It's just a fundamental design flaw in Haskell. Presumably someone thought that wide-character support was just a question of defining Char, and forgot about a minor issue called "I/O".
The problem is that the Handle/Socket functions require the data being exchanged to be a String, not, say, an [Int8]. Therefore, I need to be able to coerce my binary data buffer to a String.
Correct. IOW, lots of messing around with ord and chr and either mod/div or the Bits library.
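For example, a rough sketch of that kind of plumbing might look like the following. The function names (octetsToString and so on) are my own, not anything from the standard libraries:

import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8, Word32)

-- Each octet becomes a Char in the range '\0'..'\255', which the
-- byte-oriented I/O layer passes through unchanged.
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

stringToOctets :: String -> [Word8]
stringToOctets = map (fromIntegral . ord)

-- Split a 32-bit value into four octets, most significant first,
-- using the Bits library (div/mod by 256 would do equally well).
word32ToOctets :: Word32 -> [Word8]
word32ToOctets w = [ fromIntegral ((w `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0] ]

octetsToWord32 :: [Word8] -> Word32
octetsToWord32 = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

So a buffer of octets would be turned into a String with octetsToString, handed to the String-based Handle/Socket functions, and decoded on the other side with stringToOctets.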
--
Glynn Clements