
I am writing an HTTP client-side library, using the SocketPrim library. During the implementation of Base64 encode/decode I began to have some doubts over the use of the Char type for socket I/O.
As far as I can tell, "sendTo" and "recvFrom" are simply thin wrappers around the underlying OS calls. My winsock2.h file tells me that the data passed into and received from these functions are C-style chars, 8 bits each. On Unix these functions (sys/socket.h) appear to take a C void pointer. Finally, I notice that the Haskell 98 report defines Haskell's Char as a Unicode character (which, I figure, isn't guaranteed to fit in 8 bits).
So I am curious: what happens when I send these Unicode Haskell Chars to the SocketPrim.sendTo function? My current guess is that the low 8 bits of each Char become a C-style char.
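To make that guess concrete, here is a small sketch of the truncation I have in mind (charToOctet and octetToChar are my own hypothetical helpers, not part of SocketPrim):

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Keep only the low 8 bits of each Char on the way out,
    -- and widen back to a Char on the way in.
    charToOctet :: Char -> Word8
    charToOctet = fromIntegral . ord

    octetToChar :: Word8 -> Char
    octetToChar = chr . fromIntegral

    -- Anything outside the 0-255 range is silently mangled, e.g.
    --   charToOctet '\x2603' == 3   -- the low byte of 0x2603, not the character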
That's what happens in GHC, but as you correctly point out this is wrong. The real problem is twofold: GHC doesn't have proper support for Unicode encoding/decoding at the I/O interface, and we don't have an "approved" way to write other kinds of data into a file.

In GHC we currently have this interface (not in a released version yet) in the module Data.Array.IO:

    hGetArray :: Handle -> IOUArray Int Word8 -> Int -> IO Int
    hPutArray :: Handle -> IOUArray Int Word8 -> Int -> IO ()

and there's also a Binary I/O library I've been working on that can be used for reading/writing Word8s and other low-level kinds of data in a predictable way (I guess we should discuss what the interface to Data.Binary should look like at some point). I'd also be happy to add

    hPutWord8 :: Handle -> Word8 -> IO ()
    hGetWord8 :: Handle -> IO Word8

to System.IO if people think this is the right thing to do.

Cheers,
Simon
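For what it's worth, here is a sketch of how those two Data.Array.IO functions would be used to move raw bytes between handles, assuming the signatures quoted above; the buffer size and the copyChunk name are only illustrative:

    import Data.Array.IO (IOUArray, newArray, hGetArray, hPutArray)
    import Data.Word (Word8)
    import System.IO (Handle)

    -- Read up to 4096 bytes from hIn into a Word8 buffer, write exactly
    -- those bytes to hOut, and return how many bytes were transferred.
    -- Both handles are assumed to already be open in a suitable mode.
    copyChunk :: Handle -> Handle -> IO Int
    copyChunk hIn hOut = do
        buf <- newArray (0, 4095) 0 :: IO (IOUArray Int Word8)
        n   <- hGetArray hIn buf 4096
        hPutArray hOut buf n
        return n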