> I am writing an HTTP client-side library, using the SocketPrim library.
> During the implementation of Base64 encode/decode I began to have some
> doubts over the use of the Char type for socket I/O.
>
> As far as I can tell, "sendTo" and "recvFrom" are simply handles on the
> underlying OS calls. My winsock2.h file tells me the data passed into
> and received from these functions are C-style chars, 8 bits each. On
> Unix these functions (sys/socket.h) appear to use a C void pointer.
> Finally, I notice that the Haskell 98 report defines the Haskell Char
> as a Unicode character (which I figure isn't guaranteed to be 8 bits).
>
> So I am curious: what happens when I send these Unicode Haskell Chars
> to the SocketPrim.sendTo function? My current guess is that the low
> 8 bits of each Char become a C-style char.
That's what happens in GHC, but as you correctly point out, it is wrong.
The real problem is twofold: GHC doesn't have proper support for Unicode
encoding/decoding at the I/O interface, and we don't have an "approved"
way to write other kinds of data to a file.
In GHC we currently have this interface (not in a released version yet)
in the module Data.Array.IO:
hGetArray :: Handle -> IOUArray Int Word8 -> Int -> IO Int
hPutArray :: Handle -> IOUArray Int Word8 -> Int -> IO ()
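For example, one could copy a stream of bytes between two Handles like
this (the helper name and buffer size are mine, just for illustration):

import Data.Array.IO
import Data.Word (Word8)
import System.IO

-- Sketch: shuttle bytes from one Handle to another through a
-- fixed-size Word8 buffer, using hGetArray/hPutArray.
copyHandle :: Handle -> Handle -> IO ()
copyHandle hIn hOut = do
    buf <- newArray (0, bufSize - 1) 0 :: IO (IOUArray Int Word8)
    let loop = do
          n <- hGetArray hIn buf bufSize      -- bytes actually read
          if n == 0
            then return ()                    -- EOF
            else hPutArray hOut buf n >> loop
    loop
  where
    bufSize = 4096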
and there's also a Binary I/O library I've been working on that can be
used for reading/writing Word8s and other low-level kinds of data in a
predictable way (I guess we should discuss what the interface to
Data.Binary should look like at some point).
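As a strawman for that discussion (this is not the library's actual
interface, just one possible shape):

import System.IO (Handle)

-- Strawman only: one possible shape for a Data.Binary-style interface,
-- not the actual library's API.
class Binary a where
  put :: Handle -> a -> IO ()
  get :: Handle -> IO a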
I'd also be happy to add
hPutWord8 :: Handle -> Word8 -> IO ()
hGetWord8 :: Handle -> IO Word8
to System.IO if people think this is the right thing to do.
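In the meantime these can be approximated on top of the existing
Char-based primitives, leaning on GHC's current truncate-to-8-bits
behaviour (a stopgap sketch, not a proposed implementation):

import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO

-- Stopgap sketch: Word8 I/O via hPutChar/hGetChar. Relies on GHC
-- currently writing only the low 8 bits of each Char, so it is not
-- portable.
hPutWord8 :: Handle -> Word8 -> IO ()
hPutWord8 h w = hPutChar h (chr (fromIntegral w))

hGetWord8 :: Handle -> IO Word8
hGetWord8 h = fmap (fromIntegral . ord) (hGetChar h)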
Cheers,
Simon