
Greetings,

I am having trouble sending Unicode characters as utf8 over a socket handle. Despite setting the encoding on the socket handle to utf8, it still seems to use some other encoding when writing to the socket. It works correctly when writing to stdout, but not to a socket handle. I am using ghc 6.12.1 and network-2.2.1.7. I can get it to work using System.IO.UTF8, but I was under the impression this was no longer necessary?

I also don't seem to understand the interaction between hSetEncoding and hSetBinaryMode, because if I set the binary mode to 'False' and the encoding to utf8 on the socket, then when writing to the socket the string seems to be truncated at the first non-ASCII codepoint.

Here is a test snippet, which can be used with netcat as a listening server (i.e. nc -l 1234).

    import System.IO
    import Network

    main = do
      let a = "λ"
      s <- connectTo "127.0.0.1" (PortNumber 1234)
      hSetEncoding s utf8
      hSetEncoding stdout utf8
      hPutStrLn s a
      putStrLn a
      hClose s

Thanks, David

On 12/05/2010 01:56, David Powell wrote:
> Greetings,
>
> I am having trouble sending Unicode characters as utf8 over a socket handle. Despite setting the encoding on the socket handle to utf8, it still seems to use some other encoding when writing to the socket. It works correctly when writing to stdout, but not to a socket handle. I am using ghc 6.12.1 and network-2.2.1.7. I can get it to work using System.IO.UTF8, but I was under the impression this was no longer necessary?
>
> I also don't seem to understand the interaction between hSetEncoding and hSetBinaryMode, because if I set the binary mode to 'False' and the encoding to utf8 on the socket, then when writing to the socket the string seems to be truncated at the first non-ASCII codepoint.
>
> Here is a test snippet, which can be used with netcat as a listening server (i.e. nc -l 1234).
>
>     import System.IO
>     import Network
>
>     main = do
>       let a = "λ"
>       s <- connectTo "127.0.0.1" (PortNumber 1234)
>       hSetEncoding s utf8
>       hSetEncoding stdout utf8
>       hPutStrLn s a
>       putStrLn a
>       hClose s

You've found a bug, thanks. The bug is that a socket is bidirectional and we're only setting the encoding for one side (the read side), but we should be setting it for both sides. I just created a ticket:

http://hackage.haskell.org/trac/ghc/ticket/4066

Expect a fix in GHC 6.12.3. In the meantime you can work around it, e.g. this worked for me to create a write-only socket that hSetEncoding works with:

    -- imports this definition needs (network-2.2.x module layout)
    import Control.Exception (bracketOnError)
    import Network (PortID(..))
    import Network.BSD (getHostByName, getProtocolNumber, hostAddress)
    import Network.Socket
    import System.IO (IOMode(..))

    connectTo hostname (PortNumber port) = do
        proto <- getProtocolNumber "tcp"
        bracketOnError
            (socket AF_INET Stream proto)
            (sClose)  -- only done if there's an error
            (\sock -> do
                he <- getHostByName hostname
                connect sock (SockAddrInet port (hostAddress he))
                socketToHandle sock WriteMode
            )

Cheers,
Simon
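
To judge whether a handle is really encoding as UTF-8, it helps to know which bytes to expect for "λ" (U+03BB). The following is only a minimal sketch, assuming an ordinary temporary file handle honours hSetEncoding the same way stdout does; utf8BytesViaHandle is a hypothetical helper name, not part of any library. It writes the string through a handle set to utf8 and reads the raw bytes back in binary mode, so the result can be compared with what netcat receives on the socket side.

```haskell
import Data.Char (ord)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper: write a string through a handle whose encoding is
-- set to utf8, then read the file back in binary mode to see the raw bytes.
utf8BytesViaHandle :: String -> IO [Int]
utf8BytesViaHandle str = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "utf8check.txt"
  hSetEncoding h utf8
  hPutStr h str
  hClose h
  -- A binary-mode handle yields one Char per byte (values 0..255).
  hb <- openBinaryFile path ReadMode
  raw <- hGetContents hb
  let bytes = map ord raw
  -- Force the lazy read to finish before closing the handle.
  length bytes `seq` hClose hb
  removeFile path
  return bytes

main :: IO ()
main = do
  -- "\955" is "λ" (U+03BB), written as an escape so the source file's own
  -- encoding does not matter.
  bytes <- utf8BytesViaHandle "\955"
  print bytes  -- prints [206,187], i.e. 0xCE 0xBB
```

If the bytes arriving at the listening end differ from 0xCE 0xBB for this character, the socket handle is not applying the utf8 encoding on its write side.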
Participants (2): David Powell, Simon Marlow