
Hi,

I have tried to send a string of Unicode characters over a socket (or to write it into a file handle). The result is strange: it looks like the characters are truncated down to their least significant bytes.

Here is my program (I am new to Haskell, so the code may not look very good, but it illustrates what I am trying to achieve):

===========================================
module Main where

import IO
import Char
import Control.Concurrent
import Network.Socket

-- A string of wide characters
wide = [chr 0x1234, chr 0x5678, chr 0x4321, chr 0x8765, chr 0x102345]

-- Function to "narrow" characters to their least significant byte
narrow s = map (\c -> chr $ (ord c) `mod` 256) s

-- Decode string printing all of its characters' order numbers
strtodec str = foldr1 (++) ( map (\c -> " "++show(ord c)++" ") str)

-- Server: receive a string and print it decoded,
-- narrow it and print it again.
-- It does not actually loop, though it could
server=do
  sock <- socket AF_INET Stream 6
  ia <- inet_addr "127.0.0.1"
  bindSocket sock $ SockAddrInet 2323 ia
  listen sock 15
  srvloop sock
  where
    srvloop sock = do
      (acsock,from) <- accept sock
      instr <- recv acsock 128
      putStrLn $ "Received: " ++(strtodec instr)
      putStrLn $ "Lower Bytes: " ++(strtodec $ narrow instr)
      sClose acsock
      sClose sock

-- Client: send a string of wide characters
client=do
  threadDelay 100
  sock <- socket AF_INET Stream 6
  ia <- inet_addr "127.0.0.1"
  connect sock $ SockAddrInet 2323 ia
  cnt <- send sock wide
  putStrLn $ "Sent "++(show cnt)++" bytes"
  putStrLn $ "Source was: "++(strtodec wide)
  sClose sock
  threadDelay 100

main=do
  forkIO (server)
  client
===========================================

And here is its output:

===========================================
Sent 5 bytes
Source was: 4660 22136 17185 34661 1057605
Received: 52 120 33 101 69
Lower Bytes: 52 120 33 101 69
===========================================

Honestly, I expected 20 bytes to be sent (or fewer if they were sent in UTF), and "Received" to be identical to "Source was". The last line of output is just to check that the values received are indeed the lower bytes and not some garbage.

I am using a binary distribution of GHC 6.0 on Linux. Are there any special options I would have to enable in the source distribution to be able to send/receive Unicode characters?

To be more general: how would I send arbitrary binary data (a stream of octets) over a socket or a file handle? Should I always assume that only the lower bytes will be sent, and that this will always be the case in GHC? Or is it a bug?

The problem is that the Handle/Socket functions require the data being exchanged to be a String, not, say, an [Int8]. Therefore, I need to be able to coerce my binary data buffer to a String.

PS Of course, I could write my own socket functions, but I am looking for a more or less "pure" Haskell solution.

PPS I tried this only with GHC.

--
Dmitry M. Golubovsky
South Lyon, MI