
On Sunday, 2003-08-10, 19:27, CEST, Danon'. wrote:
Hi,
We are trying to write a program which writes to stdout the UTF-8 character corresponding to an input Unicode value.
UTF-8 encodes each Unicode value as a stream of octets. So there are two mistakes in your sentence above:
1. You want to output octets (i.e., 8-bit words), not characters. (In Haskell 98, a character is always a Unicode code value, although, in practice, not all Haskell systems support Unicode.)
2. One character (i.e., Unicode code value) is not always converted to a single octet but often to a sequence of octets.
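
To make the second point concrete, here is a minimal sketch of the mapping from one code value to its octets. The name encodeChar is hypothetical, and the sketch assumes a system where Char covers the full Unicode range; it does not reject surrogate code values (U+D800 to U+DFFF), which strict UTF-8 forbids.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one Unicode code value as its UTF-8 octet sequence.
-- Hypothetical helper; surrogates are not rejected here.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                            -- 1 octet (ASCII)
  | n < 0x800   = [lead 0xC0 6, cont 0]                       -- 2 octets
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]              -- 3 octets
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]     -- 4 octets
  where
    n = ord c
    -- Leading octet: length tag combined with the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation octet: bits 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
```

For example, encodeChar 'A' gives the single octet [65], while encodeChar '\xE9' (é) gives [195,169] and encodeChar '\x20AC' (the euro sign) gives [226,130,172].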
[...]
The main problem is that you need binary I/O. Haskell 98 only provides text I/O. Text I/O involves an encoding which maps between the octets of the actual I/O stream and the characters Haskell sends or receives. Hugs and GHC, at least, seem to use Latin-1 as the encoding, which just means that they map the octets 0 to 255 to the characters with Unicode codes 0 to 255. The other point with text I/O is that under Windows the EOF character ^Z is treated specially and a conversion between Windows EOLs (^M^J) and Haskell EOLs (^J) takes place.

Hugs and GHC provide the function openFileEx, which allows you to turn all these Windows-specific things off. So an easy way to read or write octets from/to a file might be to open the file via openFileEx and convert characters to octets via Char.ord or octets to characters via Char.chr, respectively. The conversion between characters and their UTF-8 encodings shouldn't be too difficult for you to implement yourself. Alternatively, you might want to look at http://sourceforge.net/projects/haskell-i18n/.
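
A minimal sketch of the chr/ord approach to writing octets follows. It is only an illustration under assumptions: instead of openFileEx it uses hSetBinaryMode from System.IO as found in current GHC, which switches a handle to binary mode (no EOL translation, characters 0 to 255 written as single octets). The helper names octetToChar, charToOctet, hPutOctets and putOctets are hypothetical.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (Handle, hPutStr, hSetBinaryMode, stdout)

-- | Octet -> character with the same code (0..255).
octetToChar :: Word8 -> Char
octetToChar = chr . fromIntegral

-- | Character -> octet; only meaningful for codes 0..255,
-- higher codes are truncated to their low 8 bits.
charToOctet :: Char -> Word8
charToOctet = fromIntegral . ord

-- | Write raw octets to a handle (hypothetical helper).
-- Assumption: hSetBinaryMode stands in for openFileEx here.
hPutOctets :: Handle -> [Word8] -> IO ()
hPutOctets h octets = do
  hSetBinaryMode h True              -- turn off encoding and EOL conversion
  hPutStr h (map octetToChar octets) -- each Char 0..255 goes out as one octet

-- | Same, for standard output.
putOctets :: [Word8] -> IO ()
putOctets = hPutOctets stdout
```

With this, putOctets [0xC3, 0xA9, 0x0A] would send the UTF-8 octets for é followed by a line feed to stdout, regardless of the platform's text conventions.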
Niko.
[...]
Wolfgang