
Hi, We are trying to write a program that writes to stdout the UTF-8 character corresponding to an input Unicode value. It should work under Mac OS X, Debian Linux, and Windows. It seems like it should be easy, but we couldn't find anything useful on the net. Would you mind helping us? Niko. PS: excuse my poor English.

On Sunday, 2003-08-10, 19:27, CEST, Danon'. wrote:
Hi,
We are trying to write a program that writes to stdout the UTF-8 character corresponding to an input Unicode value.
UTF-8 encodes each Unicode value as a sequence of octets. So there are two mistakes in your sentence above:
1. You want to output octets (i.e., 8-bit words), not characters. (In Haskell 98, a character is always a Unicode code value, although, in practice, not all Haskell systems support Unicode.)
2. One character (i.e., Unicode code value) is not always converted to a single octet but often to a sequence of octets.
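To make the second point concrete, here is a minimal sketch of the mapping from one code value to its octets. The name encodeChar is made up for this illustration and is not taken from any library mentioned in the thread; the sketch also does not reject surrogate code values, which a stricter encoder would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Unicode code value as UTF-8 octets.
-- Surrogates (U+D800..U+DFFF) are not rejected in this sketch.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                        -- 1 octet (ASCII)
  | n < 0x800   = [lead 0xC0 6, cont 0]                   -- 2 octets
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]          -- 3 octets
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0] -- 4 octets
  where
    n = ord c
    -- Leading octet: length marker combined with the top bits.
    lead marker shift = fromIntegral (marker .|. (n `shiftR` shift))
    -- Continuation octet: 10xxxxxx carrying six bits of the code value.
    cont shift = fromIntegral (0x80 .|. ((n `shiftR` shift) .&. 0x3F))

main :: IO ()
main = print (map encodeChar "A\233\8364")
-- prints [[65],[195,169],[226,130,172]]
```

So 'A' stays a single octet, while U+00E9 and U+20AC become two and three octets respectively.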
[...]
The main problem is that you need binary I/O. Haskell 98 only provides text I/O. Text I/O involves the use of an encoding which maps between the octets of the actual I/O stream and the characters Haskell sends or receives. Hugs and GHC, at least, seem to use Latin-1 as the encoding, which just means that they map the octets 0 to 255 to the characters with Unicode codes 0 to 255.

The other point with text I/O is that under Windows the EOF character ^Z is treated specially and a conversion between Windows EOLs (^M^J) and Haskell EOLs (^J) takes place. Hugs and GHC provide the function openFileEx which allows you to turn all these Windows-specific things off.

So an easy way to read or write octets from/to a file might be to open the file via openFileEx and convert characters to octets via Char.ord or octets to characters via Char.chr, respectively.

The conversion between characters and their UTF-8 encodings shouldn't be too difficult for you to implement yourself. Alternatively, you might want to look at http://sourceforge.net/projects/haskell-i18n/.
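A rough sketch of the "open in binary mode and convert with chr" approach follows. It assumes a current GHC, where System.IO offers withBinaryFile; that function is used here as a stand-in for openFileEx and is not what the message itself names. The helper names and the file name euro.bin are made up for the example.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- Treat each octet as the character with the same code (0 to 255).
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- Hypothetical helper: write raw octets to a file. Binary mode turns off
-- EOL conversion and character encoding, so each Char becomes one octet.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path ws =
  withBinaryFile path WriteMode $ \h -> hPutStr h (octetsToString ws)

main :: IO ()
main = writeOctets "euro.bin" [0xE2, 0x82, 0xAC]  -- UTF-8 octets of U+20AC
```

Reading would go the other way round, with Char.ord applied to each character read from a handle opened in binary mode.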
Niko.
[...]
Wolfgang

On Mon, 11 August 2003 00:49, Wolfgang Jeltsch wrote:
The main problem is that you need binary I/O. Haskell 98 only provides text I/O.
You don't need binary I/O for UTF-8 now; because implementations use ISO-8859-1, UTF-8 octets can be faked as characters by (chr . fromIntegral).
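As a sketch of that trick on stdout: the message relies on the implementation using ISO-8859-1 for text I/O. On a current GHC the handle encoding usually follows the locale instead, so this example assumes Latin-1 has to be requested explicitly with hSetEncoding. The name fakeChars is invented for the illustration.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (hSetEncoding, latin1, stdout)

-- Fake each UTF-8 octet as the character with the same code.
fakeChars :: [Word8] -> String
fakeChars = map (chr . fromIntegral)

main :: IO ()
main = do
  -- Assumption: without this, a current GHC would re-encode the faked
  -- characters using the locale encoding and garble the output.
  hSetEncoding stdout latin1
  -- UTF-8 octets of U+20AC followed by a newline.
  putStr (fakeChars [0xE2, 0x82, 0xAC, 0x0A])
```

With Latin-1 on the handle, every faked character is written back as exactly the octet it came from.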
The other point with text I/O is that under Windows the EOF character ^Z is treated specially and a conversion between Windows EOLs (^M^J) and Haskell EOLs (^J) takes place.
UTF-8 preserves ASCII and doesn't use ASCII bytes for non-ASCII characters, so the situation is the same as in other encodings and text mode is usually fine. It would not be OK for UTF-16.
-- 
Marcin Kowalczyk, qrczak@knm.org.pl, http://qrnik.knm.org.pl/~qrczak/

Marcin 'Qrczak' Kowalczyk
On Mon, 11 August 2003 00:49, Wolfgang Jeltsch wrote:
The main problem is that you need binary I/O. Haskell 98 only provides text I/O.
You don't need binary I/O for UTF-8 now; because implementations use ISO-8859-1, UTF-8 octets can be faked as characters by (chr . fromIntegral).
I wonder,

Would it cause a lot of compatibility trouble to wrap IO functions in a class

    class IOData a where
        readFile :: FilePath -> a
        :

and do

    instance IOData Char where ...
    instance IOData Word8 where ...

(Defaulting to Char the same way as Integer)

Would this let us start writing byte-based IO without sacrificing compatibility or designing specific interfaces for it? Perhaps one could even do record-based IO by declaring IOData instances for custom data structures?

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
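One way the class idea could be tried out today is sketched below. It rests on assumptions that are not in the proposal: the method returns IO [a] rather than a, the Word8 instance goes through Data.ByteString, and the file name iodata-demo.txt is invented. The proposed defaulting to Char does not exist in Haskell, so the sketch needs explicit type annotations.

```haskell
import Prelude hiding (readFile)
import qualified Prelude
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Sketch of the proposed class; only readFile is shown.
class IOData a where
  readFile :: FilePath -> IO [a]

-- Text I/O: reuse the ordinary Prelude function.
instance IOData Char where
  readFile = Prelude.readFile

-- Byte I/O: read the raw octets (assumption: via Data.ByteString).
instance IOData Word8 where
  readFile path = fmap B.unpack (B.readFile path)

main :: IO ()
main = do
  Prelude.writeFile "iodata-demo.txt" "abc"
  cs <- readFile "iodata-demo.txt" :: IO String   -- annotation picks Char
  ws <- readFile "iodata-demo.txt" :: IO [Word8]  -- annotation picks Word8
  print (cs, ws)
```

The annotations at the call sites are the price of having no default; existing code that calls readFile without one would become ambiguous, which is the compatibility question raised above.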
participants (4)
- danon@epita.fr
- ketil@ii.uib.no
- Marcin 'Qrczak' Kowalczyk
- Wolfgang Jeltsch