
Whilst I appreciate the topic of show is not directly related to GHC, what I would like to know is how to handle Unicode properly. If I assume I have a good Unicode terminal, so stdin and stdout are in Unicode format, and all my text files are in Unicode, how do I deal with this properly in GHC? What is the current state of affairs?

Regards,
Keean.

On Fri, Dec 19, 2003 at 04:51:50PM +0000, MR K P SCHUPKE wrote:
> Whilst I appreciate the topic of show is not directly related to GHC, what I would like to know is how to handle Unicode properly. If I assume I have a good Unicode terminal, so stdin and stdout are in Unicode format, and all my text files are in Unicode, how do I deal with this properly in GHC? What is the current state of affairs?

I use Unicode quite regularly with GHC. The only place where Unicode support is lacking is the default I/O implementation in the standard libraries. Nothing about the compiler itself inherently limits Unicode support. All that is needed is to replace or augment the standard IO code, a task which is often necessary for other reasons when working on large projects anyway.

Here are a number of things I have done to make Unicode easier to deal with:

1. Written the CWString library (now a part of the FFI), which lets you call arbitrary C functions while doing all the proper character set conversion.

2. Used UTF8.hs to wrap the various routines in IO. This works great as long as your system uses UTF-8 (which many do).

3. Modified Daan's PPrint to handle arbitrary character widths independent of the number of characters. This is useful when encoding things in a charset which doesn't have a 1-1 character to screen cell guarantee (accents, CJK languages, etc.), and is also incidentally very useful for doing things like embedding arbitrary escape sequences (colors) into pretty-printed layout without affecting the PP algorithm.

Something I have wanted to do is modify Alex so that ∀ turns into the regular expression 0xe2 0x88 0x80 (and so forth), so that GHC (whose lexer is generated from Alex) can simply accept UTF-8 input.

John

--
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------
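
To illustrate the byte sequence mentioned above for ∀ (U+2200), here is a minimal sketch of a UTF-8 encoder in Haskell. It is not the UTF8.hs module referred to in the message; the names encodeChar and encodeString are hypothetical and chosen only for this example, and surrogate code points are not rejected.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Encode one code point as its UTF-8 byte sequence.
-- (Hypothetical helper for illustration; no surrogate checking.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Encode a whole string by concatenating the per-character encodings.
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

main :: IO ()
main =
  -- '\x2200' is the FOR ALL sign; prints: 0xe2 0x88 0x80
  putStrLn (unwords (map (\b -> "0x" ++ showHex b "") (encodeChar '\x2200')))
```

A lexer that only understands bytes could match the three-byte sequence printed here wherever the source contains ∀, which is the idea behind the Alex change described above.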

On Fri, Dec 19, 2003 at 12:17:42PM -0800, John Meacham wrote:
> 1. written the CWString library (now a part of the FFI) which lets you call arbitrary C functions doing all the proper character set conversion stuff.

Do you plan to update this and merge it with the hierarchical libraries to complete the implementation of the FFI spec?
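
For readers following along with a present-day GHC: wide-string marshalling of this kind is available in the base library's Foreign.C.String module. The sketch below assumes that module (it may differ from the CWString library as it stood at the time of this thread) and calls the C function wcslen; the names c_wcslen and roundTrip are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CWString, peekCWString, withCWString)
import Foreign.C.Types (CSize (..))

-- Bind the C library's wcslen, which counts wchar_t units.
foreign import ccall unsafe "wchar.h wcslen"
  c_wcslen :: CWString -> IO CSize

-- Marshal a Haskell String to a temporary wchar_t buffer and back.
roundTrip :: String -> IO String
roundTrip s = withCWString s peekCWString

main :: IO ()
main = do
  let s = ['\x2200', 'x']          -- FOR ALL sign followed by 'x'
  n <- withCWString s c_wcslen     -- length as seen by the C side
  print n
  s' <- roundTrip s
  print (s' == s)                  -- the string survives the round trip
```

The buffer passed to the C function only lives for the duration of the withCWString callback, so the C side must not keep the pointer.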

participants (3): John Meacham, MR K P SCHUPKE, Ross Paterson