
On 2005-01-07, Simon Marlow wrote:
- Can you use (some encoding of) Unicode for your Haskell source files? I don't think this is true in any Haskell compiler right now.
I assume this won't be done until the next one is done...
- Can you do String I/O in some encoding of Unicode? No Haskell compiler has support for this yet, and there are design decisions to be made. Some progress has been made on an experimental prototype (see recent discussion on this list).
Many of the easy ways to do this that I've heard proposed make the current hacks for binary IO fail. IMHO, we really, really need a standard, supported way to do binary IO. If I can read in and output octets, then I can implement Unicode handling on top of that. In fact, it would let a bunch of the proposed ideas for Unicode support be implemented in pure Haskell, and have their API details hashed out and polished.

For Unix, there are a couple of different tacks one could take. The locale system is standard, and does work, but is ugly and a pain to work with. In particular, it's another (set of) global variables. And what do you do with a character not expressible in the current locale? I'd like the possibility of different character sets for different files, for example. I suppose I wouldn't be too upset at using the locale information, but defaulting to UTF-8 rather than ASCII when the character set information is unset.

For Win32, I really don't know the options.
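To make the "Unicode on top of octets" idea concrete, here is a minimal sketch of a UTF-8 decoder written in pure Haskell over a list of `Word8`. It assumes the octets have already been read by whatever binary IO mechanism is available; the names `decodeUtf8` and `multi`, and the policy of substituting U+FFFD for malformed input, are choices made for this illustration and not part of any proposal discussed here.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- U+FFFD, substituted for any malformed input (an assumption of this sketch).
replacement :: Char
replacement = '\xFFFD'

-- | Decode a list of octets as UTF-8.
-- On a malformed sequence the lead byte becomes U+FFFD and decoding
-- resumes at the following byte.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise             = replacement : decodeUtf8 bs

-- | Consume n continuation bytes, starting from the payload bits of the
-- lead byte, and reject overlong forms, surrogates and out-of-range values.
multi :: Int -> Int -> Int -> [Word8] -> String
multi n acc minCp bs =
  case splitAt n bs of
    (conts, rest)
      | length conts == n && all isCont conts
      , let cp = foldl step acc conts
      , valid cp -> chr cp : decodeUtf8 rest
    _ -> replacement : decodeUtf8 bs
  where
    isCont c = c .&. 0xC0 == 0x80
    step a c = (a `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    valid cp = cp >= minCp && cp <= 0x10FFFF
               && not (cp >= 0xD800 && cp <= 0xDFFF)
```

Because the decoder is an ordinary function over octets, a different function with the same shape could be used for another character set on another file, without touching any global locale state.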
- What about Unicode FilePaths? This was discussed a few months ago on the haskell(-cafe) list, no support yet in any compiler.
This is tricky, because on most systems such a thing isn't terribly standardized. For Win32, it is standardized and should be wrappable fairly easily, but I don't know that I'd want to base my model on that. For Unix, again, there is the locale system, with, again, the problem of unrepresentable characters. Traditionally, systems have essentially said "file names are zero-terminated strings of bytes that may not contain character 47, which is used to separate directory names", and interpreting them as _names_ made of _characters_ was left entirely to the terminals (or, eventually, graphical programs) for display and to programs for manipulation.

-- 
Aaron Denney
-><-
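The traditional Unix view of file names described above can be sketched as follows: a path is just a list of octets, byte 47 separates components, and byte 0 may not appear in a name. The names `RawPath`, `splitComponents` and `validComponent` are hypothetical, invented for this illustration; no existing library API is implied.

```haskell
import Data.Word (Word8)

-- A path as the kernel sees it: uninterpreted octets (hypothetical type).
type RawPath = [Word8]

-- Byte 47 ('/' in ASCII) separates directory names.
slash :: Word8
slash = 47

-- | Split a raw path into its components, dropping empty ones
-- (as produced by leading, trailing or doubled separators).
splitComponents :: RawPath -> [RawPath]
splitComponents path = filter (not . null) (go path)
  where
    go bs = case break (== slash) bs of
      (c, [])     -> [c]
      (c, _:rest) -> c : go rest

-- | A single name is valid if it is non-empty and contains neither the
-- terminator (0) nor the separator (47). No character set is assumed.
validComponent :: RawPath -> Bool
validComponent c = not (null c) && all (\b -> b /= 0 && b /= slash) c
```

Note that a component such as `[0xE9]` is a perfectly valid name under this model even though it is not valid UTF-8, which is exactly why a character-based `FilePath` needs some answer for unrepresentable names.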