
On Thu, Sep 13, 2007 at 12:23:33AM +0000, Aaron Denney wrote:
Unfortunately, at this point it is a well entrenched bug, and changing the behaviour will undoubtedly break programs. ... There should be another system for getting the exact bytes in and out (as Word8s, say, rather than Chars), and there are in fact external libraries using lower level interfaces, rather than the things like putStr, getLine, etc. that do this. An external library works, of course, but it should be part of the standard so implementors know that character based routines actually are character based, not byte based. ... I don't know what NHC and Hugs do, though I assume they also provide no translations. I'm also not sure what JHC does, though I do see mentions of UTF-8, UTF-16 (for Windows), and UTF-32 (for internal usage of C libraries), and I do know that John is fairly careful about locale issues.
I'm pretty sure Hugs does the right thing. NHC is probably broken. In any case, we already have hGetBuf / hPutBuf in the standard base libraries for raw binary IO, so code that uses getChar for bytes really has no excuse. We can and should fix the bug.

Stefan
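
For illustration, here is a minimal sketch of the byte-level route via hGetBuf / hPutBuf, assuming GHC's base library. The helper names putBytes and getBytes are hypothetical, made up for this example, and are not part of base or any standard.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import System.IO (Handle, hGetBuf, hPutBuf)

-- Hypothetical helper: write raw bytes to a handle without going
-- through Char at all.
putBytes :: Handle -> [Word8] -> IO ()
putBytes h bytes =
  -- Allocate at least one byte so an empty list is still safe.
  allocaBytes (max 1 n) $ \buf -> do
    pokeArray buf bytes   -- copy the list into the temporary buffer
    hPutBuf h buf n       -- write exactly n bytes from the buffer
  where
    n = length bytes

-- Hypothetical helper: read up to n raw bytes from a handle.
-- Fewer than n bytes are returned if end of file is reached first.
getBytes :: Handle -> Int -> IO [Word8]
getBytes h n =
  allocaBytes (max 1 n) $ \buf -> do
    got <- hGetBuf h buf n  -- number of bytes actually read
    peekArray got buf       -- copy them back out as Word8 values
```

A caller would open the file with something like withBinaryFile from System.IO and pass the handle to these helpers; the values that come back are Word8s, so no character translation can get in the way.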