
> I wasn't aware of that paragraph in the report until recently, and as far as I know none of the current Haskell implementations implement the '\uhhhh' escape sequences.
HBC implemented Unicode years ago. http://www.math.chalmers.se/~augustss/hbc/lexemes.html
> One reason to use this approach would be if there already existed a preprocessor to do the job - does anyone know of one?
Can't be more than a few lines of Perl. It's quite short in Haskell too:

    import Data.Char (chr, isHexDigit)
    import Numeric (readHex)

    convert :: String -> String
    convert ('\\':'u':c1:c2:c3:c4:cs)
      | all isHexDigit hex = chr (fst (head (readHex hex))) : convert cs
      | otherwise          = error "Malformed unicode sequence"
                             -- not clear if this is allowed by the spec
      where hex = [c1,c2,c3,c4]
    convert (c:cs) = c : convert cs
    convert []     = []
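The conversion above calls error on a malformed escape, which is awkward in a preprocessor. Here is a rough sketch of a variant that reports the problem as a value instead. The name convertEither and the wording of the message are my own inventions, not anything from the report, and unlike the version above it also rejects a \u followed by fewer than four characters rather than passing it through. Like the version above, it does not check whether the backslash is itself escaped, so treat it as an illustration only.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Hypothetical total variant of convert: Left carries a description of
-- the first malformed escape, Right carries the converted text.
convertEither :: String -> Either String String
convertEither ('\\':'u':cs)
  | length hex == 4 && all isHexDigit hex
  , [(n, "")] <- readHex hex = (chr n :) <$> convertEither rest
  | otherwise = Left ("Malformed unicode sequence: \\u" ++ hex)
  where
    -- Take up to four characters after "\u"; fewer means a truncated escape.
    (hex, rest) = splitAt 4 cs
convertEither (c:cs) = (c :) <$> convertEither cs
convertEither []     = Right []
```

Something like either error id . convertEither gets the original behaviour back for well-formed input, so a caller can choose whether a bad escape is fatal.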
> If not, I think the paragraph could be deleted in favour of using appropriate encodings for source files (I'd planned to implement at least UTF-8 in GHC at some point).
I think it's fine to support Unicode input files as well, but I don't see any motivation not to implement the \uXXXX form too. Indeed, we know that all machines that can support Haskell can handle ASCII, but I'll bet there are plenty of systems where Unicode-format files are awkward to manipulate.

-- 
Alastair Reid        alastair@reid-consulting-uk.ltd.uk
Reid Consulting (UK) Limited
http://www.reid-consulting-uk.ltd.uk/alastair/