
I wasn't aware of that paragraph in the report until recently, and as far as I know none of the current Haskell implementations implement the '\uhhhh' escape sequences.
HBC implemented Unicode years ago.
No, HBC doesn't implement the paragraph of the report that we're talking about. HBC allows the '\uhhhh' escape sequence in character and string literals, but not in identifiers and other parts of the source. Also, it's not clear to me why you need the '\uhhhh' escape sequence in character and string literals at all, since it appears to mean the same thing as '\xhhhh' (the report isn't clear that '\xhhhh' means a "unicode code point", but that seems to be the only reasonable interpretation).
One reason to use this approach would be if there already existed a preprocessor to do the job - does anyone know of one?
Can't be more than a few lines of Perl. It's quite short in Haskell too:
    import Data.Char (chr, isHexDigit)
    import Numeric (readHex)

    convert :: String -> String
    convert ('\\':'u':c1:c2:c3:c4:cs)
      | all isHexDigit hex = chr (fst (head (readHex hex))) : convert cs
      -- not clear if raising an error here is allowed by the spec
      | otherwise          = error "Malformed unicode sequence"
      where hex = [c1,c2,c3,c4]
    convert (c:cs) = c : convert cs
    convert []     = []
I meant a preprocessor to take source code in some arbitrary encoding and convert it into ASCII with '\uhhhh' escape sequences. If there were such a thing, then we could all use it and save re-implementing N different encodings in each compiler.

Cheers,
Simon
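For illustration, the escaping half of such a preprocessor might look roughly like the sketch below. It assumes the input has already been decoded from its original encoding into a Haskell String of Unicode code points, which is the hard, encoding-specific part and is not shown. The names escape and escapeChar are made up for this example. Since the escape has exactly four hex digits, code points above 0xFFFF are simply rejected here, and a '\u' sequence already present in the input is passed through untouched.

```haskell
import Data.Char (isAscii, ord)
import Numeric (showHex)

-- | Replace every non-ASCII character with a '\uhhhh' escape.
-- Assumption: the input is already decoded to Unicode code points.
escape :: String -> String
escape = concatMap escapeChar

-- | Escape a single character (hypothetical helper for this sketch).
escapeChar :: Char -> String
escapeChar c
  | isAscii c   = [c]                           -- ASCII passes through unchanged
  | n <= 0xFFFF = "\\u" ++ pad (showHex n "")   -- four hex digits, zero-padded
  | otherwise   = error "escapeChar: code point does not fit in four hex digits"
  where
    n     = ord c
    pad s = replicate (4 - length s) '0' ++ s
```

With this, escape "a\955b" gives the seven characters a\u03bbb, and something like interact escape would turn it into a stdin-to-stdout filter, with the decoding left to whatever encoding the input handle is set to.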