RE: [Haskell-i18n] unicode notation \uhhhh implementation

16 Aug 2002


> > ...
> > I wasn't aware of that paragraph in the report until recently, and
> > as far as I know none of the current Haskell implementations
> > implement the '\uhhhh' escape sequences.
>
> HBC implemented Unicode years ago.
> http://www.math.chalmers.se/~augustss/hbc/lexemes.html

No, HBC doesn't implement the paragraph of the report that we're talking about. HBC allows the '\uhhhh' escape sequence in character and string literals, but not in identifiers or other parts of the source.

Also, it's not clear to me why you need the '\uhhhh' escape sequence in character and string literals at all, since it appears to mean the same thing as '\xhhhh' (the report doesn't say explicitly that '\xhhhh' denotes a "Unicode code point", but that seems to be the only reasonable interpretation).
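To illustrate why '\uhhhh' in a literal adds nothing over '\xhhhh': a literal-only preprocessor could simply rewrite one into the other. The sketch below is hypothetical (the name 'uToX' is made up for illustration and is not from HBC or the report). One subtlety: a '\x' escape absorbs every hex digit that follows it, so "\x0041B" is the single character U+041B rather than "AB"; the sketch therefore appends the empty escape '\&' after each rewritten sequence. It assumes the input contains no escaped backslash followed by 'u' (as in "\\u"), which a real tool would have to treat specially.

```haskell
import Data.Char (isHexDigit)

-- Hypothetical helper: rewrite each "\uhhhh" (exactly four hex digits)
-- into the equivalent "\xhhhh\&" escape. The trailing "\&" is Haskell's
-- empty-string escape; it stops a following hex digit from being read as
-- part of the \x escape. Anything that isn't a well-formed \uhhhh is
-- copied through unchanged.
uToX :: String -> String
uToX ('\\':'u':a:b:c:d:rest)
  | all isHexDigit [a, b, c, d] = '\\':'x':a:b:c:d:'\\':'&' : uToX rest
uToX (ch:rest) = ch : uToX rest
uToX []        = []

-- Under the code-point reading of '\x', this is GREEK SMALL LETTER LAMBDA,
-- i.e. the same Char as the decimal escape '\955'.
lambdaViaHex :: Char
lambdaViaHex = '\x03bb'
```

With the guard failing, matching falls through to the next equation, so malformed sequences such as "\u00zz" pass through untouched rather than raising an error.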
> > ...
> > One reason to use this approach would be if there already existed a
> > preprocessor to do the job - does anyone know of one?
> > ...
>
> Can't be more than a few lines of Perl. It's quite short in Haskell too:
>
> import Data.Char (chr, isHexDigit)
> import Numeric (readHex)
>
> -- Decode each \uhhhh escape (exactly four hex digits) into its Char.
> convert :: String -> String
> convert ('\\':'u':c1:c2:c3:c4:cs)
>   | all isHexDigit [c1, c2, c3, c4]
>   = chr (fst (head (readHex [c1, c2, c3, c4]))) : convert cs
>   | otherwise                              -- not clear if this is
>   = error "Malformed unicode sequence"     -- allowed by the spec
> convert (c:cs) = c : convert cs
> convert [] = []
I meant a preprocessor to take source code in some random encoding and convert it into ASCII with '\uhhhh' escape sequences. If there were such a thing, then we could all use it and save re-implementing N different encodings in each compiler.
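A minimal sketch of the output half of such a preprocessor, assuming the source has already been decoded from its original encoding into Haskell Chars (decoding the N different encodings is exactly the part this sketch does not attempt). The function name 'toUEscapes' is hypothetical. Every non-ASCII character becomes \uhhhh with four zero-padded lowercase hex digits. Code points above U+FFFF do not fit in four digits, and this sketch simply rejects them rather than guessing at a convention; it also leaves any '\u' text already present in the input as-is, so a real tool would need a rule for that.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical sketch: turn already-decoded source text into pure ASCII,
-- replacing each non-ASCII character with a \uhhhh escape.
toUEscapes :: String -> String
toUEscapes = concatMap escape
  where
    escape c
      | ord c < 0x80    = [c]                                -- ASCII unchanged
      | ord c <= 0xFFFF = "\\u" ++ pad4 (showHex (ord c) "")  -- BMP character
      | otherwise       = error ("toUEscapes: code point above U+FFFF: " ++ show c)
    -- showHex emits no leading zeros, so left-pad to four digits.
    pad4 s = replicate (4 - length s) '0' ++ s
```

For example, in GHCi toUEscapes "\955x" yields "\\u03bbx", which a decoder for \uhhhh escapes maps back to the original string.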

Cheers,
	Simon

Simon Marlow