RE: [Haskell-i18n] unicode notation \uhhhh implementation

-----Original Message-----
From: Martin Norbäck [mailto:d95mback@dtek.chalmers.se]
Sent: 15 August 2002 14:40
To: haskell-i18n@haskell.org
Subject: [Haskell-i18n] unicode notation \uhhhh implementation
Does anyone know the status of the implementation of unicode escape sequences \uhhhh as per 2.1 in the Haskell 98 standard?
When implemented, do they count as one or five characters?
Or, is UTF-8 (or locale specified encoding) to be used for Haskell source code? If yes, when?
I wasn't aware of that paragraph in the report until recently, and as far as I know none of the current Haskell implementations implement the '\uhhhh' escape sequences. One reason to use this approach would be if there already existed a preprocessor to do the job - does anyone know of one? If not, I think the paragraph could be deleted in favour of using appropriate encodings for source files (I'd planned to implement at least UTF-8 in GHC at some point).

Cheers,
Simon

I wasn't aware of that paragraph in the report until recently, and as far as I know none of the current Haskell implementations implement the '\uhhhh' escape sequences.
HBC implemented Unicode years ago. http://www.math.chalmers.se/~augustss/hbc/lexemes.html
One reason to use this approach would be if there already existed a preprocessor to do the job - does anyone know of one?
Can't be more than a few lines of Perl. It's quite short in Haskell too:

    import Data.Char (chr, isHexDigit)
    import Numeric (readHex)

    convert :: String -> String
    convert ('\\':'u':c1:c2:c3:c4:cs)
      | isHexDigit c1 && isHexDigit c2 && isHexDigit c3 && isHexDigit c4
        -- readHex returns a list of (value, rest) parses; take the first value
      = chr (fst (head (readHex [c1,c2,c3,c4]))) : convert cs
      | otherwise                            -- not clear if this is
      = error "Malformed unicode sequence"   -- allowed by the spec
    convert (c:cs) = c : convert cs
    convert []     = []
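Since it is unclear whether a malformed sequence should be a hard error, one possible variant is sketched here: it reports the problem as a `Left` value instead of calling `error`. The names `convertSafe` and `hexValue` are hypothetical and do not come from the thread, and treating an incomplete `\u` as an error is an assumption rather than something the report is known to require.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical variant, not from the thread: expand \uhhhh escapes,
-- returning Left with a message on a malformed escape instead of
-- calling error.
convertSafe :: String -> Either String String
convertSafe ('\\':'u':rest) =
  case splitAt 4 rest of
    (hs, cs)
      | length hs == 4 && all isHexDigit hs ->
          (chr (hexValue hs) :) <$> convertSafe cs
      | otherwise ->
          Left ("Malformed unicode sequence: \\u" ++ hs)
convertSafe (c:cs) = (c :) <$> convertSafe cs
convertSafe []     = Right []

-- Fold a string of hex digits into its numeric value.
hexValue :: String -> Int
hexValue = foldl (\acc d -> 16 * acc + digitToInt d) 0
```

For instance, `convertSafe "a\\u0041b"` yields `Right "aAb"`, while an input that ends after `\u12` yields a `Left`. Like the version above, this sketch does not treat a doubled backslash before `u` specially.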
If not, I think the paragraph could be deleted in favour of using appropriate encodings for source files (I'd planned to implement at least UTF-8 in GHC at some point).
I think it's fine to support Unicode input files as well, but I don't see any motivation not to implement the \uXXXX form too. Indeed, we know that all machines that can support Haskell can handle ASCII, but I'll bet there are plenty of systems where Unicode-format files are awkward to manipulate.

--
Alastair Reid alastair@reid-consulting-uk.ltd.uk
Reid Consulting (UK) Limited
http://www.reid-consulting-uk.ltd.uk/alastair/

From: "Simon Marlow"
I wasn't aware of that paragraph in the report until recently, and as far as I know none of the current Haskell implementations implement the '\uhhhh' escape sequences.
One reason to use this approach would be if there already existed a preprocessor to do the job - does anyone know of one? If not, I think the paragraph could be deleted in favour of using appropriate encodings for source files (I'd planned to implement at least UTF-8 in GHC at some point).
Does this mean that GHC will internally treat or represent 'Char' as a 32-bit integer?

--
Nobuo Yamashita mailto:nobusun@timedia.co.jp
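One way to probe this question from within a program is to inspect the bounds of `Char` directly. The sketch below reflects the behaviour of present-day GHC, where `Char` ranges over Unicode code points up to 0x10FFFF; it is not a statement about the GHC implementation at the time of this thread. The names `maxCodePoint` and `sample` are illustrative only.

```haskell
import Data.Char (ord)

-- Largest code point a Char can hold; in current GHC this is 0x10FFFF,
-- which does not fit in 8 or 16 bits.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- A string literal holding one non-ASCII character written as a numeric
-- escape. It is a single Char, not the characters of its escape sequence.
sample :: String
sample = "\x4e2d"

-- Code points of the sample string, one per Char.
sampleCodePoints :: [Int]
sampleCodePoints = map ord sample
```

Here `length sample` is 1 and `sampleCodePoints` is `[0x4e2d]`, so an escaped character in a literal counts as one character once lexed.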
participants (3)
- Alastair Reid
- Nobuo Yamashita
- Simon Marlow