
Sorry, Simon, did you receive this message?
This is a forwarded message
From: Bulat Ziganshin
Is it true that, to support Unicode source files, only the StringBuffer implementation needs to be changed?
SM> It depends whether you want to support several different encodings, or
SM> just UTF-8. If we only want to support UTF-8, then we can keep the
SM> StringBuffer in UTF-8 and also FastStrings. (or you could re-encode the
SM> other encodings into UTF-8).

srcParseErr contains a call to "stepOnBy (-len)", and this will be hard to do with UTF-8. We could, however, save pointer(s) to the positions of previous characters, or even just reparse the entire buffer from scratch: printing source errors is not a frequent task. Of course, it would be great to just have this position saved for us :)

Making FastString UTF-8-enabled would be great. It needs changes in lengthFS, indexFS and maybe cmpFS (can UTF-8 characters be compared with just memcmp?). Also, I'm not sure about hPutFS: the Win32 console works in either the OEM or the ANSI 8-bit encoding.

SM> The question is what Alex should see for a unicode character: Alex
SM> currently assumes that characters are in the range 0-255 (you need a
SM> fixed range in order to generate the lexer tables). One possibility is
SM> to map all Unicode upper-case characters to a single character code for
SM> Alex, and similarly for the other classes of character.

I don't know anything about Alex internals, and can only say that any solution is better done INSIDE Alex, so that other programs using it will also get Unicode support...

If this problem is just about changing charType in Ctype.lhs, we can use some sort of hack. For example, use the current scheme until some character greater than (chr 255) appears. At that moment we create an array for classifying characters 256-65535. Characters greater than (chr 65535) are, I think, better recognized with calls to the appropriate functions.

BTW, Ruby supports writing numbers in the form 1_200_000. How about adding this feature to GHC?
;)

Lexer.x:

  @decimal     = $digit [$digit \_]*
  @octal       = [$octit \_]+
  @hexadecimal = [$hexit \_]+

StringBuffer.lhs:

  parseInteger :: StringBuffer -> Int -> Integer -> (Char->Int) -> Integer
  parseInteger buf len radix to_int
    = go 0 0
    where go i x | i == len  = x
                 | otherwise = case lookAhead buf i of
                                 '_' -> go (i+1) x   -- skip digit-group separators
                                 c   -> go (i+1) (x * radix + toInteger (to_int c))

-- 
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

===8<===========End of original message text===========

-- 
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com
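On the `stepOnBy (-len)` problem in srcParseErr mentioned above: with UTF-8 a character no longer occupies exactly one byte, so stepping back `len` bytes can land in the middle of a character. A minimal sketch, assuming the buffer holds valid UTF-8, is to walk backwards over continuation bytes (bit pattern 10xxxxxx) one character at a time. The names `prevCharStart` and `stepBackChars` are hypothetical, not GHC functions, and the buffer is modelled as a plain byte-accessor function rather than GHC's real StringBuffer.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | UTF-8 continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- | Hypothetical helper: given a byte accessor and the byte offset of a
-- character boundary (off > 0), return the offset of the first byte of
-- the previous character. Assumes the buffer holds valid UTF-8.
prevCharStart :: (Int -> Word8) -> Int -> Int
prevCharStart byteAt off = go (off - 1)
  where
    go i
      | i > 0 && isContinuation (byteAt i) = go (i - 1)
      | otherwise                          = i

-- | Hypothetical helper: step back n characters (not bytes) from offset
-- off, stopping at offset 0.
stepBackChars :: (Int -> Word8) -> Int -> Int -> Int
stepBackChars byteAt n off
  | n <= 0 || off <= 0 = off
  | otherwise          = stepBackChars byteAt (n - 1) (prevCharStart byteAt off)
```

For the six bytes of "aé€" (61 C3 A9 E2 82 AC), stepping back one character from offset 6 lands on offset 3 and two characters lands on offset 1. Since a UTF-8 sequence is at most four bytes, each step inspects at most four bytes, so no extra saved state is needed.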
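On the FastString question above: for valid UTF-8, comparing the encoded bytes lexicographically gives the same order as comparing the code points, so a memcmp-style cmpFS should still order strings correctly, provided a common-prefix tie is broken by length. What does change is that the byte count is no longer the character count. The sketch below illustrates both points; `encodeChar`, `encodeString` and `utf8Length` are hypothetical helpers, and the hand-written encoder is only for illustration (it does not reject surrogate code points).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one code point as 1-4 UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits of the code point, for the first byte
    hi k = fromIntegral (n `shiftR` k)
    -- six payload bits, tagged as a continuation byte (10xxxxxx)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- | Character count of a UTF-8 byte sequence: count only the bytes that
-- start a character. A UTF-8-aware lengthFS would need something like
-- this instead of returning the byte length.
utf8Length :: [Word8] -> Int
utf8Length = length . filter (\b -> b .&. 0xC0 /= 0x80)
```

indexFS faces the same issue as lengthFS: the n-th character can no longer be found by offset arithmetic and needs a scan from the start (or a cached index).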
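Simon's suggestion of mapping whole classes of Unicode characters to single codes could look roughly like the following. This is a sketch under assumptions: `alexClassOf` and the class codes 0x80-0x86 are invented for illustration, and the idea relies on the fact that once every non-ASCII character is replaced by a class code, the range 0x80-0xFF is free to hold those codes, so the lexer tables keep a fixed 0-255 alphabet. Classification uses `generalCategory` from Data.Char.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- Hypothetical class codes. Every non-ASCII character is replaced by one
-- of these, so the range 0x80-0xFF is free to hold them.
uniUpper, uniLower, uniOtherLetter, uniDigit, uniSymbol, uniSpace, uniOther :: Int
uniUpper       = 0x80
uniLower       = 0x81
uniOtherLetter = 0x82
uniDigit       = 0x83
uniSymbol      = 0x84
uniSpace       = 0x85
uniOther       = 0x86

-- | Map a character onto the fixed 0-255 alphabet a table-driven lexer
-- expects: ASCII passes through unchanged, anything else becomes the code
-- for its class. Title-case letters are lumped with upper-case here.
alexClassOf :: Char -> Int
alexClassOf c
  | ord c < 0x80 = ord c
  | otherwise = case generalCategory c of
      UppercaseLetter -> uniUpper
      TitlecaseLetter -> uniUpper
      LowercaseLetter -> uniLower
      ModifierLetter  -> uniOtherLetter
      OtherLetter     -> uniOtherLetter
      DecimalNumber   -> uniDigit
      MathSymbol      -> uniSymbol
      CurrencySymbol  -> uniSymbol
      ModifierSymbol  -> uniSymbol
      OtherSymbol     -> uniSymbol
      Space           -> uniSpace
      _               -> uniOther
```

The lexer would index its tables with `alexClassOf` of each decoded character, while the token text itself would still be taken from the original buffer.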
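Bulat's three-tier charType hack maps naturally onto lazy evaluation: a top-level array for characters 256-65535 is not built until the first lookup in that range, so programs that stay within Latin-1 never pay for it. A minimal sketch, assuming a Ctype-style bitmask; the flag names, `classify` and this `charType` are hypothetical, and `classify` (built from Data.Char predicates) stands in for both the contents of the existing 0-255 table and the slow path above 65535.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Bits ((.|.))
import Data.Char (isAlpha, isDigit, isLower, isSpace, isUpper, ord)

-- Hypothetical Ctype-style class flags.
cUpper, cLower, cDigit, cSpace, cAlpha :: Int
cUpper = 1
cLower = 2
cDigit = 4
cSpace = 8
cAlpha = 16

-- | Slow path: ask Data.Char directly. Used to fill both tables and for
-- characters above 65535.
classify :: Char -> Int
classify c = foldr (.|.) 0 [flag | (p, flag) <- tests, p c]
  where
    tests = [ (isUpper, cUpper), (isLower, cLower), (isDigit, cDigit)
            , (isSpace, cSpace), (isAlpha, cAlpha) ]

-- | Stand-in for the existing 0-255 table.
latin1Table :: Array Int Int
latin1Table = listArray (0, 255) [classify (toEnum i) | i <- [0 .. 255]]

-- | Table for 256-65535. As a top-level value it is evaluated at most once,
-- and only when the first lookup in this range forces it; Array is boxed,
-- so each entry is also computed only on first use.
bmpTable :: Array Int Int
bmpTable = listArray (256, 65535) [classify (toEnum i) | i <- [256 .. 65535]]

-- | Three tiers: old table, lazily built table, direct function calls.
charType :: Char -> Int
charType c
  | n < 256   = latin1Table ! n
  | n < 65536 = bmpTable ! n
  | otherwise = classify c
  where
    n = ord c
```

An unboxed array (Data.Array.Unboxed's UArray) would avoid keeping a thunk per entry, at the cost of classifying the whole 256-65535 range on the first lookup.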