RE: Unicode in GHC: need some advice on building

On 11 January 2005 02:29, Dimitry Golubovsky wrote:
The bad thing is, LD_PRELOAD does not work on all systems. So I tried to put the code directly into the runtime (where I believe it should be; the Unicode properties table is packed, and won't eat much space). I renamed the foreign functions in GHC.Unicode (to avoid conflicts with libc functions) by prefixing them with u_ (so now they are u_iswupper, etc). I placed the new file into ghc/rts, and the include file into ghc/includes. I could not avoid messages about missing prototypes for the u_... functions, but finally I was able to build ghc.
Now when I compiled my test program with the rebuilt ghc, it worked without the LD_PRELOADed library. However, GHCi could not start, complaining that it could not see these u_... symbols. I noticed some other entry points into the runtime, like revertCAFs or getAllocations, declared in the Haskell part of GHCi just like other foreign calls, so I followed the same style - partly unsuccessfully.
Where am I wrong?
You're doing fine - but a better place for the tables is as part of the base package, rather than the RTS. We already have some C files in the base package: see libraries/base/cbits, for example. I suggest just putting your code in there.

Cheers,
Simon
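
The missing-prototype messages mentioned in the quoted message usually mean that the C code calling a function never saw a declaration for it. A minimal sketch of the idea, assuming the renamed functions take and return an int; only u_iswupper is named in the thread, so u_iswlower, the header name in the comment, and the ASCII-only bodies are placeholders for illustration, not the actual table-driven code:

```c
/* Declarations as they might appear in a shared header (the name
   u_ctype.h is hypothetical). Any file that calls these functions
   includes the header, so the compiler sees a prototype before use. */
int u_iswupper(int c);
int u_iswlower(int c); /* assumed name, following the u_ prefix scheme */

/* Placeholder definitions: ASCII only, standing in for a lookup in the
   packed Unicode properties table described in the thread. */
int u_iswupper(int c) { return c >= 'A' && c <= 'Z'; }
int u_iswlower(int c) { return c >= 'a' && c <= 'z'; }
```

With the declarations visible, a call such as u_iswupper('A') compiles without an implicit-declaration warning.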

Hi, Simon Marlow wrote:
You're doing fine - but a better place for the tables is as part of the base package, rather than the RTS. We already have some C files in the base package: see libraries/base/cbits, for example. I suggest just putting your code in there.
I have done that - now GHCi recognizes those symbols and loads fine. The test program also works when compiled.

I still got some messages about missing prototypes and implicitly declared functions for the functions that I defined in place of the libc ones, especially during Stage 1. I need to look into that, but since all those functions are basically int -> int, it does not affect the result.

The code I use is draft code, based on what I submitted for Hugs (pure Unicode basically, even without extra space characters).

Now I need more advice on which "flavor" of Unicode support to implement. On Haskell-cafe, three flavors were summarized; I am reposting the latest version of the table here.

       | Sebastien's | Marcin's            | Hugs
-------+-------------+---------------------+------------------------
alnum  | L* N*       | L* N*               | L*, M*, N* <1>
alpha  | L*          | L*                  | L* <1>
cntrl  | Cc          | Cc Zl Zp            | Cc
digit  | N*          | Nd                  | '0'..'9'
lower  | Ll          | Ll                  | Ll <1>
punct  | P*          | P*                  | P*
upper  | Lu          | Lt Lu               | Lu Lt <1>
blank  | Z* \t\n\r   | Z* (except U+00A0   | ' ' \t\n\r\f\v U+00A0
       |             | U+2007 U+202F)      |
       |             | \t\n\v\f\r U+0085   |

<1>: for characters outside the Latin1 range. For Latin1 characters (0 to 255), there is a lookup table defined as "unsigned char charTable[NUM_LAT1_CHARS];"

I did not post the contents of the table Hugs uses for the Latin1 part. However, with that table completely removed, Hugs did not work properly, so its contents somehow differ from what Unicode defines for that character range. If needed, I can decode that table and post its mapping of character categories (keeping in mind that those are Haskell-recognized character categories, not Unicode ones).

I am not asking for this to be discussed on this list again. Rather, I am hoping for a suggestion from the GHC team leads on which flavor (one of those shown above, or some combination of them) to implement.
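
To make the difference between the three "blank" definitions in the table concrete, here is a small sketch in C. It is an illustration only, not code from Hugs or GHC: the function names are made up, and the Z* list is hard-coded from a recent Unicode version (older versions also counted U+180E as Zs). It reads the blank row as: Sebastien's is Z* plus \t\n\r; Marcin's is Z* without U+00A0, U+2007 and U+202F, plus \t\n\v\f\r and U+0085; Hugs is ' ', \t\n\r\f\v and U+00A0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Code points in the Unicode separator categories Zs, Zl and Zp
   (assumption: list from a recent Unicode version). */
static bool is_Z(uint32_t c) {
    return c == 0x0020 || c == 0x00A0 || c == 0x1680 ||
           (c >= 0x2000 && c <= 0x200A) ||
           c == 0x2028 || c == 0x2029 ||
           c == 0x202F || c == 0x205F || c == 0x3000;
}

/* "Sebastien's" column: Z* plus tab, newline, carriage return. */
static bool blank_sebastien(uint32_t c) {
    return is_Z(c) || c == '\t' || c == '\n' || c == '\r';
}

/* "Marcin's" column: Z* minus the no-break spaces, plus the ASCII
   whitespace controls and U+0085 (NEL). */
static bool blank_marcin(uint32_t c) {
    if (c == 0x00A0 || c == 0x2007 || c == 0x202F)
        return false;
    return is_Z(c) || c == '\t' || c == '\n' || c == '\v' ||
           c == '\f' || c == '\r' || c == 0x0085;
}

/* "Hugs" column: a fixed set, independent of the Z* categories. */
static bool blank_hugs(uint32_t c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
           c == '\f' || c == '\v' || c == 0x00A0;
}
```

Under this reading, U+00A0 is blank for Sebastien's and Hugs but not for Marcin's, while U+2003 (EM SPACE) is blank for Sebastien's and Marcin's but not for Hugs.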
One more question that I had when experimenting with Hugs: if a character (like those extra blank chars) is forced into some category for the purposes of compiling Haskell itself (per the Report), does this mean that every other Haskell application should recognize the Haskell-defined category of that character rather than the Unicode-defined one? For Hugs, there was no choice but to say yes, because both the compiler and the interpreter used the same code to decide on a character's category. In GHC this may be different.

Since Hugs got there first, does it make sense to just follow what was done there, or will a different decision be adopted for GHC: say, for the parser, extra characters are forced to be blank, but for the rest of the programs compiled by GHC, the Unicode definitions are adhered to?

PS The latest rebuild I did used ghc with the new code compiled in as the Stage 1 compiler.

Dimitry Golubovsky
Middletown, CT