On Sun, Aug 17, 2003 at 11:35:31PM -0400, Dimitry Golubovsky wrote:
Anyone interested in Unicode support in Hugs (what it lacks so far) please check out this URL:
http://www.golubovsky.org/software/hugs-patch/article.html
I have written a patch for the November 2002 release of Hugs that enables internal handling of Unicode characters by Hugs. The URL above points to the article I wrote to explain the details. The article also contains links to download the patch itself and the demonstration/testing program.
As a general comment: your patch converts the Unicode Database into an internal table in Hugs for use by primitives. An alternative approach is used by a recent addition of Unicode support to GHC: use the native wide character functions iswupper(), towupper(), etc., where these are available. The current CVS version of Hugs also includes an optimization of the whatis() code, which may clash with your changes. However, the speed gains from that change are modest -- increased functionality may be more important.
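To make the alternative concrete, here is a minimal sketch of primitives built on the native wide character functions. The hs_* names are made up for illustration and are not taken from the Hugs or GHC sources. It also assumes that wchar_t on the platform is wide enough to hold the code points of interest, which is not true everywhere.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper names, for illustration only. */

/* Select the user's locale for character classification.
 * Without this call the program stays in the "C" locale, where the
 * wide character functions are only guaranteed to classify ASCII.
 * Returns 1 on success, 0 if the locale could not be set. */
int hs_init_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Returns 1 if c is an upper-case letter in the current locale. */
int hs_is_upper(wint_t c)
{
    return iswupper(c) != 0;
}

/* Returns the upper-case form of c, or c itself if it has none. */
wint_t hs_to_upper(wint_t c)
{
    return towupper(c);
}
```

The trade-off is that the results depend on the C library and the locale selected at run time, whereas a table generated from UnicodeData.txt behaves the same on every platform.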
[The] number of distinct characters defined by the Unicode Database (UnicodeData.txt, available from www.unicode.org) is 15100 for the most recent version (4.0), with Unicode character values ranging from 0x0000 to 0x10FFFD. So, position of a character in the Unicode character table may be used by Hugs as internal character code.
UnicodeData.txt may contain that many character lines, but it includes
pairs of lines like
4E00;