Re: Unicode support in Hugs - alpha-patch available

23 Aug 2003

On Sun, Aug 17, 2003 at 11:35:31PM -0400, Dimitry Golubovsky wrote:
...
> Anyone interested in Unicode support in Hugs (what it lacks so far)
> please check out this URL:
> http://www.golubovsky.org/software/hugs-patch/article.html
> I have written a patch for the November 2002 release of Hugs that
> enables internal handling of Unicode characters by Hugs. The URL above
> points to the article I wrote to explain the details. The article also
> contains links to download the patch itself and the
> demonstration/testing program.

As a general comment: your patch converts the Unicode Database into an
internal table in Hugs for use by primitives.  The recent addition of
Unicode support to GHC takes an alternative approach: it uses the native
wide character functions iswupper(), towupper(), etc., where these are
available.
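
To make the alternative concrete, here is a minimal sketch of primitives that delegate to the C library instead of consulting a built-in table. The names init_char_locale, prim_is_upper and prim_to_upper are hypothetical and are not taken from the Hugs or GHC sources. The results depend on the LC_CTYPE locale in effect, so characters outside ASCII are only classified usefully after setlocale() has selected a suitable locale, and on platforms with a 16-bit wchar_t the functions cannot cover code points above FFFF.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical start-up hook: adopt the user's locale so that the
 * isw*() and tow*() functions know about non-ASCII characters. */
void init_char_locale(void)
{
    setlocale(LC_CTYPE, "");
}

/* Hypothetical primitive: is c an upper-case letter in the current locale? */
int prim_is_upper(wint_t c)
{
    return iswupper(c) != 0;
}

/* Hypothetical primitive: upper-case equivalent of c, or c itself
 * when the current locale defines no such mapping. */
wint_t prim_to_upper(wint_t c)
{
    return towupper(c);
}
```

The trade-off is that no table has to be shipped with the interpreter, but behaviour then varies with the platform and locale rather than being fixed by one version of the Unicode Database.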

The current CVS version of Hugs also includes an optimization of the
whatis() code, which may clash with your changes.  However the speed
gains from that change are modest -- increased functionality may be
more important.
...
> [The] number of distinct characters defined by the Unicode Database
> (UnicodeData.txt available from www.unicode.org) is 15100 for the most
> recent version (4.0) with Unicode character values ranging from 0x0000
> to 0x10FFFD.  So, position of a character in the Unicode character table
> may be used by Hugs as internal character code.

UnicodeData.txt may contain that many character lines, but it includes
pairs of lines like

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;

which together describe the 20902 characters from 4E00 to 9FA5, and
there are several more pairs like this.
Actually the character space is fairly dense, at least up to FFFF.
The compression approach could be used for character property tables,
but not for internal representation of character codes.
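
As an illustration of what the compression approach could look like for a property table, here is a rough sketch that stores one entry per run of code points and finds the entry by binary search. The struct, the function names and the choice of a string for the category are all assumptions made for the example, not anything from the patch. The three ranges are the CJK Extension A, CJK Ideograph and Hangul Syllable First/Last pairs from UnicodeData.txt 4.0; a real table would cover every assigned character.

```c
#include <stddef.h>

/* One run of consecutive code points that share a general category. */
struct char_range {
    unsigned long first;
    unsigned long last;
    const char   *category;
};

/* Illustrative subset only, sorted by first code point. */
static const struct char_range ranges[] = {
    { 0x3400, 0x4DB5, "Lo" },   /* CJK Ideograph Extension A */
    { 0x4E00, 0x9FA5, "Lo" },   /* CJK Ideograph */
    { 0xAC00, 0xD7A3, "Lo" },   /* Hangul Syllable */
};

/* Number of code points covered by an inclusive First/Last pair. */
unsigned long range_size(unsigned long first, unsigned long last)
{
    return last - first + 1;
}

/* Binary search for the range containing c.
 * Returns its category, or NULL if c is in none of the listed ranges. */
const char *range_category(unsigned long c)
{
    size_t lo = 0;
    size_t hi = sizeof ranges / sizeof ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < ranges[mid].first)
            hi = mid;
        else if (c > ranges[mid].last)
            lo = mid + 1;
        else
            return ranges[mid].category;
    }
    return NULL;
}
```

With this layout range_size(0x4E00, 0x9FA5) gives the 20902 mentioned above, and that one pair costs a single table entry. The same trick does not give a compact numbering of characters, because every code point inside a range still needs its own distinct internal code.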

Also, the consCharArray array, used to implement (c:), is of size
NUM_CHARS -- this could be rather large.  (Perhaps this could be
filled lazily?)
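
A rough sketch of the lazy filling I have in mind follows. Everything in it is a stand-in: the Cell type, NIL, the value chosen for NUM_CHARS and build_cons_char() are assumptions for the example and do not match the actual Hugs definitions. The idea is only that a slot is built the first time (c:) is needed for that character.

```c
#define NUM_CHARS 0x110000L        /* assumed: the whole Unicode code space */

typedef long Cell;                 /* stand-in for the interpreter's cell handle */
#define NIL 0L                     /* stand-in for "no cell yet" */

/* Static storage is zero-initialised, so every slot starts out as NIL. */
static Cell cons_char_cache[NUM_CHARS];
static long cells_built = 0;       /* bookkeeping for the example only */

/* Placeholder for allocating the heap cell that represents (c:).
 * A real interpreter would build a partial application here. */
static Cell build_cons_char(long c)
{
    cells_built++;
    return c + 1;                  /* any value other than NIL */
}

/* Return the cell for (c:), building it on first use.
 * The caller must ensure 0 <= c < NUM_CHARS. */
Cell cons_char(long c)
{
    if (cons_char_cache[c] == NIL)
        cons_char_cache[c] = build_cons_char(c);
    return cons_char_cache[c];
}

/* How many cells have been built so far. */
long cons_cells_built(void)
{
    return cells_built;
}
```

This defers the heap cells and the start-up cost of filling the array, but the array itself still has NUM_CHARS slots, and in a real interpreter the garbage collector would have to treat the filled slots as roots.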

Ross Paterson