Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions

-------- Original Message --------
Subject: Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions
Date: Mon, 10 Jan 2005 20:47:26 -0500
From: Dimitry Golubovsky
It's not obvious what the predicates should really mean, e.g. should isDigit and isHexDigit include non-ASCII digits or should isSpace include non-breaking space characters.
I think perhaps the answer is all of the above. The functions could be defined in multiple modules, so that 'ASCII.isSpace' would match the "normal" space character only, while 'Unicode.isSpace' could match all the weird and wonderful stuff in the standard.
So there might be a bunch of (perhaps autogenerated, from localedef files) modules for each locale/encoding, like ISO8859_1 or KOI_8. These modules might be imported into applications as needed. Also there would be one module autogenerated from the Unicode data files.

Dimitry Golubovsky
Middletown, CT
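For concreteness, here is a minimal Haskell sketch of the two-module idea; the function names (asciiIsSpace and unicodeIsSpace, standing in for ASCII.isSpace and Unicode.isSpace) and the exact character sets are illustrative assumptions, not an existing API:

    -- A sketch of the two-module idea.  The function names stand in for
    -- ASCII.isSpace and Unicode.isSpace; the character sets are
    -- illustrative, not a definitive list.
    asciiIsSpace :: Char -> Bool
    asciiIsSpace c = c `elem` " \t\n\v\f\r"     -- the "normal" whitespace only

    unicodeIsSpace :: Char -> Bool
    unicodeIsSpace c =
         asciiIsSpace c
      || c `elem` "\x00A0\x2007\x202F"          -- no-break spaces
      || (c >= '\x2000' && c <= '\x200A')       -- en quad .. hair space
      || c `elem` "\x1680\x2028\x2029\x3000"    -- other separator characters

With these definitions, asciiIsSpace '\x00A0' is False while unicodeIsSpace '\x00A0' is True, which is exactly the split described above.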

Dimitry Golubovsky
       |Sebastien's| Marcin's | Hugs
-------+-----------+----------+------
alnum  | L* N*     | L* N*    | L*, M*, N* <1>
alpha  | L*        | L*       | L* <1>
cntrl  | Cc        | Cc Zl Zp | Cc
digit  | N*        | Nd       | '0'..'9'
lower  | Ll        | Ll       | Ll <1>
punct  | P*        | P*       | P*
upper  | Lu        | Lt Lu    | Lu Lt <1>
blank  | Z* \t\n\r | Z*(except| ' ' \t\n\r\f\v U+00A0
       |           | U+00A0   |
       |           | U+2007   |
       |           | U+202F)  |
       |           | \t\n\v\f\r|
       |           | U+0085   |
<1>: for characters outside the Latin-1 range. For Latin-1 characters (0 to 255), there is a lookup table defined as "unsigned char charTable[NUM_LAT1_CHARS];"
If the table coincides with the Unicode character categories, then it's just an implementation detail. I wrote "Cc" for Hugs in place of the test c < ' ' || (c >= '\DEL' && c <= '\x9f') because the two are equivalent.
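For reference, the equivalence can be written out in Haskell, assuming a Data.Char that exposes generalCategory (as GHC's does); both predicates below pick out exactly the Cc characters U+0000..U+001F and U+007F..U+009F:

    import Data.Char (generalCategory, GeneralCategory(Control))

    -- Range test, as in the old Hugs code:
    isControlByRange :: Char -> Bool
    isControlByRange c = c < ' ' || (c >= '\DEL' && c <= '\x9f')

    -- Category test ("Cc"):
    isControlByCategory :: Char -> Bool
    isControlByCategory c = generalCategory c == Control

    -- Cc is exactly U+0000..U+001F and U+007F..U+009F, so the two
    -- predicates agree on every character, not just on Latin-1.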
So there might be a bunch of (perhaps autogenerated, from localedef files) modules for each locale/encoding, like ISO8859_1 or KOI_8.
I disagree. Char is supposed to mean Unicode only, and data is converted to Unicode on the boundaries with those parts of the world which use different encodings.

With Unicode in mind it still makes sense to talk about digits as '0'..'9' only; most programming languages specify numeric literals as consisting of these digits only. Other contexts may require a wider set, including today's Arabic digits etc. This is not because of the encoding but because of the intended set of characters.

One reason why the predicates are not obvious is that as the features encodable as text become more sophisticated, old algorithms for handling text become limited. For example, if an identifier is specified as a letter followed by a sequence of letters or numbers, then combining marks are not allowed in identifiers, even though the corresponding precomposed characters are allowed. I guess this is why Hugs includes M* in isAlphaNum. This is a lie which allows old code to work better: these characters are not alphanumeric; it's the definition of identifiers which is no longer appropriate. (Unicode recommends a particular definition of identifiers for programming languages which want to permit non-ASCII identifiers; it has various exceptions because it's intended to be somewhat compatible with older versions of itself.)

Another case where the old interfaces are not sufficient is toUpper & toLower. They should be defined on strings, not characters. Besides 'ß' there are other characters which uppercase or lowercase to several code points: ligatures, precomposed characters which lack a precomposed variant in the other case but can be decomposed, the Greek iota below which is specified to uppercase to a separate iota after the letter (some people believe this is wrong, but it's how it's currently specified in Unicode), and some cases with accents over I and i. Case mapping is also context-dependent for sigma.

-- 
   __("<         Marcin Kowalczyk
   \__/        qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
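To illustrate the point about string-valued case mappings, a minimal Haskell sketch; the function name and the tiny mapping table are made up for illustration and are not the library API:

    import qualified Data.Char as C

    -- Sketch of a string-level uppercase: one character may expand to
    -- several code points, which a Char -> Char toUpper cannot express.
    upcaseString :: String -> String
    upcaseString = concatMap up
      where
        up '\x00DF' = "SS"            -- 'ß' has no single-character uppercase
        up '\xFB00' = "FF"            -- the 'ff' ligature, likewise
        up c        = [C.toUpper c]   -- fall back to the per-character mapping

With this, upcaseString "straße" yields "STRASSE", something the per-character toUpper leaves as "STRAßE".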

Marcin 'Qrczak' Kowalczyk
Dimitry Golubovsky writes:
[Proposal: ASCII.isDigit is true for '0'..'9', Unicode.isDigit is true for whatever Unicode defines as digits]
So there might be a bunch of (perhaps autogenerated, from localedef files) modules for each locale/encoding, like ISO8859_1 or KOI_8.
I disagree. Char is supposed to mean Unicode only, and data is converted to Unicode on boundaries with those parts of the world which use different encodings.
...and the uppercase chars in KOI_8 are a subset of the uppercase chars in Unicode, so a KOI_8-specific isUpper would be superfluous(?)

My intention was only to distinguish between (the traditional) ASCII and (our modern-day tower of Babel) Unicode. One possibility could be to have locale modules apply to (raw) Word8 data -- so somebody writing for KOI_8 could avoid converting to Unicode Char at all. I'm not sure this is something we want, though.

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
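A minimal sketch of what such a byte-level locale module might look like; the function name is hypothetical and the code points are recalled from the KOI8-R layout (uppercase Cyrillic at 0xE0..0xFF, Ё at 0xB3), so they are worth double-checking:

    import Data.Word (Word8)

    -- Hypothetical byte-level predicate for KOI8-R; no conversion to Char.
    koi8IsUpper :: Word8 -> Bool
    koi8IsUpper b =
         (b >= 0x41 && b <= 0x5A)   -- ASCII 'A'..'Z'
      || b >= 0xE0                  -- uppercase Cyrillic (0xE0..0xFF in KOI8-R)
      || b == 0xB3                  -- uppercase Ё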