
On 26 Aug 2008, at 1:31 pm, Deborah Goldsmith wrote:
> You can't determine Unicode character properties by analyzing the names of the characters.
However, the OP *does* have a copy of the UnicodeData...txt file, and you *can* determine the relevant Unicode character properties from that. For example, consider the entry for space:

    0020;SPACE;Zs;0;WS;;;;;N;;;;;
               ^^

The Zs in the third field (the general category) says it's a white space character (Zs: separator/space, Zl: separator/line, Zp: separator/paragraph). Or look at capital A:

    0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
                                ^^

The Lu says it's a L(etter) that is u(pper case). Upper case: Lu, lower case: Ll, title case: Lt, modifier letter: Lm, other letter: Lo, decimal digit: Nd, ... If memory serves me correctly, this is explained in the UnicodeData.html file, under a heading something like Normative Categories.
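
To make the field lookup concrete, here is a rough Python sketch that pulls the general category out of records like the two quoted above. The function names (parse_record, is_space_separator, and so on) are made up for illustration, and the only input is the two sample lines from this message rather than a real copy of the data file.

```python
# Positions of the first three semicolon-separated fields in a record.
CODE_POINT, NAME, GENERAL_CATEGORY = 0, 1, 2


def parse_record(line):
    """Return (code point, name, general category) for one record."""
    fields = line.rstrip("\n").split(";")
    return int(fields[CODE_POINT], 16), fields[NAME], fields[GENERAL_CATEGORY]


def is_space_separator(category):
    # Zs only; Zl and Zp are the line and paragraph separators.
    return category == "Zs"


def is_letter(category):
    # Lu, Ll, Lt, Lm and Lo all start with "L".
    return category.startswith("L")


def is_uppercase_letter(category):
    return category == "Lu"


# The two sample records quoted in this message.
SAMPLE = [
    "0020;SPACE;Zs;0;WS;;;;;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]

for line in SAMPLE:
    code_point, name, category = parse_record(line)
    print(f"U+{code_point:04X} {name}: {category}")
```

Note that the real file also contains paired "First"/"Last" entries that stand for whole ranges of code points; this sketch treats every line as a single character and would need extra handling for those.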