
You can't determine Unicode character properties by analyzing the names of the characters. Read chapter 4 of the standard:
http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
and get the property values here:
http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

It sounds like the properties you want are "Case" and "General Category". Maybe the spec should be more explicit on exactly how the definitions map onto Unicode properties, so there is no ambiguity.

Deborah

On Aug 25, 2008, at 6:15 PM, Maurício wrote:
Hi,
In the Haskell reference, I see the following definitions:
uniWhite -> any Unicode character defined as whitespace;
uniSmall -> any Unicode lowercase letter;
uniLarge -> any uppercase or titlecase Unicode letter;
uniSymbol -> any Unicode symbol or punctuation.
Where do I get lists for those characters? My first attempt was to check:
http://unicode.org/Public/UNIDATA/UnicodeData.txt
and consider large anything marked as CAPITAL and small anything marked as SMALL. I didn't know what to guess about the symbols. Am I using the right reference? How can I recognize (or get a list of) valid uppercase and lowercase Unicode letters, as well as symbols and punctuation?
Thanks for your help,
Maurício
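
For readers who want to experiment with the four definitions quoted above, here is a minimal sketch of one possible reading, using `generalCategory` from `Data.Char` in base. The predicate names (`isUniSmall`, `isUniLarge`, `isUniSymbol`, `isUniWhite`) are hypothetical and chosen only to mirror the Report's class names. The mapping onto General Category values is an assumption, not something the Report states; in particular, treating "whitespace" as category `Space` (Zs) alone is a guess, and the derived "Lowercase"/"Uppercase" properties in DerivedCoreProperties.txt cover more characters than the Ll/Lu categories used here.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical predicates named after the Haskell Report's lexical classes.
-- Assumption: each class is read as a set of Unicode General Category values.

-- uniSmall: any Unicode lowercase letter (assumed: category Ll).
isUniSmall :: Char -> Bool
isUniSmall c = generalCategory c == LowercaseLetter

-- uniLarge: any uppercase or titlecase letter (assumed: categories Lu, Lt).
isUniLarge :: Char -> Bool
isUniLarge c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- uniSymbol: any Unicode symbol or punctuation (assumed: all P* and S*).
isUniSymbol :: Char -> Bool
isUniSymbol c =
  generalCategory c `elem`
    [ ConnectorPunctuation, DashPunctuation, OpenPunctuation
    , ClosePunctuation, InitialQuote, FinalQuote, OtherPunctuation
    , MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol
    ]

-- uniWhite: any Unicode whitespace (assumed: category Zs only; the
-- White_Space property also includes some control and separator characters).
isUniWhite :: Char -> Bool
isUniWhite c = generalCategory c == Space
```

Under this reading, a titlecase letter such as U+01C5 counts as large, and both `+` and `!` count as symbols. Which General Category values `Data.Char` reports for a given character depends on the Unicode version your compiler's base library was built against.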