
On Thu, 2002-07-25 at 19:07, Andrew J Bromage wrote:
G'day all.
On Fri, Jul 26, 2002 at 01:27:48AM +0000, Karen Y wrote:
1. How would I convert capital letters into small letters?
2. How would I remove vowels from a string?
As you've probably found out, these are very hard problems.
Glossing over that concern, current implementations don't support the relevant UnicodePrims fully, so to do it properly you'll probably need to parse the case folding files yourself. See:
http://www.unicode.org/unicode/reports/tr21/
Vowels are even harder because I don't think the Unicode standard even defines what a "vowel" is. Removing vowel _marks_ should be straightforward once you expand combining characters, but that doesn't help with the general case. Frankly, I don't like your chances.
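For what it's worth, here is a minimal sketch of the naive version in Haskell, using only Data.Char from base. It sidesteps every hard case mentioned above: toLower maps one character to one character, the vowel set latinVowels is a hypothetical choice of mine (Latin a, e, i, o, u only), and stripMarks assumes the input has already been decomposed, since base provides no normalization function. Treat it as a starting point under those assumptions, not as a proper solution.

```haskell
import Data.Char (isMark, toLower)

-- Per-character lowercasing. Ignores locale-specific rules and any
-- case mapping that would change the length of the string.
lowercase :: String -> String
lowercase = map toLower

-- Hypothetical vowel set (an assumption): unaccented Latin vowels only.
-- The Unicode standard does not define what a "vowel" is.
latinVowels :: String
latinVowels = "aeiouAEIOU"

-- Drops every character that appears in latinVowels.
removeVowels :: String -> String
removeVowels = filter (`notElem` latinVowels)

-- Drops every combining mark. This removes vowel marks, but also
-- accents and other marks. Assumes the input is already decomposed.
stripMarks :: String -> String
stripMarks = filter (not . isMark)
```

With these definitions, removeVowels (lowercase "Hello World") evaluates to "hll wrld". Characters with no lower-case mapping pass through lowercase unchanged.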
Shouldn't the solution also take care of languages without upper casing? Clearly the translation problem is easy enough with such languages ("id" will work just fine), but determining (from context?) that the string is in such a language is more than a bit difficult (especially given that numeric codes can correspond to most everything).
Vowels are much more difficult - even given that the language is recognizable, what would happen with languages such as Chinese or Arabic which (I believe) have nothing that even resembles a vowel? Of course, Chinese is a whole problem by itself.
-- jeff putnam -- jefu.jefu@verizon.net -- http://home1.get.net/res0tm0p