
On 10/31/12 11:49 PM, Patrick Palka wrote:
On Wed, Oct 31, 2012 at 10:39 PM, wren ng thornton wrote:
The one thing I worry about using \x1680 as the threshold[1] is that I'm not sure whether every character below \x1680 has been allocated or whether some are still free. If any of them are free, then this will become incorrect in subsequent versions of Unicode so it's a maintenance timebomb. (Whereas if they're all specified then it should be fine.) Can someone verify that using \x1680 is sound in this manner?
According to GHCi:
Prelude Data.Char> length $ filter ((== NotAssigned) . generalCategory) ['\0'..'\x1680']
830
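The same query can be written as a small standalone program that also reports the first gap. This is only a sketch: `unassignedIn` is a made-up helper name, not something from the thread, and the numbers it prints depend on the Unicode tables bundled with whichever `base` the compiler ships, so they may differ from the figure quoted above.

```haskell
import Data.Char (GeneralCategory (NotAssigned), generalCategory, ord)

-- Hypothetical helper: the code points in the inclusive range [lo .. hi]
-- that this base library's Unicode tables classify as NotAssigned.
unassignedIn :: Char -> Char -> [Char]
unassignedIn lo hi = filter ((== NotAssigned) . generalCategory) [lo .. hi]

main :: IO ()
main = do
  let gaps = unassignedIn '\0' '\x1680'
  putStrLn ("unassigned count: " ++ show (length gaps))
  case gaps of
    []      -> putStrLn "no unassigned code points in range"
    (c : _) -> putStrLn ("first unassigned (decimal): " ++ show (ord c))
```

Any nonzero count means the range still has holes that a later Unicode version could fill, which is exactly the maintenance concern raised above.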
Guess I never looked closely at what Unicode queries Data.Char offers... Looks like the first unassigned character is '\888'

--
Live well,
~wren