Re: Haskell Platform Proposal: add the 'text' library

20 Oct 2010

On Wed, Oct 20, 2010 at 09:57:04AM -0700, Bryan O'Sullivan wrote:
...
On Wed, Oct 20, 2010 at 9:52 AM, Johan Tibell wrote:
...
I think the right thing to do here is to perform normalization first but
I'm not sure.

Hi, friendly neighbourhood Unicode expert here. Yes, in the case Ian cites,
you want to perform normalization before doing the replacement. The
behaviour he demonstrates is normal, expected, and consistent with the
standard.

OK, so that works with the previous example:

Data.Text Data.Text.IO Data.Text.ICU> let t = pack "z\x0061\x030A\x0061z"
Data.Text Data.Text.IO Data.Text.ICU> t
"za\778az"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn t
zåaz
Data.Text Data.Text.IO Data.Text.ICU> normalize NFC t
"z\229az"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (normalize NFC t)
zåaz
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (replace (pack "a") (pack "y") (normalize NFC t))
zåyz

but only because characters and code points are now 1:1. If we were using
a character for which there is no single precomposed code point, e.g.
p-ring (probably not a real character, but I understand there are real
examples):

Data.Text Data.Text.IO Data.Text.ICU> let t = pack "zp\x030Apz"
Data.Text Data.Text.IO Data.Text.ICU> t
"zp\778pz"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn t
zp̊pz
Data.Text Data.Text.IO Data.Text.ICU> normalize NFC t
"zp\778pz"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (normalize NFC t)
zp̊pz
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (replace (pack "p") (pack "y") (normalize NFC t))
zẙyz

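then it doesn't work: NFC has no precomposed p-ring to compose to, so the
base p is still a separate code point, replace matches it, and the ring
ends up on a y.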
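
One possible workaround, as a minimal sketch only: group each base code
point with the combining marks that follow it, and replace whole groups
rather than individual code points. The names clusters and replaceCluster
are hypothetical helpers, not part of the text or text-icu APIs, and the
grouping rule (base plus trailing marks, via Data.Char.isMark) is a rough
approximation of Unicode grapheme clusters, not a full implementation. It
also assumes the needle is a single cluster.

```haskell
import Data.Char (isMark)
import qualified Data.Text as T

-- Split a Text into approximate "characters": one code point followed by
-- any combining marks. This is an assumption-laden simplification of
-- Unicode grapheme cluster segmentation.
clusters :: T.Text -> [T.Text]
clusters = T.groupBy (\_ c -> isMark c)

-- Replace only clusters that are exactly equal to the needle, so a bare
-- "p" matches but "p" + COMBINING RING ABOVE does not.
-- (Hypothetical helper; the needle must itself be a single cluster.)
replaceCluster :: T.Text -> T.Text -> T.Text -> T.Text
replaceCluster needle rep = T.concat . map swap . clusters
  where
    swap c
      | c == needle = rep
      | otherwise   = c
```

With this, replaceCluster (pack "p") (pack "y") applied to the p-ring
example leaves the p-ring alone and changes only the bare p. Normalizing
first is still advisable so that equivalent inputs compare equal.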

Johan wrote:
...
If you process a string code point by code point you might mistakenly
confuse a plain "a" (A) with a "å" (A-RING *or* A + COMBINING RING).

But when characters and codepoints are 1:1, you /can/ process code point
by code point.
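
A rough way to test that 1:1 condition, as a sketch under stated
assumptions: after normalization, check that no combining marks remain.
The name oneToOne is hypothetical, and "no marks left" is only an
approximation of "every character is a single code point" (it ignores
other multi-code-point sequences).

```haskell
import Data.Char (isMark)
import qualified Data.Text as T

-- True when the text contains no combining marks, i.e. (approximately)
-- when each user-visible character is a single code point.
-- Intended to be applied to already-normalized (e.g. NFC) text.
oneToOne :: T.Text -> Bool
oneToOne = not . T.any isMark
```

For the two examples above, this holds for the NFC form of the a-ring
string but not for the p-ring string, which keeps its combining ring.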

Am I missing something?

Thanks
Ian

Ian Lynagh