
On 02/11/2011 21:40, Max Bolingbroke wrote:
On 2 November 2011 20:16, Ian Lynagh wrote:
Are you saying there's a bug that should be fixed?
You can choose between two options:
1. Failing to roundtrip some strings (in our case, those containing the 0xEFNN byte sequences)
2. Having GHC's decoding functions return strings including codepoints that should not be allowed (i.e. lone surrogates)
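A rough sketch of why the two options trade off against each other. This is not GHC's actual code: every name below is hypothetical, and it assumes that an undecodable byte NN (0x80 to 0xFF) is escaped as the codepoint base + NN, with base 0xEF00 for option 1 and base 0xDC00 for option 2.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical escape bases (assumptions, not taken from GHC's source):
-- option 1 escapes into the private-use area, option 2 into the low surrogates.
privateUseBase, surrogateBase :: Int
privateUseBase = 0xEF00
surrogateBase  = 0xDC00

-- Escape an undecodable byte as the codepoint (base + byte).
escapeByte :: Int -> Word8 -> Char
escapeByte base b = chr (base + fromIntegral b)

-- Recover the original byte if the character lies in the escape range.
unescapeChar :: Int -> Char -> Maybe Word8
unescapeChar base c
  | n >= base + 0x80 && n <= base + 0xFF = Just (fromIntegral (n - base))
  | otherwise                            = Nothing
  where n = ord c

-- Surrogate codepoints can never be produced by decoding well-formed text.
isSurrogate :: Char -> Bool
isSurrogate c = ord c >= 0xD800 && ord c <= 0xDFFF

-- True if the escape character could also come from well-formed input,
-- in which case the encoder cannot tell an escaped byte from real text
-- and roundtripping of that text fails.
collidesWithValidText :: Int -> Word8 -> Bool
collidesWithValidText base b = not (isSurrogate (escapeByte base b))
```

Under these assumptions, `collidesWithValidText privateUseBase 0x80` is `True` (a string that legitimately contains U+EF80 is indistinguishable from an escaped 0x80 byte), while `collidesWithValidText surrogateBase 0x80` is `False`, at the price of `escapeByte surrogateBase 0x80` being a lone surrogate inside a `String`.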
At the time I implemented this there was significant support for 2, so that is what we have.
Don't you mean 1 is what we have?
At the time I was convinced that 2 was the right thing to do, but now I'm more agnostic. But anyway the current behaviour is not really a bug -- it is by design :-)
Failing to roundtrip in some cases, and doing so silently, seems highly suboptimal to me. I'm sorry I didn't pick up on this at the time (Unicode is a swamp :).

Cheers,
Simon