
Bayley, Alistair wrote:
From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Stefan O'Rear
fromUTF8Ptr unboxes fine for me with HEAD and 6.6.1.
- the chr function tests that its Int argument is at most 1114111 (0x10FFFF) before constructing the Char. It'd be nice to avoid this test.

You want unsafeChr from the (undocumented) GHC.Base module. See http://darcs.haskell.org/ghc-6.6/packages/base/GHC/Base.lhs for reference (but don't copy the file; it's already an importable module).
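Here is a minimal sketch of when the check can safely be dropped. It assumes a GHC whose GHC.Base exports unsafeChr :: Int -> Char, as suggested above. The names payload2, decode2Checked and decode2 are hypothetical and are not taken from takusen's UTF8.hs. A two-byte UTF-8 sequence yields at most 11 bits, so chr's range test can never fail on it.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)
import GHC.Base (unsafeChr)

-- Combine the payload bits of a two-byte UTF-8 sequence (110xxxxx 10yyyyyy).
-- Hypothetical helper: checking that b1/b2 really carry those prefixes is
-- assumed to be the caller's job.
payload2 :: Word8 -> Word8 -> Int
payload2 b1 b2 =
  (fromIntegral (b1 .&. 0x1F) `shiftL` 6) .|. fromIntegral (b2 .&. 0x3F)

-- Checked version: chr re-tests that the Int lies within 0..0x10FFFF.
decode2Checked :: Word8 -> Word8 -> Char
decode2Checked b1 b2 = chr (payload2 b1 b2)

-- Unchecked version: payload2 never exceeds 0x7FF, so the range test is
-- redundant and unsafeChr builds the Char without it.
decode2 :: Word8 -> Word8 -> Char
decode2 b1 b2 = unsafeChr (payload2 b1 b2)
```

In GHCi, decode2 0xC3 0xA9 gives '\233' (U+00E9). The same reasoning applies to the three- and four-byte forms only if the decoder has already rejected values above 0x10FFFF.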
FWIW, I've optimised this to a point where I'm happy with it, and you can see the results here: http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs
In that code you have:

  | x <= 0x0010FFFF -- should be 0x001FFFFF

I wasn't aware that the largest Unicode code point had changed. Do you have a reference? Should we change the range of Char in GHC?

Cheers,
Simon
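The two constants in the quoted guard come from different limits. The four-byte UTF-8 bit pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx carries 3 + 6 + 6 + 6 = 21 payload bits, so it can physically encode values up to 0x1FFFFF. Unicode itself stops at U+10FFFF, which is also the value of maxBound :: Char in GHC. The sketch below only computes these numbers. Its names are illustrative, not part of any library.

```haskell
import Data.Char (ord)

-- Payload bits in a 4-byte UTF-8 sequence: 3 from the lead byte plus
-- 6 from each of the three continuation bytes.
fourBytePayloadBits :: Int
fourBytePayloadBits = 3 + 6 * 3

-- Largest value the 4-byte bit pattern can carry: 2^21 - 1 = 0x1FFFFF.
maxEncodable :: Int
maxEncodable = 2 ^ fourBytePayloadBits - 1

-- Largest code point GHC's Char admits: ord maxBound = 0x10FFFF.
maxUnicode :: Int
maxUnicode = ord (maxBound :: Char)
```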