
I'm binding to `wcwidth` to determine the column widths of various Unicode characters. I noticed a lot of -- in fact all -- Chinese characters were being given widths of `-1` when of course they should have width `2`. This only showed up when I compiled my program though -- within GHCi, it never happened. Below my signature is a parred down example that demoes the bug. It tries to get the width of only one Chinese character. You can see it like this: :; ghc --make DemoFailure.hs -o demo && demo [1 of 1] Compiling Main ( DemoFailure.hs, DemoFailure.o ) Linking demo ... 0x00005cff -1 峿 :; chmod ug+x DemoFailure.hs && DemoFailure.hs 0x00005cff 2 峿 Switching between safe/unsafe does not make any difference. This was run on a Macintosh. -- Jason Dusek #!/usr/bin/env runhaskell {- DemoFailure.hs -} {-# LANGUAGE ForeignFunctionInterface #-} import Foreign.C import Data.Char import Text.Printf import qualified System.IO.UTF8 as UTF8 main = do (sequence_ . fmap (UTF8.putStrLn . uncurry fmt)) widths where widths = [ (c, wcwidth c) | c <- ['\x5cff'] ] --widths = [ (c, wcwidth c) | c <- [minBound..maxBound] ] fmt c cols = printf "0x%08x %2d %s" (fromEnum c) cols rep where rep | ' ' == c = "\\SP" | isSpace c = '\\' : show (fromEnum c) | isPrint c = [c] | otherwise = (reverse . drop 1 . reverse . drop 1 . show) c wcwidth :: Char -> Int wcwidth = fromEnum . native . toEnum . fromEnum foreign import ccall unsafe "wchar.h wcwidth" native :: CWchar -> CInt