[Haskell-cafe] FFI binding -- different behaviour under compilation and interpretation.

14 Nov 2009

      I'm binding to `wcwidth` to determine the column widths of
  various Unicode characters. I noticed a lot of -- in fact all
  -- Chinese characters were being given widths of `-1` when of
  course they should have width `2`. This only showed up when I
  compiled my program though -- within GHCi, it never happened.

  Below my signature is a parred down example that demoes the
  bug. It tries to get the width of only one Chinese character.
  You can see it like this:

   :; ghc --make DemoFailure.hs -o demo && demo
    [1 of 1] Compiling Main             ( DemoFailure.hs, DemoFailure.o )
    Linking demo ...
    0x00005cff  -1  峿
   :; chmod ug+x DemoFailure.hs && DemoFailure.hs
    0x00005cff   2  峿

  Switching between safe/unsafe does not make any difference. This
  was run on a Macintosh.

--
Jason Dusek

#!/usr/bin/env runhaskell

{- DemoFailure.hs -}

{-# LANGUAGE ForeignFunctionInterface
  #-}

import Foreign.C
import Data.Char
import Text.Printf

import qualified  System.IO.UTF8 as UTF8

main                         =  do
  (sequence_ . fmap (UTF8.putStrLn . uncurry fmt)) widths
 where
  widths                     =  [ (c, wcwidth c) | c <- ['\x5cff'] ]
--widths                     =  [ (c, wcwidth c) | c <- [minBound..maxBound] ]
  fmt c cols                 =  printf "0x%08x  %2d  %s" (fromEnum c) cols rep
   where
    rep | ' ' == c           =  "\\SP"
        | isSpace c          =  '\\' : show (fromEnum c)
        | isPrint c          =  [c]
        | otherwise          =  (reverse . drop 1 . reverse . drop 1 . show) c

wcwidth                     ::  Char -> Int
wcwidth                      =  fromEnum . native . toEnum . fromEnum

foreign import ccall unsafe "wchar.h wcwidth" native :: CWchar -> CInt