Marshalling Haskell String <-> UTF-8

I want to call a foreign C function that takes a UTF-8 encoded string as one of its arguments (there's also a version of the function that receives UTF-16). Can someone point me to documentation or examples of how this would be done? AFAICT (reading the FFI spec) marshalling a String to a CString is locale-dependent, whereas I know that I want UTF-8/16. Also, if a C function returns a UTF-8 (or UTF-16) encoded string, how do I marshal it reliably into a Haskell String? Can I use the UTF-16 functions directly with CWStrings? (I'm not sure exactly what wchar_t is, as it's apparently dependent on the locale at compile-time, and could be 8, 16, or 32 bits.)
Thanks,
Alistair.

On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
> I want to call a foreign C function that takes a UTF-8 encoded string as one of its arguments (and there's also a version of the function that receives UTF-16). Can someone point me to documentation or examples of how this would be done? AFAICT (reading the FFI spec) marshalling a String to a CString is locale-dependent, whereas I know that I want UTF-8/16.
The locale-dependent marshalling of CString described by the FFI spec isn't yet implemented in the library. There is some code by John Meacham including UTF-8 conversion at http://www.haskell.org/pipermail/ffi/2003-August/001355.html
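Later versions of base added the GHC.Foreign module, whose marshalling functions take an explicit TextEncoding, so a String can be converted to UTF-8 regardless of the current locale. A minimal sketch, assuming a modern GHC; withUtf8 and peekUtf8 are hypothetical helper names, and takes_utf8 is a placeholder for the real C function.

```haskell
import qualified GHC.Foreign as GHC
import Foreign.C.String (CString)
import System.IO (utf8)

-- Hypothetical C import; replace "takes_utf8" with the real function:
-- foreign import ccall "takes_utf8" c_takesUtf8 :: CString -> IO ()

-- Marshal a String into a temporary NUL-terminated UTF-8 buffer.
-- The encoding is fixed to UTF-8, so the current locale is ignored.
-- The buffer is freed when the continuation returns.
withUtf8 :: String -> (CString -> IO a) -> IO a
withUtf8 = GHC.withCString utf8

-- Decode a NUL-terminated UTF-8 buffer (e.g. one returned by C).
-- This only copies the data; freeing the C buffer is still the caller's job.
peekUtf8 :: CString -> IO String
peekUtf8 = GHC.peekCString utf8
```

With the hypothetical import above, a call would look like `withUtf8 "h\233llo" c_takesUtf8`.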
> Can I use the UTF-16 functions directly with CWStrings? (I'm not sure exactly what wchar_t is, as it's apparently dependent on the locale at compile-time, and could be 8, 16, or 32 bits).
Under Windows, CWString uses the UTF-16 encoding. On systems that define __STDC_ISO_10646__ (e.g. glibc as used under Linux) it uses UTF-32. (This is in the CVS version that will become GHC 6.4, not the current release.)
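Because CWString is UTF-16 only on Windows (and UTF-32 where __STDC_ISO_10646__ is defined), a C function that always expects UTF-16 can instead be given an explicit array of 16-bit code units. A sketch under these assumptions: the C side takes a 0-terminated `const uint16_t *` in host byte order, and encodeUtf16, decodeUtf16, withUtf16, peekUtf16 and takes_utf16 are hypothetical names.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (Ptr)

-- Hypothetical C import for the UTF-16 variant of the function:
-- foreign import ccall "takes_utf16" c_takesUtf16 :: Ptr Word16 -> IO ()

-- Encode a String as UTF-16 code units; code points above U+FFFF
-- become a high/low surrogate pair.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap (enc . ord)
  where
    enc n
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        m = n - 0x10000

-- Decode UTF-16 code units, recombining valid surrogate pairs;
-- unpaired surrogates are passed through unchanged.
decodeUtf16 :: [Word16] -> String
decodeUtf16 (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                   + (fromIntegral lo - 0xDC00)) : decodeUtf16 rest
  where
    isHigh w = w >= 0xD800 && w < 0xDC00
    isLow w  = w >= 0xDC00 && w < 0xE000
decodeUtf16 (w : rest) = chr (fromIntegral w) : decodeUtf16 rest
decodeUtf16 []         = []

-- Pass a String as a 0-terminated UTF-16 array in host byte order.
-- The array is freed when the continuation returns.
withUtf16 :: String -> (Ptr Word16 -> IO a) -> IO a
withUtf16 s = withArray0 0 (encodeUtf16 s)

-- Read a 0-terminated UTF-16 array returned from C.
peekUtf16 :: Ptr Word16 -> IO String
peekUtf16 p = fmap decodeUtf16 (peekArray0 0 p)
```

On Windows, where CWString is already UTF-16, withCWString from Foreign.C.String could be used directly; the explicit Word16 version has the advantage of behaving the same on every platform.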

On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> The locale-dependent marshalling of CString described by the FFI spec isn't yet implemented in the library. There is some code by John Meacham including UTF-8 conversion at
> http://www.haskell.org/pipermail/ffi/2003-August/001355.html
You could also look at the darcs source code, as darcs uses UTF-8 to store file names.
--
David Roundy
http://www.abridgegame.org/darcs
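For a self-contained alternative in the spirit of John Meacham's code and darcs (not a copy of either), here is a minimal hand-written UTF-8 codec with marshalling helpers. All names are hypothetical. To stay short, the decoder replaces malformed sequences and out-of-range values with U+FFFD but does not reject overlong encodings or encoded surrogates.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (castPtr)

-- Encode each Char as 1-4 UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . enc . ord)
  where
    enc n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
      | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
      | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode UTF-8 bytes; malformed sequences become U+FFFD.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xC0  = bad                       -- stray continuation byte
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F)
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F)
  | b < 0xF8  = multi 3 (fromIntegral b .&. 0x07)
  | otherwise = bad
  where
    bad = '\xFFFD' : decodeUtf8 bs
    -- Consume k continuation bytes, accumulating six bits from each.
    multi k acc =
      let (cs, rest) = splitAt k bs
          v = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc cs
      in if length cs == k && all (\c -> c .&. 0xC0 == 0x80) cs && v <= 0x10FFFF
           then chr v : decodeUtf8 rest
           else bad

-- Pass a String to C as a NUL-terminated UTF-8 buffer (freed afterwards).
withUtf8CString :: String -> (CString -> IO a) -> IO a
withUtf8CString s act = withArray0 0 (encodeUtf8 s) (act . castPtr)

-- Read a NUL-terminated UTF-8 buffer returned from C.
peekUtf8CString :: CString -> IO String
peekUtf8CString p = fmap decodeUtf8 (peekArray0 0 (castPtr p))
```

An embedded '\0' Char would be read back as the terminator; if the C API accepts an explicit length, withArrayLen avoids that limitation.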

On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> The locale-dependent marshalling of CString described by the FFI spec isn't yet implemented in the library. There is some code by John Meacham including UTF-8 conversion at
I should mention I have a new version of the CWString library in development that conforms to the new FFI spec and works on all POSIX-like systems, not just those with Unicode wchar_t's, as my first posting required. It is not quite ready for release, but if there is a strong need I can package it up nicely.
John
--
John Meacham - ⑆repetae.net⑆john⑈

On Wed, Sep 01, 2004 at 04:39:30PM -0700, John Meacham wrote:
> I should mention I have a new version of the CWString library in development that conforms to the new FFI spec and works on all POSIX-like systems, not just those with Unicode wchar_t's, as my first posting required.
> It is not quite ready for release, but if there is a strong need I can package it up nicely.
The most useful packaging would be as a patch against the HEAD version of fptools/libraries/base/Foreign/C/String.hs. There may also be a difficulty in that you may require hsc2hs, but I think Simon wants to keep it out of low-level modules (?).
participants (4)
- Bayley, Alistair
- David Roundy
- John Meacham
- Ross Paterson