non-ASCII filepaths in a C function

In my 'soxlib' package I have written a binding to

    sox_format_t * sox_open_read(
        char const * path,
        sox_signalinfo_t const * signal,
        sox_encodinginfo_t const * encoding,
        char const * filetype);

I construct the C filepath "path" from a Haskell FilePath using Foreign.C.String.withCString. This works for ASCII and non-ASCII characters on Linux. However, non-ASCII characters make sox_open_read fail on Windows. What is the correct way to convert a FilePath to "char *"?

I believe the native representation for FilePaths on Windows should be UTF-16 strings.

Regards, Malcolm

On 24 Jul 2015, at 22:52, Henning Thielemann wrote:
> What is the correct way to convert FilePath to "char *"?

_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Hi,
The native representation for filepaths on Linux is char[] (i.e. raw
bytes). withCString converts from String to char[] using the current
locale, which doesn't always work (at least, it doesn't always do what you
want). As long as everything is in the same locale, ideally UTF-8, then
you'll be fine, but it's legitimate to have a file whose name is not legal
UTF-8 even in a UTF-8 locale, and these will cause you problems.
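To make the locale dependence concrete, here is a minimal sketch that marshals the same String under two explicitly chosen encodings and returns the bytes a C function would receive. The helper name encodeWith and the sample filename are made up for illustration; GHC.Foreign.withCStringLen is the variant of withCStringLen that takes the encoding as an argument instead of consulting the locale.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, latin1, utf8)

-- | Bytes a C function would see if the String were marshalled
-- with the given encoding (hypothetical helper, not part of soxlib).
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GHC.withCStringLen enc s $ \(ptr, len) ->
    peekArray len (castPtr ptr)

-- | Same name, two encodings, two different byte strings.
demo :: IO ()
demo = do
  encodeWith utf8   "caf\233" >>= print  -- [99,97,102,195,169]
  encodeWith latin1 "caf\233" >>= print  -- [99,97,102,233]
```

If the bytes produced on the Haskell side are not the bytes the C side expects for its own encoding, the path refers to a different (probably nonexistent) file, even though both sides handle ASCII names identically.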
(Minor, nitpicky bugbear: the native representation for filepaths on
Windows is wchar_t[] which is interpreted as UTF-16 *where possible*, but
there are also some legal filenames (e.g. "C:\\Temp\\\xd800") which are
invalid as UTF-16.)
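The nitpick about the lone 0xd800 can be checked with a small, pure sketch. The names toUtf16 and isWellFormedUtf16 are invented here; the code only models 16-bit code units and does not touch the Windows API.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode a String as 16-bit code units, using surrogate pairs
-- for code points above U+FFFF. Surrogate Chars pass through as-is.
toUtf16 :: String -> [Word16]
toUtf16 = concatMap enc
  where
    enc c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- | True when every high surrogate is followed by a low surrogate
-- and no low surrogate appears on its own.
isWellFormedUtf16 :: [Word16] -> Bool
isWellFormedUtf16 [] = True
isWellFormedUtf16 (u : us)
  | isHigh u  = case us of
      (l : rest) | isLow l -> isWellFormedUtf16 rest
      _                    -> False
  | isLow u   = False
  | otherwise = isWellFormedUtf16 us
  where
    isHigh x = x >= 0xD800 && x <= 0xDBFF
    isLow  x = x >= 0xDC00 && x <= 0xDFFF
```

With these definitions, isWellFormedUtf16 (toUtf16 "C:\\Temp\\\xd800") is False, although the same sequence of code units is described above as a legal Windows filename.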
I'm not familiar with soxlib specifically, but for opening a file on
Windows named as a char[] I'm going to guess that the library ultimately
ends up calling a so-called ANSI version of a function like CreateFileA,
which accepts a char[] and converts it to wchar_t[] within the OS according
to the current code page. withCString seems to look at the current code
page when converting a String to a char[] too, but clearly something's not
matching for you.
So a few things to check:
- does soxlib use the ANSI version, CreateFileA or similar?
- what code page does it think it's in?
- can you convert the troublesome filename to bytes in this code page by
hand, and compare with what withCString is doing?
- can you convert these bytes to wchar_t[] using MultiByteToWideChar in the
current code page? Does this look like what you expect?
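For the "compare with what withCString is doing" step, a rough diagnostic sketch follows. It reports which encoding GHC currently uses for foreign strings and hex-dumps the bytes that Foreign.C.String.withCStringLen actually produces. The function names are made up for this example, and the output still has to be compared by hand with the bytes expected for the code page the C library assumes.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCStringLen)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import GHC.IO.Encoding (getForeignEncoding)
import Numeric (showHex)

-- | The bytes that the locale-dependent withCStringLen hands to C.
marshalledBytes :: String -> IO [Word8]
marshalledBytes s =
  withCStringLen s $ \(ptr, len) -> peekArray len (castPtr ptr)

-- | Render bytes as space-separated two-digit hex.
hexDump :: [Word8] -> String
hexDump = unwords . map byte
  where
    byte b = let h = showHex b "" in if length h < 2 then '0' : h else h

-- | Name of the encoding in use, followed by the marshalled bytes.
describeMarshalling :: String -> IO String
describeMarshalling s = do
  enc   <- getForeignEncoding
  bytes <- marshalledBytes s
  return (show enc ++ ": " ++ hexDump bytes)
```

Running describeMarshalling on the troublesome filename on both Linux and Windows should show whether the two platforms produce different bytes for the same FilePath.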
Unfortunately there's no complete general solution to this problem that
fits through an API that only uses char[] for filenames - the mapping from
filenames written as char[] to Windows filenames is never surjective. The
best solution would be for soxlib to offer an API that accepted wchar_t[]
filenames on Windows, although I appreciate this might not be reasonable!
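If such a wide-character entry point existed (the thread does not say that it does), the Haskell side of the binding could marshal the path with Foreign.C.String.withCWString instead of withCString. The sketch below shows only the marshalling; the WideOpen type and withWidePath are hypothetical names, and the action passed in stands for whatever foreign import the C library would have to provide.

```haskell
import Foreign.C.String (CWString, peekCWString, withCWString)

-- | Hypothetical shape of a call that takes a wchar_t* path.
type WideOpen a = CWString -> IO a

-- | Marshal a FilePath as a NUL-terminated wchar_t string for the
-- duration of the given action (assumed wrapper, not soxlib API).
withWidePath :: FilePath -> WideOpen a -> IO a
withWidePath = withCWString

-- | Round trip through wchar_t to show the name survives marshalling.
roundTrip :: FilePath -> IO FilePath
roundTrip path = withWidePath path peekCWString
```

Because no code page is involved in this conversion, the result does not depend on the current locale in the way withCString does.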
Hopefully this helps a bit.
On 25 July 2015 at 08:40, Malcolm Wallace wrote:
> I believe the native representation for FilePaths on Windows should be UTF16 strings.
> Regards, Malcolm
participants (3)
- David Turner
- Henning Thielemann
- Malcolm Wallace