Quickest way to pass Text to C code

Hello,

I have to interact with a C++ library that accepts as string types (putting C++ strings aside) pointers to wchar_t (CWString in Haskell) or to unsigned 32-bit ints (Ptr Word32 for UTF-32 code points).

I have read what text, bytestring and base provide, but Text can only be directly converted to (Ptr Word16), and if I use encodeUtf32LE to get a ByteString, then I only get useAsCString; no direct conversion to CWString or Ptr WordXX is possible. Not to mention the extra memory allocations due to the intermediate conversions.

base provides Foreign.C.String.withCWString, but it requires that I either use plain Strings in the first place or (same as before) convert from Text to String before passing the data to C.

Is there something I'm missing, or is this kind of conversion just not that easy?
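For reference, the String detour described in the question might look roughly like the sketch below. The names textAsCWString and roundTrip are hypothetical and only serve the illustration; the unpack step is exactly the extra conversion the question hopes to avoid.

```haskell
import qualified Data.Text as T
import Foreign.C.String (CWString, peekCWString, withCWString)

-- | Hypothetical helper: marshal a Text as a NUL-terminated wchar_t string.
-- The Text is first unpacked to a String, then copied into a temporary
-- buffer that only lives for the duration of the action.
textAsCWString :: T.Text -> (CWString -> IO a) -> IO a
textAsCWString t = withCWString (T.unpack t)

-- | Read the temporary buffer back, only to check what the C side would see.
roundTrip :: T.Text -> IO String
roundTrip t = textAsCWString t peekCWString
```

The width of wchar_t depends on the platform, so this route only matches the UTF-32 interface on systems where wchar_t is 32 bits wide.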

On Mar 21, 2012, at 4:35 AM, Yves Parès wrote:
> Hello,
> I have to interact with a C++ library that accepts as string types (putting C++ strings aside) pointers to wchar_t (CWString in Haskell) or to unsigned 32-bit ints (Ptr Word32 for UTF-32 code points).

The vector package has "storable" vectors, which are essentially raw C arrays. It provides the function:
Data.Vector.Storable.unsafeWith :: Storable a => Vector a -> (Ptr a -> IO b) -> IO b
This is probably the simplest way to do what you're describing. You can also manually allocate and poke data into raw memory using Foreign.Marshal.Alloc and Foreign.Storable, if you're feeling particularly masochistic ;)
-- James
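A minimal sketch of the manual route mentioned above, assuming the C side wants a zero-terminated array of UTF-32 code units. It uses Foreign.Marshal.Array from base rather than the vector package, purely to keep the example free of extra dependencies; textAsUtf32Array and codePoints are hypothetical names.

```haskell
import Data.Char (ord)
import qualified Data.Text as T
import Data.Word (Word32)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (Ptr)

-- | Hypothetical helper: copy the code points of a Text into a temporary,
-- zero-terminated array of Word32 and hand it to the action.
-- The buffer is freed when the action returns, so the pointer must not escape.
textAsUtf32Array :: T.Text -> (Ptr Word32 -> IO a) -> IO a
textAsUtf32Array t = withArray0 0 (map (fromIntegral . ord) (T.unpack t))

-- | Read the buffer back up to the terminator, to check the layout.
codePoints :: T.Text -> IO [Word32]
codePoints t = textAsUtf32Array t (peekArray0 0)
```

This still goes through an intermediate list, so it trades speed for simplicity.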

> You can also manually allocate and poke data into raw memory using Foreign.Marshal.Alloc and Foreign.Storable, if you're feeling particularly masochistic ;)

That's kind of what I did in the past (aggregating Word8s into a single Word32), before I discovered Text for fast string handling.
I know about storable Vectors (and already use them, but not for text), but I would lose the functionality of Text on the Haskell side (I'm handling textual data in the first place, not raw bytes).
Text already provides all the string handling and file reading functions.
Or would you have a convenient way to convert Text into a Vector?

On Wed, Mar 21, 2012 at 3:35 AM, Yves Parès wrote:
> Hello,
> I have to interact with a C++ library that accepts as string types (putting C++ strings aside) pointers to wchar_t (CWString in Haskell) or to unsigned 32-bit ints (Ptr Word32 for UTF-32 code points).
> I have read what text, bytestring and base provide, but Text can only be directly converted to (Ptr Word16), and if I use encodeUtf32LE to get a ByteString, then I only get useAsCString; no direct conversion to CWString or Ptr WordXX is possible.

A CString is a (Ptr CChar). You can then use castPtr to get whichever pointer type you need, if you believe the underlying buffer has the representation you want (in this case, UTF-32).
It still won't be null-terminated, however.
Antoine
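The castPtr idea can be sketched as follows for a callee that takes an explicit length instead of relying on a terminator. withUtf32Len and utf32Units are hypothetical names, and the sketch assumes a little-endian host and a C function that does not keep the pointer after returning.

```haskell
import qualified Data.ByteString.Unsafe as BU
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf32LE)
import Data.Word (Word32)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, castPtr)

-- | Hypothetical helper: expose a Text as UTF-32LE code units plus a count.
-- No terminator is written, so the callee must take the length explicitly.
withUtf32Len :: T.Text -> (Ptr Word32 -> Int -> IO a) -> IO a
withUtf32Len t f =
  BU.unsafeUseAsCStringLen (encodeUtf32LE t) $ \(p, nBytes) ->
    -- Each UTF-32 code unit occupies four bytes.
    f (castPtr p) (nBytes `div` 4)

-- | Read the code units back, to check the layout (little-endian host assumed).
utf32Units :: T.Text -> IO [Word32]
utf32Units t = withUtf32Len t (flip peekArray)
```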

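Okay, eventually it boils down to this:

import Data.ByteString.Unsafe (unsafeUseAsCString)
import Data.Text (Text, snoc)
import Data.Text.Encoding (encodeUtf32LE)
import Data.Word (Word32)
import Foreign.Ptr (Ptr, castPtr)

-- Encode as UTF-32LE with an explicit U+0000 terminator, then
-- reinterpret the byte buffer as a pointer to 32-bit code units.
textAsPtrW32 :: Text -> (Ptr Word32 -> IO a) -> IO a
textAsPtrW32 t = unsafeUseAsCString (encodeUtf32LE $ t `snoc` '\0') . (. castPtr)

Since the C function I pass the pointer to copies the string, or at least does not store the pointer, I can use unsafeUseAsCString, but then I have to append the null terminator myself.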
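One caveat with the solution above is that encodeUtf32LE only matches what a C uint32_t pointer expects on a little-endian machine. A possible variant that picks the encoder from the host byte order is sketched below; encodeUtf32Host, textAsPtrW32Host and readBack are hypothetical names, and the same assumption holds that the callee does not keep the pointer.

```haskell
import Data.ByteString (ByteString)
import Data.ByteString.Unsafe (unsafeUseAsCString)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf32BE, encodeUtf32LE)
import Data.Word (Word32)
import Foreign.Marshal.Array (peekArray0)
import Foreign.Ptr (Ptr, castPtr)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- | Encode to UTF-32 in the byte order of the machine running the program.
encodeUtf32Host :: T.Text -> ByteString
encodeUtf32Host = case targetByteOrder of
  LittleEndian -> encodeUtf32LE
  BigEndian    -> encodeUtf32BE

-- | Hypothetical variant of textAsPtrW32 that is independent of endianness.
-- A U+0000 terminator is appended before encoding.
textAsPtrW32Host :: T.Text -> (Ptr Word32 -> IO a) -> IO a
textAsPtrW32Host t f =
  unsafeUseAsCString (encodeUtf32Host (T.snoc t '\0')) (f . castPtr)

-- | Read the buffer back up to the terminator, to check the layout.
readBack :: T.Text -> IO [Word32]
readBack t = textAsPtrW32Host t (peekArray0 0)
```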
participants (3)
- Antoine Latter
- James Cook
- Yves Parès