How to write a pure String to String function in Haskell FFI to C++

Hi,

I want to implement a function in C++ via the Haskell FFI, which should have the (final) type String -> String. Say, is it possible to re-implement the following function in C++ with the exact same signature?

    import Data.Char

    toUppers :: String -> String
    toUppers s = map toUpper s

In particular, I wanted to avoid having an IO in the return type because introducing the impurity (by that I mean the IO monad) for this simple task is logically unnecessary. All examples involving a C string I have seen so far involve returning an IO something or a Ptr, which cannot be converted back to a pure String.

The reason I want to do this is that I have the impression that marshaling is not easy with the FFI. Maybe if I can fix the simplest case above (other than primitive types such as int), then I can do whatever data parsing I want on the C++ side, which should be easy in practice. The cost of parsing is negligible compared to the computation that I want to do between the marshalling to/from strings.

Thanks in advance.

On Sun, Jun 2, 2013 at 7:22 PM, Ting Lei wrote:
> In particular, I wanted to avoid having an IO in the return type because introducing the impurity (by that I mean the IO monad) for this simple task is logically unnecessary.

Anything that comes into or goes out of a Haskell program is in IO, period. If you have an FFI function which is guaranteed to not change anything but its parameters and those only in a pure way, then you can use unsafeLocalState to "hide" the IO; but claiming that when it's not true can lead to problems ranging from incorrect results to core dumps, so don't try to lie about it.

> All examples involving a C string I have seen so far involve returning an IO something or Ptr which cannot be converted back to a pure String.

Haskell String-s are *not* C strings. Not even slightly. C cannot work with Haskell's String type directly at all. Some kind of marshaling is absolutely necessary; there are functions in Foreign.C.String that will marshal Haskell String-s to and from C strings.

(String is a linked list of Char, which is also not a C char; it is a constructor and a machine word large enough to hold a Unicode codepoint. And because Haskell is non-strict, any part of that linked list can be an unevaluated thunk which requires forcing the evaluation of arbitrary Haskell code elsewhere to "reify" the value; this obviously cannot be done in the middle of random C code, so it must be done during marshaling.)

-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
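The marshalling step described above can be sketched as follows. This is only an illustration, not code from the thread: it assumes the C library's strlen as the foreign routine, and the wrapper name cLength is made up. Hiding the IO with unsafeLocalState is justified here only because strlen reads its argument and touches nothing else.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Unsafe (unsafeLocalState)

-- strlen from the C library; it only reads the NUL-terminated buffer.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical wrapper (the name is illustrative, not from the thread).
-- withCString forces the whole Haskell list and copies it into a
-- temporary C buffer before the foreign call runs.
cLength :: String -> Int
cLength s = unsafeLocalState (withCString s (fmap fromIntegral . c_strlen))
```

Note that the result counts bytes in the locale's encoding, so for non-ASCII text it can differ from Haskell's length, which is one more sign that a String and a C string are different things.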

On 2 Jun 2013, at 16:48, Brandon Allbery wrote:
> (String is a linked list of Char, which is also not a C char; it is a constructor and a machine word large enough to hold a Unicode codepoint. And because Haskell is non-strict, any part of that linked list can be an unevaluated thunk which requires forcing the evaluation of arbitrary Haskell code elsewhere to "reify" the value; this obviously cannot be done in the middle of random C code, so it must be done during marshalling.)

I'm not convinced that that's "obvious", though it certainly requires functions (that go through the FFI) to grab each character at a time.

Thanks
Tom Davie

On Sun, Jun 2, 2013 at 8:01 PM, Thomas Davie wrote:
> I'm not convinced that that's "obvious", though it certainly requires functions (that go through the FFI) to grab each character at a time.

I think you underestimate the complexity of the Haskell runtime and the interactions between it and the FFI. Admittedly it is probably not "obvious" in the sense of "anyone can tell without knowing anything about it that it can't possibly work", but it should be at least somewhat obvious to someone who sees why there needs to be an FFI in the first place that the situation is not trivial, and that they probably should not blindly assume that the only reason one can't just pass Haskell values directly to C is that some GHC developer was feeling lazy at the time.

Thanks for your answers so far.

It seems that the laziness of String or [Char] is the problem.

My question then boils down to this. There are plenty of Haskell FFI examples where simple things like sin/cos can be imported into Haskell as pure functions. Is there a way to extend that to String without introducing an IO (), but maybe sacrificing laziness? If String has to be lazy, is there another Haskell data type convertible to String that can do the job?

The C++/C function (e.g. toUppers) is computation-only and as pure as cos and tan. The fact that marshaling a string incurs an IO monad in current examples is kind of unintuitive and like a bug in design. I don't mind making redundant copies under the hood from one type to another.

> The C++/C function (e.g. toUppers) is computation-only and as pure as cos and tan. The fact that marshaling string incurs an IO monad in current examples is kind of unintuitive and like a bug in design. I don't mind making redundant copies under the hood from one type to another.

If you can guarantee that the call is pure, then you can execute it directly using `unsafePerformIO`. Simply call the external function as usual, then invoke `unsafePerformIO` on the result. See http://hackage.haskell.org/packages/archive/base/4.6.0.1/doc/html/System-IO-....

On another note, if you really care about performance, you should use the `bytestring` and `text` packages instead of String. They are implemented in terms of byte arrays instead of linked lists, hence are both faster and more FFI-friendly.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Chris Wong, fixpoint conjurer e: lambda.fairy@gmail.com w: http://lfairy.github.io/
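As a rough sketch of the `unsafePerformIO` suggestion, the following gives toUppers the pure String -> String type asked for at the start of the thread. The assumptions are mine, not the thread's: the C library's toupper stands in for the real C/C++ routine, toUppersIO is a made-up helper name, and the input is taken to be ASCII under a conventional C or English locale.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Monad (forM_)
import Foreign.C.String (peekCAStringLen, withCAStringLen)
import Foreign.C.Types (CInt (..), CUChar)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- toupper from the C library stands in for the real C/C++ routine.
foreign import ccall unsafe "ctype.h toupper"
  c_toupper :: CInt -> IO CInt

-- In IO: copy the String into a temporary 8-bit buffer (characters above
-- 0xFF would be truncated), upper-case each byte in place, then read the
-- buffer back into a fresh String.
toUppersIO :: String -> IO String
toUppersIO s = withCAStringLen s $ \(buf, len) -> do
  let bytes = castPtr buf :: Ptr CUChar
  forM_ [0 .. len - 1] $ \i -> do
    b <- peekElemOff bytes i
    u <- c_toupper (fromIntegral b)
    pokeElemOff bytes i (fromIntegral u)
  peekCAStringLen (buf, len)

-- Pure signature. This is only sound if the result depends on nothing
-- but the argument, which is the guarantee the caller has to provide.
toUppers :: String -> String
toUppers = unsafePerformIO . toUppersIO
```

The IO version is kept as a separate function so that the single unsafe step is easy to find and to remove if the foreign routine ever turns out not to be pure.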

As the others have said, if you want to have text data go between GHC and C++, please use Text or ByteString; String would get weird.

If you seriously want to experiment with writing low-level code manipulating the String type, it *might* be possible using GHC's C minus minus (Cmm). This would be very subtle to do correctly, and also really complicated and hard.

Likewise, for writing a "pure"-looking FFI function, a good example is in the lz4hs lib, where all the allocation occurs on the Haskell side and the FFI call only mutates freshly allocated memory. Subject to this, unsafePerformIO can safely be used to give a pure, thread-safe API.

cheers
-Carter
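The pattern described above (Haskell allocates, the foreign call only writes into the fresh buffer) might look roughly like this with ByteString. This is not taken from lz4hs: memcpy from the C library stands in for a real routine that fills an output buffer, and copyViaC is a made-up name. unsafeCreate from Data.ByteString.Internal allocates the result and hides the IO.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as BS
import Data.ByteString.Internal (unsafeCreate)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.C.Types (CChar, CSize (..))
import Foreign.Ptr (Ptr)

-- memcpy stands in for a real C/C++ routine that fills an output buffer.
foreign import ccall unsafe "string.h memcpy"
  c_memcpy :: Ptr Word8 -> Ptr CChar -> CSize -> IO (Ptr Word8)

-- Hypothetical wrapper: the result buffer is allocated on the Haskell
-- side, and the C call only reads the source and writes the fresh buffer.
copyViaC :: BS.ByteString -> BS.ByteString
copyViaC src
  | BS.null src = BS.empty  -- avoid handing a possibly null pointer to memcpy
  | otherwise =
      unsafeCreate (BS.length src) $ \dst ->
        unsafeUseAsCStringLen src $ \(p, n) -> do
          _ <- c_memcpy dst p (fromIntegral n)
          return ()
```

Because nothing outside the new buffer is modified and the source is never written to, no other Haskell value can observe the mutation, which is what makes the pure type defensible.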
On Sun, Jun 2, 2013 at 10:55 PM, Chris Wong wrote:
> If you can guarantee that the call is pure, then you can execute it directly using `unsafePerformIO`. Simply call the external function as usual, then invoke `unsafePerformIO` on the result. See http://hackage.haskell.org/packages/archive/base/4.6.0.1/doc/html/System-IO-... .
>
> On another note, if you really care about performance, you should use the `bytestring` and `text` packages instead of String. They are implemented in terms of byte arrays, instead of linked lists, hence are both faster and more FFI-friendly.

On Sun, Jun 2, 2013 at 10:19 PM, Ting Lei wrote:
> Thanks for your answers so far.
> It seems that the laziness of String or [Char] is the problem.
> My question then boils down to this. There are plenty of Haskell FFI examples where simple things like sin/cos can be imported into Haskell as pure functions. Is there a way to extend that to String without introducing an IO (), but maybe sacrificing laziness? If String has to be lazy, is there another Haskell data type convertible to String that can do the job? The C++/C function (e.g. toUppers) is computation-only and as pure as cos and tan. The fact that marshaling string incurs an IO monad in current examples is kind of unintuitive and like a bug in design. I don't mind making redundant copies under the hood from one type to another.

Hi Ting,

In Foreign.C.String there are functions that convert a String to and from an array (CString = Ptr CChar) which can be handled on the C side:

    withCString :: String -> (CString -> IO a) -> IO a
    peekCString :: CString -> IO String

It's slightly more convenient to use these functions through the preprocessor c2hs, as in the following example: http://code.haskell.org/~aavogt/c_toUpper_ffi_ex/. c2hs also has a 'pure' keyword which makes it add the unsafePerformIO, but for whatever reason the side effects were not done in the right order (the peekCString happened before the foreign function was called).

Regards,
Adam
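To make the ordering concern concrete, here is a hand-written sketch that uses withCString and peekCString directly rather than through c2hs. The choice of the C library's strchr as the foreign function and the name fromFirst are assumptions for illustration, and ASCII input is assumed. Because strchr returns a pointer into the temporary buffer, the foreign call has to come first and the peekCString has to run before withCString releases the buffer; sequencing both inside one IO block keeps that order explicit.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (nullPtr)
import System.IO.Unsafe (unsafePerformIO)

-- strchr from the C library: returns a pointer into its argument, or NULL.
foreign import ccall unsafe "string.h strchr"
  c_strchr :: CString -> CInt -> IO CString

-- Hypothetical helper: the suffix of s starting at the first occurrence
-- of c, or Nothing if c does not occur.
fromFirst :: Char -> String -> Maybe String
fromFirst c s = unsafePerformIO $
  withCString s $ \buf -> do
    -- Step 1: the foreign call, on the marshalled copy of s.
    hit <- c_strchr buf (fromIntegral (fromEnum c))
    -- Step 2: read the result back while buf is still allocated.
    if hit == nullPtr
      then return Nothing
      else fmap Just (peekCString hit)
```

For example, fromFirst ':' "key:value" gives Just ":value", and fromFirst 'x' "abc" gives Nothing.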
participants (6)
- adam vogt
- Brandon Allbery
- Carter Schonwald
- Chris Wong
- Thomas Davie
- Ting Lei