How to write a pure String to String function in Haskell FFI to C++

Ting Lei

3 Jun 2013

4:52 a.m.

Hi,

I want to implement a function in C++ via the Haskell FFI which should have the (final) type String -> String. Is it possible to re-implement the following function in C++ with the exact same signature?

    import Data.Char

    toUppers :: String -> String
    toUppers s = map toUpper s

In particular, I want to avoid having IO in the return type, because introducing the impurity (by that I mean the IO monad) for this simple task is logically unnecessary. All the examples involving a C string that I have seen so far return an IO something or a Ptr, which cannot be converted back to a pure String.

The reason I want to do this is that I have the impression that marshaling is not easy with the FFI. If I can solve the simplest case above (beyond primitive types such as int), then I can do whatever data parsing I want on the C++ side, which should be easy in practice. The cost of parsing is negligible compared to the computation that I want to do between the marshaling to and from strings.

Thanks in advance.

Brandon Allbery

3 Jun 2013

5:18 a.m.

On Sun, Jun 2, 2013 at 7:22 PM, Ting Lei wrote:

> In particular, I wanted to avoid having an IO in the return type because introducing the impurity (by that I mean the IO monad) for this simple task is logically unnecessary.

Anything that comes into or goes out of a Haskell program is in IO, period. If you have an FFI function which is guaranteed to not change anything but its parameters, and those only in a pure way, then you can use unsafeLocalState to "hide" the IO; but claiming that when it's not true can lead to problems ranging from incorrect results to core dumps, so don't try to lie about it.

> All examples involving a C string I have seen so far involve returning an IO something or Ptr which cannot be converted back to a pure String.

Haskell String-s are *not* C strings. Not even slightly. C cannot work with Haskell's String type directly at all. Some kind of marshaling is absolutely necessary; there are functions in Foreign.C.String that will marshal Haskell String-s to and from C strings.

(String is a linked list of Char, which is also not a C char; it is a constructor and a machine word large enough to hold a Unicode codepoint. And because Haskell is non-strict, any part of that linked list can be an unevaluated thunk which requires forcing the evaluation of arbitrary Haskell code elsewhere to "reify" the value; this obviously cannot be done in the middle of random C code, so it must be done during marshaling.)

--
brandon s allbery kf8nh, sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
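A minimal sketch of the marshaling step described above, assuming GHC and a C library that provides strlen. The name cStrlen is made up for illustration. withCString forces the whole lazy String and copies it into a temporary NUL-terminated buffer, and unsafeLocalState (from Foreign.Marshal.Unsafe) hides the IO because strlen only reads its argument:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize(..))
import Foreign.Marshal.Unsafe (unsafeLocalState)

-- libc's strlen only reads the buffer it is given.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- cStrlen is a hypothetical name. The result is the number of bytes in the
-- marshaled C string (locale encoding, up to the first NUL), which can
-- differ from `length s` for non-ASCII input.
cStrlen :: String -> Int
cStrlen s = unsafeLocalState $
  withCString s $ \buf -> fromIntegral <$> c_strlen buf
```

Loaded into GHCi, cStrlen can be applied like any pure function; the temporary buffer is freed as soon as the callback returns.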

Thomas Davie

5:31 a.m.

On 2 Jun 2013, at 16:48, Brandon Allbery wrote:

> (String is a linked list of Char, which is also not a C char; it is a constructor and a machine word large enough to hold a Unicode codepoint. And because Haskell is non-strict, any part of that linked list can be an unevaluated thunk which requires forcing the evaluation of arbitrary Haskell code elsewhere to "reify" the value; this obviously cannot be done in the middle of random C code, so it must be done during marshalling.)

I'm not convinced that that's "obvious" – though it certainly requires functions (that go through the FFI) to grab each character at a time.

Thanks
Tom Davie

Brandon Allbery

5:38 a.m.

On Sun, Jun 2, 2013 at 8:01 PM, Thomas Davie wrote:

> I'm not convinced that that's "obvious" – though it certainly requires functions (that go through the FFI) to grab each character at a time.

I think you underestimate the complexity of the Haskell runtime and the interactions between it and the FFI. Admittedly it is probably not "obvious" in the sense of "anyone can tell without knowing anything about it that it can't possibly work", but it should be at least somewhat obvious to someone who sees why there needs to be an FFI in the first place that the situation is not trivial, and that they probably should not blindly assume that the only reason one can't just pass Haskell values directly to C is that some GHC developer was feeling lazy at the time.

--
brandon s allbery kf8nh

Ting Lei

7:49 a.m.

Thanks for your answers so far.

It seems that the laziness of String, or [Char], is the problem.

My question then boils down to this. There are plenty of Haskell FFI examples where simple things like sin/cos can be imported into Haskell as pure functions. Is there a way to extend that to String without introducing IO, perhaps at the cost of laziness? If String has to be lazy, is there another Haskell data type, convertible to String, that can do the job?

The C++/C function (e.g. toUppers) is computation-only and as pure as cos and tan. The fact that marshaling a string incurs the IO monad in current examples is unintuitive and looks like a bug in the design. I don't mind making redundant copies under the hood from one type to another.

Chris Wong

8:25 a.m.

> The C++/C function (e.g. toUppers) is computation-only and as pure as cos and tan. The fact that marshaling string incurs an IO monad in current examples is kind of unintuitive and like a bug in design. I don't mind making redundant copies under the hood from one type to another..

If you can guarantee that the call is pure, then you can execute it directly using `unsafePerformIO`. Simply call the external function as usual, then invoke `unsafePerformIO` on the result.

See http://hackage.haskell.org/packages/archive/base/4.6.0.1/doc/html/System-IO-....

On another note, if you really care about performance, you should use the `bytestring` and `text` packages instead of String. They are implemented in terms of byte arrays, instead of linked lists, hence are both faster and more FFI-friendly.
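As a rough sketch of that suggestion, here is one way a String -> String wrapper could look. It is an illustration under assumptions, not the poster's code: libc's toupper stands in for the C++ routine so that the example needs no separate C file, and the helper name upperBuffer is invented. A real C++ implementation would more likely upper-case the whole buffer in a single foreign call.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (withCStringLen, peekCStringLen)
import Foreign.C.Types (CInt(..), CUChar)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- Imported in IO because toupper consults the process locale.
foreign import ccall unsafe "ctype.h toupper"
  c_toupper :: CInt -> IO CInt

-- Upper-case each byte of a private buffer in place (hypothetical helper).
-- Bytes are read as unsigned, since toupper expects unsigned char values.
upperBuffer :: Ptr CUChar -> Int -> IO ()
upperBuffer buf len = mapM_ step [0 .. len - 1]
  where
    step i = do
      b <- peekElemOff buf i
      u <- c_toupper (fromIntegral b)
      pokeElemOff buf i (fromIntegral u)

-- The buffer is a fresh copy owned by withCStringLen, so mutating it cannot
-- be observed from outside; that is the justification for unsafePerformIO.
toUppers :: String -> String
toUppers s = unsafePerformIO $
  withCStringLen s $ \(buf, len) -> do
    upperBuffer (castPtr buf) len
    peekCStringLen (buf, len)
```

Because the bytes are processed one at a time, only characters that the locale encodes as a single byte are affected; Data.Char.toUpper remains the better choice for full Unicode.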

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Chris Wong, fixpoint conjurer e: lambda.fairy@gmail.com w: http://lfairy.github.io/

Carter Schonwald

12:36 p.m.

As the others have said, if you want text data to go between GHC and C++, please use Text or ByteString. String would get weird.

If you seriously want to experiment with writing low-level code that manipulates the String type, it *might* be possible using GHC's C-- (Cmm). This would be very subtle to do correctly, and really complicated and hard.

Likewise, for writing a "pure"-looking FFI function, a good example is the lz4hs lib, where all the allocation occurs on the Haskell side and the FFI call only mutates freshly allocated memory. Subject to this, unsafePerformIO can safely be used to give a safe, pure, thread-safe API.

cheers
-Carter
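The allocation pattern described here might be sketched as follows with the bytestring package. This is not taken from lz4hs; it is an illustration in which libc's memcpy stands in for the real C/C++ routine and copyViaC is a hypothetical name. unsafeCreate allocates the output on the Haskell side and performs the unsafe-perform step internally, so the foreign call only ever writes to memory nobody else can see yet:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString as BS
import Data.ByteString.Internal (unsafeCreate)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.C.Types (CChar, CSize(..))
import Foreign.Ptr (Ptr)

-- memcpy(dst, src, n): a stand-in for a C routine that fills an output buffer.
foreign import ccall unsafe "string.h memcpy"
  c_memcpy :: Ptr Word8 -> Ptr CChar -> CSize -> IO (Ptr Word8)

-- copyViaC is a hypothetical name. The input is only read and the output
-- buffer is freshly allocated, so the function is observably pure.
copyViaC :: BS.ByteString -> BS.ByteString
copyViaC input
  | BS.null input = BS.empty
  | otherwise =
      unsafeCreate (BS.length input) $ \dst ->
        unsafeUseAsCStringLen input $ \(src, len) -> do
          _ <- c_memcpy dst src (fromIntegral len)
          pure ()
```

Converting to and from String, if needed at the edges, can be done with Data.ByteString.Char8.pack and unpack for ASCII data.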

adam vogt

5 Jun 2013

7:38 a.m.

Hi Ting,

In Foreign.C.String there are functions that convert a String to an array (CString = Ptr CChar) which can be handled on the C side, and back again:

    withCString :: String -> (CString -> IO a) -> IO a
    peekCString :: CString -> IO String

It's slightly more convenient to use these functions through the preprocessor c2hs, as in the following example: http://code.haskell.org/~aavogt/c_toUpper_ffi_ex/. c2hs also has a 'pure' keyword which makes it add the unsafePerformIO, but for whatever reason the side effects were not done in the right order (the peekCString happened before the foreign function was called).

Regards,
Adam
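A hand-written sketch of the same two functions without c2hs, assuming libc's strchr as the foreign function and an invented wrapper name suffixFrom. The foreign call and the peekCString sit in one IO action inside the withCString callback, so the monad fixes their order and the read happens while the buffer is still alive:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Char (ord)
import Foreign.C.String (CString, withCString, peekCString)
import Foreign.C.Types (CInt(..))
import Foreign.Ptr (nullPtr)
import System.IO.Unsafe (unsafePerformIO)

-- strchr returns a pointer into its argument, or NULL if the byte is absent.
foreign import ccall unsafe "string.h strchr"
  c_strchr :: CString -> CInt -> IO CString

-- suffixFrom is a hypothetical name; the sketch assumes `c` is an ASCII
-- character so that it corresponds to a single byte in the C string.
suffixFrom :: Char -> String -> Maybe String
suffixFrom c s = unsafePerformIO $
  withCString s $ \buf -> do
    hit <- c_strchr buf (fromIntegral (ord c))  -- first: the foreign call
    if hit == nullPtr
      then pure Nothing
      else Just <$> peekCString hit             -- then: read the result
```

Returning the CString out of the callback and peeking it afterwards would read freed memory, which is the hand-written analogue of the ordering problem mentioned above.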
