[ANN] Haskell FFI Tutorial

Alex Petrov

8 Nov 2014 8 Nov '14

10:54 a.m.

Hi Everyone, I've recently started working on Haskell FFI Tutorial [1]. The repository already contains code that explains how to: * (expressively) represent C `struct` in Haskell code * call C code from Haskell * call Haskell code from C * operate on nested `struct`s * operate on `struct` arrays * decode `unions` * read and write C fixed-length strings and Pointer-type Strings * how to import functions from, for example, stdlib I'm still working on more story-like writeups, but the first one (about how to call Haskell from C) is already available [2]. If you have any feedback / comments / questions, know how to improve it or would like to see more things covered, just ping me. Thanks [1] https://github.com/ifesdjeen/haskell-ffi-tutorial [2] https://github.com/ifesdjeen/haskell-ffi-tutorial#calling-haskell-from-c -- Alex https://twitter.com/ifesdjeen http://clojurewerkz.org/

Attachments:

attachment.html (text/html — 3.3 KB)

Show replies by date

Charlie Durham

8 Nov 8 Nov

12:06 p.m.

I just built an FFI Interface for a camera in C++ called from Haskell, it would be nice to see an interface done for C++ with objects. I got it working pretty quickly searching around from other sites, but it would be cool if it was all in one place. Sort of off topic, I had an issue where the main in C++ ran perfectly fine calling into what I believe to be a singleton, but calling from Haskell led to that call freezing every time. If I put the call to the singleton into a static block, it runs fine calling from Haskell. Is this probably the "Static order initialization fiasco"? Charlie On Sat, Nov 8, 2014 at 10:54 AM, Alex Petrov wrote:

...

Hi Everyone,

I've recently started working on Haskell FFI Tutorial [1]. The repository already contains code that explains how to:

* (expressively) represent C `struct` in Haskell code * call C code from Haskell * call Haskell code from C * operate on nested `struct`s * operate on `struct` arrays * decode `unions` * read and write C fixed-length strings and Pointer-type Strings * how to import functions from, for example, stdlib

I'm still working on more story-like writeups, but the first one (about how to call Haskell from C) is already available [2].

If you have any feedback / comments / questions, know how to improve it or would like to see more things covered, just ping me.

Thanks

[1] https://github.com/ifesdjeen/haskell-ffi-tutorial [2] https://github.com/ifesdjeen/haskell-ffi-tutorial#calling-haskell-from-c

--

Alex https://twitter.com/ifesdjeen

http://clojurewerkz.org/

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Evan Laforge

10 Nov 10 Nov

2:22 p.m.

On Sat, Nov 8, 2014 at 7:54 AM, Alex Petrov wrote:

...

If you have any feedback / comments / questions, know how to improve it or would like to see more things covered, just ping me.

Just from personal experience, the fact that haskell types like Bool and Char are Storable combined with hsc's low level nature makes it easy to create memory corruption bugs by writing `(#poke Struct field) flag` instead of `(#poke Struct field) (toBool flag)`. The former will compile and likely work most of the time, but will corrupt memory, which will eventually result in a nondeterministic GC crash some time after the actual corruption. In my case, it took over a year to find the bug. I wound up writing a CStorable class that omits the dangerous instances, but a better solution is probably to use a higher level tool that doesn't make you manually specify the types of struct fields. I won't make the mistake again, and I wouldn't want to encourage anyone else to make it either. If I were writing a tutorial the very least I'd put in red flashing warnings.

Donn Cave

2:46 p.m.

quoth Evan Laforge

...

Just from personal experience, the fact that haskell types like Bool and Char are Storable combined with hsc's low level nature makes it easy to create memory corruption bugs by writing `(#poke Struct field) flag` instead of `(#poke Struct field) (toBool flag)`. The former will compile and likely work most of the time, but will corrupt memory, which will eventually result in a nondeterministic GC crash some time after the actual corruption. In my case, it took over a year to find the bug.

Could you explain this problem a little further? I looked in my code and found lots of stuff like `(#poke termios, c_iflag) a iflag', so was alarmed for a moment, but naturally I've declared iflag Word32, via the hsc #type macro. And I'd expect the targets of poking to be like that - full of CChar and Word32 etc. - foreign types of the appropriate size. The idea that there'd be anything else in there changes the perspective somewhat, and if that's common, indeed the tutorial should account for it. (Or maybe I completely misunderstand what's going on here!) Donn

Evan Laforge

11 Nov 11 Nov

12:55 a.m.

On Mon, Nov 10, 2014 at 11:46 AM, Donn Cave wrote:

...

Could you explain this problem a little further? I looked in my code and found lots of stuff like `(#poke termios, c_iflag) a iflag', so was alarmed for a moment, but naturally I've declared iflag Word32, via the hsc #type macro. And I'd expect the targets of poking to be like that - full of CChar and Word32 etc. - foreign types of the appropriate size. The idea that there'd be anything else in there changes the perspective somewhat, and if that's common, indeed the tutorial should account for it.

(Or maybe I completely misunderstand what's going on here!)

If you've ensured that iflag is a C type, then you've avoided the problem. But if, for instance, iflag was a Char, not a CChar, and you poke it into a 'char' struct field forgetting to convert to a CChar, you'll get memory corruption. I don't know if it's a common mistake, but I sure made it (very infrequently, but once is enough, in fact once is even worse), and the compiler won't tell you if you did. When I mentioned it on the list way back when no one responded, so maybe other people don't fall into that trap.

Donn Cave

2:38 a.m.

quoth Evan Laforge ...

...

If you've ensured that iflag is a C type, then you've avoided the problem. But if, for instance, iflag was a Char, not a CChar, and you poke it into a 'char' struct field forgetting to convert to a CChar, you'll get memory corruption. I don't know if it's a common mistake, but I sure made it (very infrequently, but once is enough, in fact once is even worse), and the compiler won't tell you if you did. When I mentioned it on the list way back when no one responded, so maybe other people don't fall into that trap.

Maybe they don't! I guess it isn't so much about exactly what you were up to, but for the sake of getting to whether there's an issue here for the tutorial, I wrote up a little example program, with CChar and Char. The commented alternatives work as well, at least it looks fine to me. Notes on this: - the C struct is { char a; char b; char c; } - the Haskell T struct uses CChar, and I assert that this is the only sane option -- no storable struct for foreign use should ever have a field type like Char. - that means the Storable instance in question is CChar, and it looks to me like poke reliably writes exactly one byte in this case, whatever value is supplied (I also tried Int.) - one might very well manage to keep all the poking to t fields in the T Storable instance - that's what I'd expect the tutorial to focus on. Not that it makes any great difference, but I'm just saying that the "ypoke" function in the example is there purely for the purpose of testing that Char/CChar thing you're talking about, and would be somewhat outside what I see as core usage. Donn ------------ {-# LANGUAGE ForeignFunctionInterface #-} module Main (main) where import Foreign import Foreign.C #include "ffipokehsc.h" data T = T { taflag :: CChar , tbflag :: CChar , tcflag :: CChar } deriving Show instance Storable T where sizeOf _ = #size struct t alignment _ = alignment (undefined::CDouble) peek p = do aflag <- (#peek struct t, a) p bflag <- (#peek struct t, b) p cflag <- (#peek struct t, c) p return (T aflag bflag cflag) poke p (T aflag bflag cflag) = do (#poke struct t, a) p aflag (#poke struct t, b) p bflag (#poke struct t, c) p cflag -- ypoke :: CChar -> CChar -> CChar -> IO T ypoke :: Char -> Char -> Char -> IO T ypoke a b c = alloca $ \ tp -> do (#poke struct t, a) tp a (#poke struct t, b) tp b (#poke struct t, c) tp c peek tp -- main = ypoke 97 98 99 >>= print -- main = ypoke 'a' 'b' 'c' >>= print tptr :: T -> IO (Ptr T) tptr t = alloca $ \ pt -> do poke pt t return pt main = do p <- tptr (T 97 98 99) t <- peek p print t

Evan Laforge

10:53 a.m.

On Mon, Nov 10, 2014 at 11:38 PM, Donn Cave wrote:

...

Maybe they don't! I guess it isn't so much about exactly what you were up to, but for the sake of getting to whether there's an issue here for the tutorial, I wrote up a little example program, with CChar and Char. The commented alternatives work as well, at least it looks fine to me. ... - that means the Storable instance in question is CChar, and it looks to me like poke reliably writes exactly one byte in this case, whatever value is supplied (I also tried Int.)

I think Int is probably unsafe too, in theory if not in practice.

...

- one might very well manage to keep all the poking to t fields in the T Storable instance - that's what I'd expect the tutorial to focus on. Not that it makes any great difference, but I'm just saying that the "ypoke" function in the example is there purely for the purpose of testing that Char/CChar thing you're talking about, and would be somewhat outside what I see as core usage.

Yes, it would be safe to say that all haskell data types which are serializable to C should have only C types. That also avoids the problem. However, you have to convert between haskell and C at some point, and that means you wind up with C and haskell duplicates of all records, so each one is actually expressed in 3 places: the C struct, the haskell "CType" record, and the haskell type record. To me it seemed the logical place to do haskell to C type conversions was in the poke method itself, but that's because I didn't think about the corruption thing. You might think it's obvious, but most type errors are obvious. People do obviously dumb things all the time, and the nice thing about a type checker is that we get a compile error, not memory corruption. Another reason you wind up with Storable instances of non-CType records is Data.Vector.Storable. It's very tempting to simply reuse that to pass to C, and maybe it's initially fine because it has Ints or Word32s or something "safe", but then one day 2 years later someone who doesn't know about that adds a Char field and now you're in trouble.

...

ypoke :: Char -> Char -> Char -> IO T ypoke a b c = alloca $ \ tp -> do (#poke struct t, a) tp a (#poke struct t, b) tp b (#poke struct t, c) tp c peek tp

This is corrupting memory, since sizeOf 'c' == 4. Like I said, it will probably look like it works because it's usually just overwriting adjacent fields or perhaps alignment padding or maybe it's "safe" if it's on the stack, but you are likely to get mysterious crashes under load. Try changing the order of the pokes and see what happens. I have to say I'm a bit surprised to be arguing for type safety vs. "just remember to do the right thing and you won't get memory corruption" on a haskell list :)

Donn Cave

12:35 p.m.

quoth Evan Laforge ...

...

Yes, it would be safe to say that all haskell data types which are serializable to C should have only C types. That also avoids the problem. However, you have to convert between haskell and C at some point, and that means you wind up with C and haskell duplicates of all records, so each one is actually expressed in 3 places: the C struct, the haskell "CType" record, and the haskell type record.

That could happen ... I suppose there's no one way to do it that fits every application, but often enough I can make do with the Haskell "CTYpe" record, and it gives me a Storable value that has a valid type enforced relationship to the C struct. You wouldn't always need or want a Haskell version of your C struct, per se, but if you do.

...

Another reason you wind up with Storable instances of non-CType records is Data.Vector.Storable. It's very tempting to simply reuse that to pass to C, and maybe it's initially fine because it has Ints or Word32s or something "safe", but then one day 2 years later someone who doesn't know about that adds a Char field and now you're in trouble.

Data.Vector.Storable is a new one on me! (I see it's in the non-portable, experimental category, so it figures.) The documentation I'm looking at seems to be saying it wouldn't be "instantiated" for Char, though?

...

...
ypoke :: Char -> Char -> Char -> IO T ypoke a b c = alloca $ \ tp -> do (#poke struct t, a) tp a (#poke struct t, b) tp b (#poke struct t, c) tp c peek tp

This is corrupting memory, since sizeOf 'c' == 4. Like I said, it will probably look like it works because it's usually just overwriting adjacent fields or perhaps alignment padding or maybe it's "safe" if it's on the stack, but you are likely to get mysterious crashes under load. Try changing the order of the pokes and see what happens.

Ah, you're right, a final poke to offset 0 overwrites everything. In retrospect I ... don't know what I was thinking! Donn

Evan Laforge

1:55 p.m.

On Tue, Nov 11, 2014 at 9:35 AM, Donn Cave wrote:

...

That could happen ... I suppose there's no one way to do it that fits every application, but often enough I can make do with the Haskell "CTYpe" record, and it gives me a Storable value that has a valid type enforced relationship to the C struct. You wouldn't always need or want a Haskell version of your C struct, per se, but if you do.

True, but... consider Bool, with no corresponding C type. Just picking something like CUChar is inconvenient and ugly on the haskell side.

...

Data.Vector.Storable is a new one on me! (I see it's in the non-portable, experimental category, so it figures.) The documentation I'm looking at seems to be saying it wouldn't be "instantiated" for Char, though?

It is actually, it's perfectly valid to have a vector of Chars. More importantly, you can have a vector of records that include Storable types. It's an acceptable use of Storable, I just think the "pun" where Storable for haskell types is the same as the Storable for C types is unfortunate. Anyway, in the big picture, I think hsc2hs is just too low level. We shouldn't be having to manually poke structs at all, and it's fundamentally dangerous even if you use all C types because there's no typechecking. Nothing will help you when someone updates the struct and forgets to update the Storable instance. If I were starting again (or advising someone who was starting from scratch), I'd try really hard to find something that directly generates marshalling code from the .h file, perhaps c2hs can do that.

Donn Cave

12 Nov 12 Nov

2:37 a.m.

quoth Evan Laforge ...

...

Anyway, in the big picture, I think hsc2hs is just too low level. We shouldn't be having to manually poke structs at all, and it's fundamentally dangerous even if you use all C types because there's no typechecking. Nothing will help you when someone updates the struct and forgets to update the Storable instance. If I were starting again (or advising someone who was starting from scratch), I'd try really hard to find something that directly generates marshalling code from the .h file, perhaps c2hs can do that.

I'm not up for generating anything directly from a .h file, but just for fun I put a little time today into an hsc alternative with a little more type safety. It works to a certain extent with integral types, because I can determine the size of the C field and assign an appropriate Haskell foreign integral type. I used it to generate a module for struct termios, for which I've been using hsc2hs, and it did better than hsc2hs in an unexpected way - hsc2hs #type maps "unsigned long" to "Word32", but on MacOS X the field size is 8 bytes. Then I make a raft of peek and poke functions that take the native Haskell values that you specify, and convert them to the appropriate foreign types. This is based on a descriptor file where you specify the fields you want to use, your Haskell name and type for each, and whether it's integral or whatever. Integral or whatever is where the joy leaks out of the concept, though. I'd bet a quarter that at least one in every four .hsc files contains some custom peeking and poking for a struct field, stuff you'd never anticipate. Full support for all types seems like a nearly unbounded problem. hsc2hs could do essentially what I'm talking about, if it could tell you the size of a field. Then you could declare the foreign types like termiosC_lflag :: (#appendfieldsize Word struct termios, c_lflag) which would expand to termiosC_lflag :: Word64 (fieldsize would be sizeof(x->c_lflag) * 8) This would give us foreign types based on the C declaration. You'd have to do the conversions yourself where you want to use native types, in this scenario. Donn

Richard A. O'Keefe

8:03 p.m.

On 12/11/2014, at 8:37 pm, Donn Cave wrote:

...

It works to a certain extent with integral types, because I can determine the size of the C field and assign an appropriate Haskell foreign integral type. I used it to generate a module for struct termios, for which I've been using hsc2hs, and it did better than hsc2hs in an unexpected way - hsc2hs #type maps "unsigned long" to "Word32", but on MacOS X the field size is 8 bytes.

That’s not quite right. compile with “cc -m64” and the field size is 8 bytes, but compile with “cc -m32” and the field size is 4 bytes. (This actually strikes me as a flaw in Mac OS X: ‘typedef unsigned long tcflag_t;’ should have been ‘typedef uint32_t tcflag_t;' because there’s no _reason_ for the field size to change this way.) The fact that the sizes of things can vary between compilation environments on the same host is one of the reasons tools like hsc2hs are hard.

Donn Cave

8:41 p.m.

quoth Richard A. O'Keefe, ...

...

(This actually strikes me as a flaw in Mac OS X: `typedef unsigned long tcflag_t;' should have been `typedef uint32_t tcflag_t;' because there's no _reason_ for the field size to change this way.)

True, it's crazy to have structures changing size like that when the effective size of each field is fixed per standard.

...

The fact that the sizes of things can vary between compilation environments on the same host is one of the reasons tools like hsc2hs are hard.

I have to confess that this was to some degree my error as well - I was looking at a somewhat elderly hsc2hs output that probably predated the current architecture and platform level, and when I run it now, hsc2hs does assign the appropriate size foreign type and the right offsets. Donn

Brandon Allbery

9:07 p.m.

On Wed, Nov 12, 2014 at 8:41 PM, Donn Cave wrote:

...

quoth Richard A. O'Keefe, ...

...
(This actually strikes me as a flaw in Mac OS X: `typedef unsigned long tcflag_t;' should have been `typedef uint32_t tcflag_t;' because there's no _reason_ for the field size to change this way.)

True, it's crazy to have structures changing size like that when the effective size of each field is fixed per standard.

For what it's worth, this like many other things was inherited from FreeBSD. (See /usr/include/sys/_termios.h therein, which I chased down from /usr/include/termios.h.) -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Evan Laforge

13 Nov 13 Nov

midnight

Doesn't hsc2hs give you the size with #size? I feel like I'm not understanding your approach. I was thinking of using language-c to parse the .h file, and generate an hs file: structs.h: struct example { int a; char b; } Structs_generated.hs: module X.Y.Structs_generated where import X.Y.Structs (Example) poke_example_a :: (#type int) -> Ptr Example -> IO () poke_example_b :: (#type char) -> Ptr Example -> IO () example_alignment = 4 example_size = 5 You have to provide the Structs module that exports the types, this just makes sure that the pokes only work on the intended types. Then the poke functions tie the struct field name and its type and the record type. For the #type I could either then run this through hsc2hs to get the types, or just have a hardcoded mapping. #type sees through typedefs and the like, but all it can see is size and signedness. While I'm at it I could add foreign declarations for the function prototypes, and solve the problem of keeping those in sync. Another thing I like about this approach is that you import a generated file, but all the marshaling work is done in a plain .hs file. Putting code in a .hsc breaks ghci since you need to regenerate after each edit, and it breaks tools that want to parse hs source. I'm sure it's way more complicated than it seems though, things inevitably are. I read the c2hs docs but I kind of don't really get it yet. I should just download a project that uses it and see how it works. On Wed, Nov 12, 2014 at 5:03 PM, Richard A. O'Keefe wrote:

...

The fact that the sizes of things can vary between compilation environments on the same host is one of the reasons tools like hsc2hs are hard.

That just means you need the same compiler flags for the C side as for the haskell side, right? Presumably that's not too hard because you build it all together. Or is there some further complexity?

Donn Cave

1:23 a.m.

quoth Evan Laforge

...

Doesn't hsc2hs give you the size with #size? I feel like I'm not understanding your approach.

Maybe it does, and I only need a clue to the syntax. I have seen it used only for the size of the struct; I'm looking for the size of a field - and devoid of parentheses or other adornments, so I can tack it onto something like Word or Int. You could define type aliases for other variable-size foreign types, like Float8 etc. and use #fieldsize to select them. One of your objections was that you're obliged to code field types in the .hsc file, and if you make an error or the types are changed later in the C code, hsc2hs doesn't notice. This approach gives you a way to partially derive a foreign type from the size of the field. I don't know language-c. Lots of interesting potential there if it works well. Donn

Evan Laforge

1:49 p.m.

On Wed, Nov 12, 2014 at 10:23 PM, Donn Cave wrote:

...

Maybe it does, and I only need a clue to the syntax. I have seen it used only for the size of the struct; I'm looking for the size of a field - and devoid of parentheses or other adornments, so I can tack it onto something like Word or Int. You could define type aliases for other variable-size foreign types, like Float8 etc. and use #fieldsize to select them.

Oh I see. Maybe this: #let fieldsize t, f = "%lu", (unsigned long)sizeof(((t *)0)->f) c :: Word#{fieldsize example, c} Of course you will wind up with Word64 for a double which is not that great, and you lose signedness. The hsc_type macro has some ridiculous but effective hackery to figure that out. So I bet a #fieldtype macro can be defined without too much trouble, perhaps without even having to modify hsc2hs.

...

I don't know language-c. Lots of interesting potential there if it works well.

Just because I was curious, I wrote up an implementation last night. It converts: typedef int xy_t; struct example { xy_t a; }; to module Example_generated where import qualified Example as M #let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); }, y__) -- struct example sizeOf_example :: Int sizeOf_example = #size example alignment_example :: Int alignment_example = #{alignment example} poke_example_a :: (#type xy_t) -> Ptr M.Example -> IO () poke_example_a = (#poke example, a) peek_example_a :: Ptr M.Example -> IO (#type xy_t) peek_example_a = (#peek example, a) Of course now it occurs to me that if instead I emitted the typedef as 'type Xy_t = CInt' then I could hardcode the primitive C types and not need #type anymore. I'd want to emit type synonyms anyway for function typedefs. And naturally further complexity is in store for declarations like 'char c, *const *s'. One thing is I just ignore unnamed structs, since I couldn't figure out how to write a #poke for them. And then of course I want to filter by header so I don't always generate 50 zillion declarations from system headers, and now using #type instead of type synonyms looks better, otherwise I wind up needing hs versions of all system headers. So it seems feasible, but is naturally fraught with details that will turn it into a real project rather than a quick hack. Since I don't really plan to write any more FFIs in the near future, that's where I'll leave it :) Your #fieldtype macro certainly seems more practical to get something useful working quickly.

Niklas Hambüchen

11 Nov 11 Nov

2:37 p.m.

On 11/11/14 06:55, Evan Laforge wrote:

...

... memory corruption. I don't know if it's a common mistake, but I sure made it

You're definitely not alone here. Took me over a day to find, and it was just a 500 lines application.

Evan Laforge

10:40 p.m.

On Tue, Nov 11, 2014 at 11:37 AM, Niklas Hambüchen wrote:

...

Took me over a day to find, and it was just a 500 lines application.

More than a year, maybe two in my case, and was probably around 50k lines at the time, though only 1k in the FFI. It wouldn't show up in tests, except when it did, so I wrote all manner of special test frameworks to try to reproduce it reliably, nothing did. valgrind doesn't help with this kind of problem. I must have looked straight at the bad code a hundred times at least. Anyway it's a lesson I won't soon forget.

Henk-Jan van Tuyl

3:38 p.m.

On Sat, 08 Nov 2014 16:54:06 +0100, Alex Petrov wrote:

...

I've recently started working on Haskell FFI Tutorial [1]. : [1] https://github.com/ifesdjeen/haskell-ffi-tutorial

I have added a link to this at https://www.haskell.org/haskellwiki/Foreign_Function_Interface#Links Regards, Henk-Jan van Tuyl -- Folding@home What if you could share your unused computer power to help find a cure? In just 5 minutes you can join the world's biggest networked computer and get us closer sooner. Watch the video. http://folding.stanford.edu/ http://Van.Tuyl.eu/ http://members.chello.nl/hjgtuyl/tourdemonad.html Haskell programming --

3842

Age (days ago)

3847

Last active (days ago)

List overview

Download

18 comments

8 participants

participants (8)

Alex Petrov
Brandon Allbery
Charlie Durham
Donn Cave
Evan Laforge
Henk-Jan van Tuyl
Niklas Hambüchen
Richard A. O'Keefe