Haskell integration with C/C++ (GSOC)

Hey Guys, I'm Julian, I am reaching the end of my second year as a JMC (Joint Mathematics and Computer science) Student at Imperial College London and I'd like to apply to GSOC for a project involving Haskell and I just wanted to run my idea past the community. I've already talked about this on the haskell-soc IRC channel (I go by the pseudonym "julek"). What I would like to do is to improve the integration of C/C++ with Haskell, particularly in calling Haskell from C/C++. Currently ghc is able to generate stubs to export functions whose arguments are simple types such as CInts into C/C++. The stub generated is always in an extern "C" clause due to the fact that ghc does not as yet implement the C++ calling conventions as defined in the "The Haskell 98 Foreign Function Interface 1.0" (http://www.cse.unsw.edu.au/~chak/haskell/ffi/ffi.pdf) So a first step would be to implement this calling convention to bring it up to speed with the above referenced report. This shouldn't be too hard and mostly involves implementing C++ name mangling conventions. Next, I would like to extend the stub generation so as to be able to deal with more complex types. The type systems in C++ and Haskell have many analogous syntaxes that can be easily exploited to provide strong compatibility and interoperability between the two languages. For example a simple type such as: data Foo = A | B Could be implemented as an enum in C/C++: enum Foo {A, B}; More advanced types that take arguments such as: data Tree = Node Left Right | Leaf Could be converted to a struct in C/C++: struct Tree { struct Tree* left; struct Tree* right; }; Types that have functions that act on them such as: data IntContainer = IntContainer Int getInt :: IntContainer -> Int getInt (IntContainer a) = a could have these functions automatically converted to C/C++: struct IntContainer { int a; }; extern int getInt_hs(IntContainer a); This also opens up the possibility of exploiting C/C++ name mangling conventions, to allow the _hs postfix I'm suggesting here to be eliminated. Haskell classes: class Arithmetic a where (+) :: a -> a -> a (*) :: a -> a -> a (-) :: a -> a -> a (/) :: a -> a -> a could be implemented using C++ functions with virtual members: class Monad { public: virtual Monad add(Monad a, Monad b); virtual Monad mult(Monad a, Monad b) virtual Monad neg(Monad a, Monad b); virtual Monad div(Monad a, Monad b); } All types of single/multiple instancing (i.e. either directly or through requirements of instances) would be implemented using single/multiple inheritance. Obviously, this example is rather contrived due to the conversion of the function names. The fact that the rules that govern function naming in Haskell are much more permissive than those of C/C++ might cause compatibility issues. This can be worked around by implementing a similar syntax to that currently used for function imports by the FFI. E.g..: foreign export ccall "bind" >>= :: CInt -> CInt Similar to: foreign import ccall "f" func :: CInt -> CInt The latter is the current syntax for imports. The name given for the export would be checked for legality in the namespace of the target language. Alternatively this could be done in an automated manner using some naming conventions as well as operator polymorphism, but this would probably sacrifice ease of use. Finally polymorphic Haskell functions/types can be implemented in C++ using templates. I would like to extend ghc to implement enhanced C/C++ stub generation using the methods described above as well as to generate Haskell stubs which describe the Haskell CType equivalents of the Haskell types defined, functions for conversion between the two and function stubs to convert the types, run the Haskell function and convert back as required. On top of this I'd like to write C/C++ libraries for dealing with most of the standard Haskell types such as Maybe, Either, etc... Finally, I'd like to work on ironing out any bugs that remain in the RTS when it is used in "multilingual" situations, as well as improving it's performance in this situation. I found an example of such a bug, which I will test further before reporting it. It seems to be the opposite of the following bug: http://hackage.haskell.org/trac/ghc/ticket/5594 i.e. the stdout buffer isn't always correctly flushed when calling C/C++ in a program whose main is written in Haskell. For example, when running the code: main.hs: module Main where import Foreign.C.Types import System.IO foreign import ccall "inc" c_inc :: CInt -> CInt main :: IO () main = do putStr "Enter n: " -- hFlush stdout s <-getLine putStrLn . show . c_inc . read $ s inc.c: int inc(int i) __attribute__ ((const)); int inc(int i) { return i + 1; } Built with Makefile: all: gcc -c -o inc.o inc.c ghc --make -main-is Main main.hs inc.o -o test rm *.hi *.o The output comes out as: [julek@cryptid inc]$ ./test 2 Enter n: 3 But when the " hFlush stdout" line is commented back in, the output is: [julek@cryptid inc]$ ./test Enter n: 2 3 which is correct, but the extra line shouldn't be necessary. I am currently using ghc 7.4.1 which is the newest version, so this is a current bug. I have had a look for such a bug being reported and have found no such report. I'll look into this further before reporting it, but I am fairly certain this is a bug in the RTS. As part of this project, I would fix this bug (if it is still around when I start) as well as looking for other ones in this area. I think that extending ghc to the level required by "The Haskell 98 Foreign Function Interface 1.0" specification and above would reap significant benefit to the Haskell community. The improved integration into C/C++ would open the door for this to happen for several other languages and would make Haskell more widespread. Many Haskell beginners are daunted by the falsely perceived complexity of working with Haskell IO and monads, but love using the massive advantages that the paradigm gives in a non monadic context. Due to this, simplifying the interoperability between Haskell and C/C++ would enable many of these users to stick around for longer and perhaps encourage them to eventually look deeper into the language. This would make the size of the community grow and make the use of Haskell more widespread, potentially reaping benefits for the community at large. I believe this could be implemented within the time frame given for GSOC. Hope you like the ideas presented here and hopefully I'll be accepted into the Haskell summer of code! If anybody has any opinions on the implementability/usefulness of this and/or criticism of the idea, please inform me and I'll be happy to discuss it! Looking forward to hearing from you! Thanking you in advance. Kind regards, Julian Sutherland

I love the idea of easier to use FFI, but isn't the haskell FFI
intentionally very low level, and intended to be used with tools?
In that light, maybe it would be easier to extend hsc2hs with fancier
macros and the ability to generate wrappers to directly call C++
methods and construct C++ objects. E.g. you could write
obj <- #new Thing, arg, arg
and it would generate
extern "C" newThing(int arg, int arg) { return new Thing(arg, arg); }
I guess then you need it to be able to automatically insert 'foreign'
declarations, and convert between types, and at that point you're
already halfway to something like SWIG or green card. Or c2hs, not
that I know anything about that. Anyway, maybe you can add C++
support to an existing preprocessor instead of putting it directly
into ghc.
One reason I like hsc2hs is the control it gives me. That said, I
have a lot of pretty standard boilerplate Storable instances,
withCString, etc. And a file of trivial extern "C" wrappers to bridge
from C++. There's a lot of room for automation in there.
On Wed, Apr 4, 2012 at 10:53 PM, Sutherland, Julian
Hey Guys,
I'm Julian, I am reaching the end of my second year as a JMC (Joint Mathematics and Computer science) Student at Imperial College London and I'd like to apply to GSOC for a project involving Haskell and I just wanted to run my idea past the community.
I've already talked about this on the haskell-soc IRC channel (I go by the pseudonym "julek").
What I would like to do is to improve the integration of C/C++ with Haskell, particularly in calling Haskell from C/C++.
Currently ghc is able to generate stubs to export functions whose arguments are simple types such as CInts into C/C++. The stub generated is always in an extern "C" clause due to the fact that ghc does not as yet implement the C++ calling conventions as defined in the "The Haskell 98 Foreign Function Interface 1.0" (http://www.cse.unsw.edu.au/~chak/haskell/ffi/ffi.pdf)
So a first step would be to implement this calling convention to bring it up to speed with the above referenced report. This shouldn't be too hard and mostly involves implementing C++ name mangling conventions.
Next, I would like to extend the stub generation so as to be able to deal with more complex types.
The type systems in C++ and Haskell have many analogous syntaxes that can be easily exploited to provide strong compatibility and interoperability between the two languages.
For example a simple type such as:
data Foo = A | B
Could be implemented as an enum in C/C++:
enum Foo {A, B};
More advanced types that take arguments such as:
data Tree = Node Left Right | Leaf
Could be converted to a struct in C/C++:
struct Tree { struct Tree* left; struct Tree* right; };
Types that have functions that act on them such as:
data IntContainer = IntContainer Int
getInt :: IntContainer -> Int getInt (IntContainer a) = a
could have these functions automatically converted to C/C++:
struct IntContainer { int a; };
extern int getInt_hs(IntContainer a);
This also opens up the possibility of exploiting C/C++ name mangling conventions, to allow the _hs postfix I'm suggesting here to be eliminated.
Haskell classes:
class Arithmetic a where (+) :: a -> a -> a (*) :: a -> a -> a (-) :: a -> a -> a (/) :: a -> a -> a
could be implemented using C++ functions with virtual members:
class Monad { public: virtual Monad add(Monad a, Monad b); virtual Monad mult(Monad a, Monad b) virtual Monad neg(Monad a, Monad b); virtual Monad div(Monad a, Monad b); }
All types of single/multiple instancing (i.e. either directly or through requirements of instances) would be implemented using single/multiple inheritance.
Obviously, this example is rather contrived due to the conversion of the function names. The fact that the rules that govern function naming in Haskell are much more permissive than those of C/C++ might cause compatibility issues.
This can be worked around by implementing a similar syntax to that currently used for function imports by the FFI. E.g..: foreign export ccall "bind" >>= :: CInt -> CInt
Similar to: foreign import ccall "f" func :: CInt -> CInt
The latter is the current syntax for imports.
The name given for the export would be checked for legality in the namespace of the target language.
Alternatively this could be done in an automated manner using some naming conventions as well as operator polymorphism, but this would probably sacrifice ease of use.
Finally polymorphic Haskell functions/types can be implemented in C++ using templates.
I would like to extend ghc to implement enhanced C/C++ stub generation using the methods described above as well as to generate Haskell stubs which describe the Haskell CType equivalents of the Haskell types defined, functions for conversion between the two and function stubs to convert the types, run the Haskell function and convert back as required.
On top of this I'd like to write C/C++ libraries for dealing with most of the standard Haskell types such as Maybe, Either, etc...
Finally, I'd like to work on ironing out any bugs that remain in the RTS when it is used in "multilingual" situations, as well as improving it's performance in this situation.
I found an example of such a bug, which I will test further before reporting it. It seems to be the opposite of the following bug: http://hackage.haskell.org/trac/ghc/ticket/5594
i.e. the stdout buffer isn't always correctly flushed when calling C/C++ in a program whose main is written in Haskell.
For example, when running the code:
main.hs: module Main where
import Foreign.C.Types import System.IO
foreign import ccall "inc" c_inc :: CInt -> CInt
main :: IO () main = do putStr "Enter n: " -- hFlush stdout s <-getLine putStrLn . show . c_inc . read $ s
inc.c:
int inc(int i) __attribute__ ((const));
int inc(int i) { return i + 1; }
Built with Makefile: all: gcc -c -o inc.o inc.c ghc --make -main-is Main main.hs inc.o -o test rm *.hi *.o
The output comes out as: [julek@cryptid inc]$ ./test 2 Enter n: 3
But when the " hFlush stdout" line is commented back in, the output is: [julek@cryptid inc]$ ./test Enter n: 2 3
which is correct, but the extra line shouldn't be necessary.
I am currently using ghc 7.4.1 which is the newest version, so this is a current bug.
I have had a look for such a bug being reported and have found no such report. I'll look into this further before reporting it, but I am fairly certain this is a bug in the RTS.
As part of this project, I would fix this bug (if it is still around when I start) as well as looking for other ones in this area.
I think that extending ghc to the level required by "The Haskell 98 Foreign Function Interface 1.0" specification and above would reap significant benefit to the Haskell community.
The improved integration into C/C++ would open the door for this to happen for several other languages and would make Haskell more widespread.
Many Haskell beginners are daunted by the falsely perceived complexity of working with Haskell IO and monads, but love using the massive advantages that the paradigm gives in a non monadic context. Due to this, simplifying the interoperability between Haskell and C/C++ would enable many of these users to stick around for longer and perhaps encourage them to eventually look deeper into the language. This would make the size of the community grow and make the use of Haskell more widespread, potentially reaping benefits for the community at large.
I believe this could be implemented within the time frame given for GSOC.
Hope you like the ideas presented here and hopefully I'll be accepted into the Haskell summer of code!
If anybody has any opinions on the implementability/usefulness of this and/or criticism of the idea, please inform me and I'll be happy to discuss it!
Looking forward to hearing from you!
Thanking you in advance.
Kind regards,
Julian Sutherland
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Thu, Apr 5, 2012 at 01:53, Sutherland, Julian < julian.sutherland10@imperial.ac.uk> wrote:
data Tree = Node Left Right | Leaf
Could be converted to a struct in C/C++:
struct Tree { struct Tree* left; struct Tree* right; };
Shouldn't this actually be a tagged union? Not that they exist as such in C/C++, but are easy enough to emulate (minus the extra type checking that real tagged unions such as even Pascal gives you): struct Tree { enum {Node, Leaf} tag; /* possibly tag sanity checking macros defined here */ union { struct { struct Tree *tree_Node_left; #ifdef HSC_CHECKED # define tree_left(tree) (tree->tag == Leaf ? _hsc_abort("tree: Node.left of Leaf") : tree->_tree_Node.tree_Node_left) #else # define tree_left(tree) tree->_tree_Node.tree_Node_left #endif struct Tree *tree_Node_right; #ifdef HSC_CHECKED # define tree_right(tree) (tree->tag == Leaf ? _hsc_abort("tree: Node.right of Leaf") : tree->_tree_Node.tree_Node_right) #else # define tree_right(tree) tree->_tree_Node.tree_Node_right #endif } _tree_Node; /* strictly we can collapse out the union here because Leaf has no data */ } _tree_data; Similarly several of your other examples could use a bit more thought. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Am 05.04.2012 um 08:42 schrieb Brandon Allbery:
On Thu, Apr 5, 2012 at 01:53, Sutherland, Julian
wrote: data Tree = Node Left Right | Leaf Could be converted to a struct in C/C++:
struct Tree { struct Tree* left; struct Tree* right; };
Shouldn't this actually be a tagged union? Not that they exist as such in C/C++, but are easy enough to emulate (minus the extra type checking that real tagged unions such as even Pascal gives you):
Not necessarily. The above type is equivalent to type Tree = Maybe (Left, Right) (modulo strictness, of course) and the C/C++ convention is to represent Nothing by a NULL pointer.

On Thu, Apr 5, 2012 at 03:21, Holger Siegel
On Thu, Apr 5, 2012 at 01:53, Sutherland, Julian < julian.sutherland10@imperial.ac.uk> wrote: data Tree = Node Left Right | Leaf
Could be converted to a struct in C/C++:
struct Tree { struct Tree* left; struct Tree* right; };
Shouldn't this actually be a tagged union? Not that they exist as such in C/C++, but are easy enough to emulate (minus the extra type checking
Am 05.04.2012 um 08:42 schrieb Brandon Allbery: that real tagged unions such as even Pascal gives you):
Not necessarily. The above type is equivalent to
type Tree = Maybe (Left, Right)
But that is not the type being translated. (Also, I would still consider translating Maybe as something other than a NULL pointer, just to enable sanity checking.) -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Thursday, April 5, 2012 at 1:53 AM, Sutherland, Julian wrote:
Hey Guys,
I'm Julian, I am reaching the end of my second year as a JMC (Joint Mathematics and Computer science) Student at Imperial College London and I'd like to apply to GSOC for a project involving Haskell and I just wanted to run my idea past the community.
[snip]
I found an example of such a bug, which I will test further before reporting it. It seems to be the opposite of the following bug: http://hackage.haskell.org/trac/ghc/ticket/5594
i.e. the stdout buffer isn't always correctly flushed when calling C/C++ in a program whose main is written in Haskell.
For example, when running the code:
main.hs: module Main where
import Foreign.C.Types import System.IO
foreign import ccall "inc" c_inc :: CInt -> CInt
main :: IO () main = do putStr "Enter n: " -- hFlush stdout s <-getLine putStrLn . show . c_inc . read $ s
inc.c:
int inc(int i) __attribute__ ((const));
int inc(int i) { return i + 1; }
Built with Makefile: all: gcc -c -o inc.o inc.c ghc --make -main-is Main main.hs inc.o -o test rm *.hi *.o
The output comes out as: [julek@cryptid inc]$ ./test 2 Enter n: 3
But when the " hFlush stdout" line is commented back in, the output is: [julek@cryptid inc]$ ./test Enter n: 2 3
which is correct, but the extra line shouldn't be necessary.
I am currently using ghc 7.4.1 which is the newest version, so this is a current bug.
I think this is a consequence of line buffering rather than a bug. If you write your own increment function in Haskell, you get the same behavior. If you `hSetBuffering stdout NoBuffering` before your `putStr` call, you should get the behavior you wanted. I've seen similar issues with programs written in many languages; it's just one of those gotchas to be aware of when dealing with console UI. As to the rest of your proposal, when you started out talking about calling Haskell from C or C++, I hoped you would focus on the pain of linking the final executable. This seems to me a more fundamental -- and addressable -- stumbling block than the richness of interaction with C++. As things stand, it is quite a hassle to use a Haskell library of any complexity called from C. Improved interaction with C++ could indeed be valuable, but, unless something has changed recently, work is still needed to improve the more basic C -> Haskell FFI story. Anthony

Quoth Anthony Cowley
I think this is a consequence of line buffering rather than a bug. If you write your own increment function in Haskell, you get the same behavior. If you `hSetBuffering stdout NoBuffering` before your `putStr` call, you should get the behavior you wanted.
Though if you must do one or the other, I think hFlush makes as much sense. I think his perspective may be that the C stdio library appears to recognize that a TTY read is an occasion to flush stdout. If you already know better, there are ways to solve this problem - flush, change buffering, or use stderr instead of stdout. If you weren't expecting it, though, Haskell might by comparison seem a little retarded. Maybe it isn't a bug.
As things stand, it is quite a hassle to use a Haskell library of any complexity called from C.
And Haskell is slightly handicapped in this application, if it still doesn't support any good mechanism for top level mutable state (Cf. http://www.haskell.org/haskellwiki/Top_level_mutable_state ) I wonder if the fact that we recognize these problems but haven't been super-motivated to solve them, suggests that there hasn't really been that much call for stand alone Haskell libraries? (The proposal seems to say that C++ programmers could jump at this opportunity because they've been wanting to use Haskell but were afraid of the IO Monad?) Donn

On Apr 5, 2012, at 5:08 PM, Donn Cave wrote:
As things stand, it is quite a hassle to use a Haskell library of any complexity called from C.
[…]
I wonder if the fact that we recognize these problems but haven't been super-motivated to solve them, suggests that there hasn't really been that much call for stand alone Haskell libraries?
Good question. Since there are not so many complete GUI toolkits yet, I can imagine that someone would write a GUI in C++ or Objective-C and use a Haskell library from there. Other than that, I think most people prefer writing an application in Haskell as much as possible to exploit its benefits over C/C++. -- Daniël
participants (7)
-
Anthony Cowley
-
Brandon Allbery
-
Daniël de Kok
-
Donn Cave
-
Evan Laforge
-
Holger Siegel
-
Sutherland, Julian