Haskell integration with C/C++ (GSOC)

5 Apr 2012

      Hey Guys,

I'm Julian, I am reaching the end of my second year as a JMC (Joint Mathematics and Computer science) Student at Imperial College London
and I'd like to apply to GSOC for a project involving Haskell and I just wanted to run my idea past the community.

I've already talked about this on the haskell-soc IRC channel (I go by the pseudonym "julek").

What I would like to do is to improve the integration of C/C++ with Haskell, particularly in calling Haskell from C/C++.

Currently ghc is able to generate stubs to export functions whose arguments are simple types such as CInts into C/C++.
The stub generated is always in an extern "C" clause due to the fact that ghc does not as yet implement the C++
calling conventions as defined in the "The Haskell 98 Foreign Function Interface 1.0" (http://www.cse.unsw.edu.au/~chak/haskell/ffi/ffi.pdf)

So a first step would be to implement this calling convention to bring it up to speed with the above referenced report.
This shouldn't be too hard and mostly involves implementing C++ name mangling conventions.

Next, I would like to extend the stub generation so as to be able to deal with more complex types.

The type systems in C++ and Haskell have many analogous syntaxes that can be easily exploited to provide strong compatibility and interoperability between the two languages.

For example a simple type such as:

data Foo = A | B

Could be implemented as an enum in C/C++:

enum Foo {A, B};

More advanced types that take arguments such as:

data Tree = Node Left Right | Leaf

Could be converted to a struct in C/C++:

struct Tree {
    struct Tree* left;
    struct Tree* right;
};

Types that have functions that act on them such as:

data IntContainer = IntContainer Int

getInt :: IntContainer -> Int
getInt (IntContainer a) = a

could have these functions automatically converted to C/C++:

struct IntContainer {
    int a;
};

extern int getInt_hs(IntContainer a);

This also opens up the possibility of exploiting C/C++ name mangling conventions, to allow the _hs postfix I'm suggesting here to be eliminated.

Haskell classes:

class Arithmetic a where
    (+) :: a -> a -> a
    (*) :: a -> a -> a
    (-) :: a -> a -> a
    (/) :: a -> a -> a

could be implemented using C++ functions with virtual members:

class Monad {
    public:
       virtual Monad add(Monad a, Monad b);
       virtual Monad mult(Monad a, Monad b)
       virtual Monad neg(Monad a, Monad b);
       virtual Monad div(Monad a, Monad b);
}

All types of single/multiple instancing (i.e. either directly or through requirements of instances)
would be implemented using single/multiple inheritance.

Obviously, this example is rather contrived due to the conversion of the function names.
The fact that the rules that govern function naming in Haskell are much more permissive
than those of C/C++ might cause compatibility issues.

This can be worked around by implementing a similar syntax to that currently used for function imports by the FFI.
E.g..:
    foreign export ccall "bind" >>= :: CInt -> CInt

Similar to:
    foreign import ccall "f" func :: CInt -> CInt

The latter is the current syntax for imports.

The name given for the export would be checked for legality in the namespace of the target language.

Alternatively this could be done in an automated manner using some naming conventions as well as operator polymorphism, but this would probably sacrifice ease of use.

Finally polymorphic Haskell functions/types can be implemented in C++ using templates.

I would like to extend ghc to implement enhanced C/C++ stub generation using the methods described above as well as to generate Haskell stubs which describe the Haskell CType equivalents of the Haskell types defined, functions for conversion between the two and function stubs to convert the types, run the Haskell function and convert back as required.

On top of this I'd like to write C/C++ libraries for dealing with most of the standard Haskell types such as Maybe, Either, etc...

Finally, I'd like to work on ironing out any bugs that remain in the RTS when it is used in "multilingual" situations, as well as improving it's performance in this situation.

I found an example of such a bug, which I will test further before reporting it.
It seems to be the opposite of the following bug:
    http://hackage.haskell.org/trac/ghc/ticket/5594

i.e. the stdout buffer isn't always correctly flushed when calling C/C++ in a program whose main is written in Haskell.

For example, when running the code:

main.hs:
    module Main where

    import Foreign.C.Types
    import System.IO

    foreign import ccall "inc" c_inc :: CInt -> CInt

    main :: IO ()
    main = do
      putStr "Enter n: "
      -- hFlush stdout
      s <-getLine
      putStrLn . show . c_inc . read $ s

inc.c:

    int inc(int i) __attribute__ ((const));

    int inc(int i)
    {
      return i + 1;
    }

Built with
Makefile:
    all:
        gcc -c -o inc.o inc.c
        ghc --make -main-is Main main.hs inc.o -o test
        rm *.hi *.o

The output comes out as:
    [julek@cryptid inc]$ ./test
    2
    Enter n: 3

But when the " hFlush stdout" line is commented back in, the output is:
    [julek@cryptid inc]$ ./test
    Enter n: 2
    3

which is correct, but the extra line shouldn't be necessary.

I am currently using ghc 7.4.1 which is the newest version, so this is a current bug.

I have had a look for such a bug being reported and have found no such report.
I'll look into this further before reporting it, but I am fairly certain this is a bug in the RTS.

As part of this project, I would fix this bug (if it is still around when I start) as well as looking for other ones in this area.

I think that extending ghc to the level required by "The Haskell 98 Foreign Function Interface 1.0" specification and above would reap significant benefit to the Haskell community.

The improved integration into C/C++ would open the door for this to happen for several other languages and would make Haskell more widespread.

Many Haskell beginners are daunted by the falsely perceived complexity of working with Haskell IO and monads, but love using the massive advantages that the paradigm gives in a non monadic context. Due to this, simplifying the interoperability between Haskell and C/C++ would enable many of these users to stick around for longer and perhaps encourage them to eventually look deeper into the language. This would make the size of the community grow and make the use of Haskell more widespread, potentially reaping benefits for the community at large.

I believe this could be implemented within the time frame given for GSOC.

Hope you like the ideas presented here and hopefully I'll be accepted into the Haskell summer of code!

If anybody has any opinions on the implementability/usefulness of this and/or criticism of the idea, please inform me and I'll be happy to discuss it!

Looking forward to hearing from you!

Thanking you in advance.

Kind regards,

Julian Sutherland

Sutherland, Julian

Evan Laforge

Brandon Allbery

Holger Siegel

Brandon Allbery

Anthony Cowley

Donn Cave

Daniël de Kok

tags

participants (7)