FFI Bindings to Libraries using GMP

Hello, I've been struggling using FFI bindings to libraries which rely on the GNU Mp Bignum library (gmp) - this is apparently a well known problem (http://hackage.haskell.org/trac/ghc/ticket/311, http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes). I do rely on using such libraries, and so started to get it work on MacOsX; no solution was close to being satisfactory however, so I'd like to ask for some advice. Those two options worked to some extend: (1) Create or modify the library in question, so gmp is statically linked and its symbols are hidden. When source code is available, this is relatively easy, altough it requires modification of the build process (which can be a hassle). If a static "ar" archive is available, it is cumbersome (at least on Mac Os X I ran into a lot of troubles using nmedit), but possible. Furthermore, the resulting libraries are bloated, as each of them contains a copy of the GMP; left alone portability issues. (2) As suggested in ticket#311, I tried switching the allocator functions when switching to FFI. It worked, but not in GHCi; also, doing it manually is a lot of work, because, as far as I could figure out, it is neccessary to write a wrapper function for every C-function (indirectly) using the gmp. Furthermore, most of the libraries expose gmp datatypes (mpz_t,mpq_t) in their API. I currently use a little haskell module working on GHC.Exts, but that's propably not a good option from a maintainer's point of view. There are certainly other possibilities, but I couldn't find one which is both maintainable and portable. I would be very grateful for any advice, or some information on plans for resolving ticket #311. There are some great libraries (like the Parma Polyhedral Library, to pick an example) out there using gmp, and it would be nice if writing bindings to those libs could be simplified. Thanks, Benedikt Huber

| I've been struggling using FFI bindings to libraries which rely on the | GNU Mp Bignum library (gmp) - this is apparently a well known problem | (http://hackage.haskell.org/trac/ghc/ticket/311, | http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes). | I do rely on using such libraries, and so started to get it work on | MacOsX; no solution was close to being satisfactory however, so I'd like | to ask for some advice. It's an issue that bites very few users, but it bites them hard. It's also tricky, but not impossible, to fix. The combination keeps meaning that at GHC HQ we work on things that affect more people. I doubt we can spare effort to design and implement a fix in the near future -- we keep hoping someone else step up and tackle it! Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in? Meanwhile I've added your comments to #311 so they stay with the ticket. Simon | | Those two options worked to some extend: | (1) | Create or modify the library in question, so gmp is statically linked | and its symbols are hidden. | When source code is available, this is relatively easy, altough it | requires modification of the build process (which can be a hassle). If a | static "ar" archive is available, it is cumbersome (at least on Mac Os X | I ran into a lot of troubles using nmedit), but possible. Furthermore, | the resulting libraries are bloated, as each of them contains a copy of | the GMP; left alone portability issues. | (2) | As suggested in ticket#311, I tried switching the allocator functions | when switching to FFI. It worked, but not in GHCi; also, doing it | manually is a lot of work, because, as far as I could figure out, it is | neccessary to write a wrapper function for every C-function (indirectly) | using the gmp. | | Furthermore, most of the libraries expose gmp datatypes (mpz_t,mpq_t) in | their API. I currently use a little haskell module working on | GHC.Exts, but that's propably not a good option from a maintainer's | point of view. There are certainly other possibilities, but I couldn't | find one which is both maintainable and portable. | | I would be very grateful for any advice, or some information on plans | for resolving ticket #311. There are some great libraries (like the | Parma Polyhedral Library, to pick an example) out there using gmp, and | it would be nice if writing bindings to those libs could be simplified. | | Thanks, | Benedikt Huber | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Simon Peyton-Jones
Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in?
Since I live only five minutes away from this year's Hackathon venue, I guess it would be kind of silly to have such an event in my neighborhood and not go to it! :) I would enjoy working on this during the hackathon. Is there a reason to not start with a simple, Haskell-only implementation? And later use techniques known from ByteStrings and stream fusion to improve on that. -Matthias -- Matthias Neubauer | Universität Freiburg, Institut für Informatik | tel +49 761 203 8060 Georges-Köhler-Allee 79, 79110 Freiburg i. Br., Germany | fax +49 761 203 8052

neubauer:
Simon Peyton-Jones
writes: Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in?
Since I live only five minutes away from this year's Hackathon venue, I guess it would be kind of silly to have such an event in my neighborhood and not go to it! :)
I would enjoy working on this during the hackathon. Is there a reason to not start with a simple, Haskell-only implementation? And later use techniques known from ByteStrings and stream fusion to improve on that.
-Matthias
Oh, Matthias: if you're coming to the hackathon, please register so we can get you a tshirt! :) Details on the hackathon page. http://www.haskell.org/haskellwiki/Hac_2007_II -- Don

I won't be at the hackathon I fear (and neither will Simon M) but I jotted down some notes about how to replace GMP with a Haskell library
http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/HaskellLibrary
Simon
| -----Original Message-----
| From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On
| Behalf Of Matthias Neubauer
| Sent: 11 September 2007 22:04
| To: Simon Peyton-Jones
| Cc: glasgow-haskell-users@haskell.org; benedikth; Peter Tanski
| Subject: Re: FFI Bindings to Libraries using GMP
|
| Simon Peyton-Jones

| I've been struggling using FFI bindings to libraries which rely on the | GNU Mp Bignum library (gmp). It's an issue that bites very few users, but it bites them hard. It's also tricky, but not impossible, to fix. The combination keeps meaning that at GHC HQ we work on things that affect more people. I doubt we can spare effort to design and implement a fix in the near future -- we keep hoping someone else step up and tackle it!
Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in?
test k = (iterateT k (fromIntegral (maxBound ::Int))) :: Integer where iterateT 0 v = v; iterateT k v = v `seq` iterateT (k-1) (v+10000)
malloc(3), mpz_init_set(gmp), mpz_add_ui(gmp), mpz_clear(gmp) and free(3), takes more than 2 times as long, with 25% of the time spend in allocating and freeing pointers to gmp integers (mpz_ptr) and 50% of
Thank you for the information - I'm also willing to help, though I'm not too familiar with the GHC internals (yet). I do like the idea of optionally linking with a pure-haskell library, but I'm interested in a solution with comparable performance. Commenting solutions to ticket #311: (1) Creating a custom variant of the gmp lib by renaming symbols and possibly removing unneccessary functionality, as suggest by Simon Marlow in ticket #311 would be relatively straightforward; I've already tried this approach the other way round (i.e. recompile libraries to be used with the FFI). But it means that you'd have to maintain and ship another library, so I guess it is not an option for the GHC team. (2) Using the standard allocation functions for the gmp memory managment (maybe as compile flag) as suggested in http:// www.haskell.org/pipermail/glasgow-haskell-users/2006-July/010660.html would also resolve ticket #311. In this case at least the dynamic part of gmp integers has to be resized using external allocation functions, and a finalizer (mpz_clear) has to be called when an Integer is garbage collected. It seems that the performance loss by using malloc is significant [1], as lots of allocations and reallocations of very small chunks occur in a functional setting; some kind of (non garbage collected !) memory pool allocation would certainly help. I'm not sure what overhead is associated with calling a finalizer ? (3) So when replacing GMP with the BN library of OpenSSL (see http:// hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ PerformanceMeasurements), it would propably be neccessary to refactor the library, so custom allocation can be used as well. This does not seem too difficult at a first glance though. So I'd like to investigate the second or third option, as far as my knowledge and time permits it. Of course it would be wise to check first if Peter Tanski is already/ still working on a GMP replacement. Benedikt [1] Simple Performance Test on (ghc-darwin-i386-6.6.1): The haskell function (k was taken as 10M) triggers around k allocations and k reallocations by the gmp library. The rough C equivalent, calling sequences of the time spend in gmp allocator functions (i.e. resizing gmp integers = (re)allocating limbs). I also performed the test with the datatype suggested by John Meacham (using a gmp library with renamed symbols),
data FInteger = FInteger Int# (!ForeignPtr Mpz) but it was around 8x slower, maybe due to the ForeignPtr and FFI overhead, or due to missing optimizations in the code.

Benedikt Huber wrote:
(3) So when replacing GMP with the BN library of OpenSSL
Maybe the following license issues with OpenSSL should be taken into account: http://www.gnome.org/~markmc/openssl-and-the-gpl.html Regards, Zun.

On Sep 14, 2007, at 9:14 AM, Benedikt Huber wrote:
| I've been struggling using FFI bindings to libraries which rely on the | GNU Mp Bignum library (gmp). It's an issue that bites very few users, but it bites them hard. It's also tricky, but not impossible, to fix. The combination keeps meaning that at GHC HQ we work on things that affect more people. I doubt we can spare effort to design and implement a fix in the near future -- we keep hoping someone else step up and tackle it!
Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in?
I apologise for being away. The company I work for has been ramping up for a launch and I have been working very long hours (nights and weekends, too).
Thank you for the information - I'm also willing to help, though I'm not too familiar with the GHC internals (yet). I do like the idea of optionally linking with a pure-haskell library, but I'm interested in a solution with comparable performance. Commenting solutions to ticket #311:
It goes beyond mere familiarity with the internals: the GMP functions are threaded throughout the RTS and the PrimOps files. Of all the primitive operations, they are the most ubiquitous for interfering in other code. The rough list I put on the ReplacingGMP page is a start but the more I work with the RTS the more little things keep turning up. At the least I should update the page.
(2) Using the standard allocation functions for the gmp memory managment (maybe as compile flag) as suggested in http:// www.haskell.org/pipermail/glasgow-haskell-users/2006-July/ 010660.html would also resolve ticket #311. In this case at least the dynamic part of gmp integers has to be resized using external allocation functions, and a finalizer (mpz_clear) has to be called when an Integer is garbage collected. It seems that the performance loss by using malloc is significant [1], as lots of allocations and reallocations of very small chunks occur in a functional setting; some kind of (non garbage collected !) memory pool allocation would certainly help. I'm not sure what overhead is associated with calling a finalizer ?
The problem of lots of small allocations affects the garbage collector, as well. In the current implementation, each GMP operation calls doYouWantToGC()--I'm sure you have seen the note in PrimOps.cmm, for example--which may act as a stop-world garbage collection. The byte arrays for GMP are also pinned. Compared to this, a FFI implementation using finalizers, which have horrible but practical guarantees on when they are called, would work much better. The best solution would be to revamp the way Integer types are implemented, so when possible they are mutable under the hood, much like using the binary += instead of the ternary +. Enumerations like the test in [1], below, would not be mutable unless there were some information such as a good consumer function that indicated the intermediate values were only temporarily necessary.
(3) So when replacing GMP with the BN library of OpenSSL (see http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ PerformanceMeasurements), it would propably be neccessary to refactor the library, so custom allocation can be used as well. This does not seem too difficult at a first glance though.
The OpenSSL library is not GPL compatible, so there would be licensing problems for GPL'd system distributions; it is also relatively slow, though it does have a noticeably constant curve for exponential functions. The one problem you will find with _all_ potential replacement libraries is incompatible behaviour for bitwise functions: they are implemented arithmetically in GMP but logically elsewhere (when they are implemented at all). (Note: if you are looking for the left-shift and right-shift operations in GMP, they are hidden in mpz_mul_2exp and mpz_t_div_q_2exp.) LibTomMath, for example uses pure logical shifts which do not produce correct results. I could go on about many other small differences but the end result is that you would have to do a lot of hacking to get a library that would replace all the functionality GMP provides. That is why I started a replacement from scratch.
So I'd like to investigate the second or third option, as far as my knowledge and time permits it. Of course it would be wise to check first if Peter Tanski is already/still working on a GMP replacement.
I left off working on it for some time, but things may slow down a little for now so I will (hopefully) have time to package it up. I meant to do that more than a month ago for Thorkil, who has written a multi-precision integer library before and wanted to help.
[1] Simple Performance Test on (ghc-darwin-i386-6.6.1):
test k = (iterateT k (fromIntegral (maxBound ::Int))) :: Integer where iterateT 0 v = v; iterateT k v = v `seq` iterateT (k-1) (v+10000)
The haskell function (k was taken as 10M) triggers around k allocations and k reallocations by the gmp library.
The rough C equivalent, calling sequences of
malloc(3), mpz_init_set(gmp), mpz_add_ui(gmp), mpz_clear(gmp) and free(3), takes more than 2 times as long, with 25% of the time spend in allocating and freeing pointers to gmp integers (mpz_ptr) and 50% of the time spend in gmp allocator functions (i.e. resizing gmp integers = (re)allocating limbs).
Malloc is fast but not nearly as fast as the RTS alloc functions; one thing I have not started is integrating the replacement library with GHC, mostly because the replacement library (on par or faster than GMP) uses SIMD functions whenever possible and they require proper alignment.
I also performed the test with the datatype suggested by John Meacham (using a gmp library with renamed symbols),
data FInteger = FInteger Int# (!ForeignPtr Mpz) but it was around 8x slower, maybe due to the ForeignPtr and FFI overhead, or due to missing optimizations in the code.
That is quite an interesting result. Are these "safe" foreign imports? Cheers, Pete

Peter Tanski wrote:
The one problem you will find with _all_ potential replacement libraries is incompatible behaviour for bitwise functions: they are implemented arithmetically in GMP but logically elsewhere (when they are implemented at all).
I don't fully understand this... I made sure my Haskell implementation's "Bits" was consistent with the existing (GMP) version, which you call "arithmetically"... it seemed that any other semantics would reveal (depend on) implementation details. and I was able to make it reasonably asymptotically-efficient anyway. Isaac

Am 18.09.2007 um 05:49 schrieb Peter Tanski:
The best solution would be to revamp the way Integer types are implemented, so when possible they are mutable under the hood, much like using the binary += instead of the ternary +. Enumerations like the test in [1], below, would not be mutable unless there were some information such as a good consumer function that indicated the intermediate values were only temporarily necessary. I'm not sure if I understand this correctly; Do you want to expose an unsafe/IO interface for destructive Integer manipulation ?
The OpenSSL library is not GPL compatible, so there would be licensing problems for GPL'd system distributions; it is also relatively slow, though it does have a noticeably constant curve for exponential functions. Maybe you should add a note to http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ PerformanceMeasurements. The statistics suggest that the OpenSSL BN has comparable performance to the GMP, especially for smaller numbers.
Some note about the (very confusing) licensing issues regarding OpenSSL would also be nice.
[1] Simple Performance Test on (ghc-darwin-i386-6.6.1): Malloc is fast but not nearly as fast as the RTS alloc functions; one thing I have not started is integrating the replacement library with GHC, mostly because the replacement library (on par or faster than GMP) uses SIMD functions whenever possible and they require proper alignment.
Ok, it's good to know you're already working on integrating a (native) replacement library.
I also performed the test with the datatype suggested by John Meacham (using a gmp library with renamed symbols),
data FInteger = FInteger Int# (!ForeignPtr Mpz) but it was around 8x slower, maybe due to the ForeignPtr and FFI overhead, or due to missing optimizations in the code.
That is quite an interesting result. Are these "safe" foreign imports? No. Note that `FInteger' above is even faster than the build-in Integer type for small integers (Ints), so I was talking about allocation of gmp integers. I elaborated the test a little, it now shows consistent results I think [1a]; a lot of performance is lost when doing many allocations using malloc, and even more invoking ForeignPtr finalizers.
I'm still interested in sensible solutions to Bug #311, and maybe nevertheless simple switching to standard gmp allocation (either with finalizers or copying limbs when entering/leaving the gmp) via a compile flag would be the right thing for many applications. I'm also looking forward to see the results of the replacement library you're trying to integrate, and those of haskell Integer implementations. regards, benedikt [1a] Integer Allocation Test
allocTest :: Int -> `some Integral Type T' allocTest iterations = (iterateT iterations INIT) where iterateT 0 v = v iterateT k v = v `seq` iterateT (k-1) (v+STEP)
- Small BigNums Allocation Test (INIT = 2^31, STEP = 10^5, k=10^6) Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore): 0.04s destructive-update C implementation 0.19s with T = Integer 0.71s non-destructive-update C implementation using malloc with T = FInteger {-# UNPACK -#} !Int# {-# UNPACK -#} ! (ForeignPtr Mpz) 0.90s with newForeignPtr_ (no finalizer, space leak) 1.87s with Foreign.Concurrent.newForeignPtr mpz do { hs_mpz_clear mpz; free mpz} 1.94s with newForeignPtr free_mpz_c_impl mpz - Small Integers Allocation Test (INIT=0,STEP=4,k=2*10^8) Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore): 0.67s for Int 2.54s for FInteger 3.62s for Integer

Hello Benedikt, I apologise for the late reply. I am travelling tomorrow but I will try to get something an alpha implementation out by this Wednesday. For now here are some preliminary answers: On Sep 28, 2007, at 7:41 AM, Benedikt Huber wrote:
Am 18.09.2007 um 05:49 schrieb Peter Tanski:
The best solution would be to revamp the way Integer types are implemented, so when possible they are mutable under the hood, much like using the binary += instead of the ternary +. Enumerations like the test in [1], below, would not be mutable unless there were some information such as a good consumer function that indicated the intermediate values were only temporarily necessary. I'm not sure if I understand this correctly; Do you want to expose an unsafe/IO interface for destructive Integer manipulation?
I would not expose it, just optimise it, in the same way as we can replace recursion with loops at the Cmm level. The end result would involve re-cycling integer memory so you might say that in some equations integers are mutable. (If it is provable that an integer value would not be used again, then it does not seem right not to recycle the memory.)
The OpenSSL library is not GPL compatible, so there would be licensing problems for GPL'd system distributions; it is also relatively slow, though it does have a noticeably constant curve for exponential functions. Maybe you should add a note to http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ PerformanceMeasurements. The statistics suggest that the OpenSSL BN has comparable performance to the GMP, especially for smaller numbers.
Some note about the (very confusing) licensing issues regarding OpenSSL would also be nice.
I will add this to the wiki. In short, paragraph 10 of the GPL and paragraph 11 of the LGPL--here I may have the paragraphs wrong-- prohibit placing any additional restrictions on your licensees. OpenSSL places an additional restriction on licensees: you cannot use the name 'OpenSSL' with regard to your product, so the OpenSSL license is incompatible with the GPL/LGPL.
[1] Simple Performance Test on (ghc-darwin-i386-6.6.1): Malloc is fast but not nearly as fast as the RTS alloc functions; one thing I have not started is integrating the replacement library with GHC, mostly because the replacement library (on par or faster than GMP) uses SIMD functions whenever possible and they require proper alignment.
Ok, it's good to know you're already working on integrating a (native) replacement library.
It's workable for now but I need to finish Toom3, a basic FFT, and some specialised division operations. I also need to give Thorkil Naur a crack at it. All of this has been on hold because I have been too selfish and perfectionistic to give anyone what I consider a mess and I have been working too many hours to fix it. (This seems to be a common problem of mine; I intend to change that.)
I also performed the test with the datatype suggested by John Meacham (using a gmp library with renamed symbols),
data FInteger = FInteger Int# (!ForeignPtr Mpz) but it was around 8x slower, maybe due to the ForeignPtr and FFI overhead, or due to missing optimizations in the code.
That is quite an interesting result. Are these "safe" foreign imports? No. Note that `FInteger' above is even faster than the build-in Integer type for small integers (Ints), so I was talking about allocation of gmp integers. I elaborated the test a little, it now shows consistent results I think [1a]; a lot of performance is lost when doing many allocations using malloc, and even more invoking ForeignPtr finalizers.
I found the same thing when I tried that; malloc is slow compared to GC-based alloc. The ForeignPtr finalizers do not always run since-- as far as I know--they are only guaranteed to run before RTS shutdown.
I'm still interested in sensible solutions to Bug #311, and maybe nevertheless simple switching to standard gmp allocation (either with finalizers or copying limbs when entering/leaving the gmp) via a compile flag would be the right thing for many applications. I'm also looking forward to see the results of the replacement library you're trying to integrate, and those of haskell Integer implementations.
The fastest interim solution I can come up with for you would be to use Isaac Dupree's Haskell-based integer library and set up preprocessor defines so you could build ghc (HEAD) from source and use that. Would that be sufficient for now? Cheers, Pete
[1a] Integer Allocation Test
allocTest :: Int -> `some Integral Type T' allocTest iterations = (iterateT iterations INIT) where iterateT 0 v = v iterateT k v = v `seq` iterateT (k-1) (v+STEP)
- Small BigNums Allocation Test (INIT = 2^31, STEP = 10^5, k=10^6) Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore): 0.04s destructive-update C implementation 0.19s with T = Integer 0.71s non-destructive-update C implementation using malloc with T = FInteger {-# UNPACK -#} !Int# {-# UNPACK -#} ! (ForeignPtr Mpz) 0.90s with newForeignPtr_ (no finalizer, space leak) 1.87s with Foreign.Concurrent.newForeignPtr mpz do { hs_mpz_clear mpz; free mpz} 1.94s with newForeignPtr free_mpz_c_impl mpz
- Small Integers Allocation Test (INIT=0,STEP=4,k=2*10^8) Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore): 0.67s for Int 2.54s for FInteger 3.62s for Integer
participants (8)
-
Benedikt Huber
-
benedikth
-
Don Stewart
-
Isaac Dupree
-
Matthias Neubauer
-
Peter Tanski
-
Roberto Zunino
-
Simon Peyton-Jones