
On Sep 14, 2007, at 9:14 AM, Benedikt Huber wrote:
| I've been struggling using FFI bindings to libraries which rely on the | GNU Mp Bignum library (gmp). It's an issue that bites very few users, but it bites them hard. It's also tricky, but not impossible, to fix. The combination keeps meaning that at GHC HQ we work on things that affect more people. I doubt we can spare effort to design and implement a fix in the near future -- we keep hoping someone else step up and tackle it!
Peter Tanski did exactly that (he's the author of the ReplacingGMPNotes above), but he's been very quiet recently. I don't know where he is up to. Perhaps someone else would like to join in?
I apologise for being away. The company I work for has been ramping up for a launch and I have been working very long hours (nights and weekends, too).
Thank you for the information - I'm also willing to help, though I'm not too familiar with the GHC internals (yet). I do like the idea of optionally linking with a pure-haskell library, but I'm interested in a solution with comparable performance. Commenting solutions to ticket #311:
It goes beyond mere familiarity with the internals: the GMP functions are threaded throughout the RTS and the PrimOps files. Of all the primitive operations, they are the most ubiquitous for interfering in other code. The rough list I put on the ReplacingGMP page is a start but the more I work with the RTS the more little things keep turning up. At the least I should update the page.
(2) Using the standard allocation functions for the gmp memory managment (maybe as compile flag) as suggested in http:// www.haskell.org/pipermail/glasgow-haskell-users/2006-July/ 010660.html would also resolve ticket #311. In this case at least the dynamic part of gmp integers has to be resized using external allocation functions, and a finalizer (mpz_clear) has to be called when an Integer is garbage collected. It seems that the performance loss by using malloc is significant [1], as lots of allocations and reallocations of very small chunks occur in a functional setting; some kind of (non garbage collected !) memory pool allocation would certainly help. I'm not sure what overhead is associated with calling a finalizer ?
The problem of lots of small allocations affects the garbage collector, as well. In the current implementation, each GMP operation calls doYouWantToGC()--I'm sure you have seen the note in PrimOps.cmm, for example--which may act as a stop-world garbage collection. The byte arrays for GMP are also pinned. Compared to this, a FFI implementation using finalizers, which have horrible but practical guarantees on when they are called, would work much better. The best solution would be to revamp the way Integer types are implemented, so when possible they are mutable under the hood, much like using the binary += instead of the ternary +. Enumerations like the test in [1], below, would not be mutable unless there were some information such as a good consumer function that indicated the intermediate values were only temporarily necessary.
(3) So when replacing GMP with the BN library of OpenSSL (see http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ PerformanceMeasurements), it would propably be neccessary to refactor the library, so custom allocation can be used as well. This does not seem too difficult at a first glance though.
The OpenSSL library is not GPL compatible, so there would be licensing problems for GPL'd system distributions; it is also relatively slow, though it does have a noticeably constant curve for exponential functions. The one problem you will find with _all_ potential replacement libraries is incompatible behaviour for bitwise functions: they are implemented arithmetically in GMP but logically elsewhere (when they are implemented at all). (Note: if you are looking for the left-shift and right-shift operations in GMP, they are hidden in mpz_mul_2exp and mpz_t_div_q_2exp.) LibTomMath, for example uses pure logical shifts which do not produce correct results. I could go on about many other small differences but the end result is that you would have to do a lot of hacking to get a library that would replace all the functionality GMP provides. That is why I started a replacement from scratch.
So I'd like to investigate the second or third option, as far as my knowledge and time permits it. Of course it would be wise to check first if Peter Tanski is already/still working on a GMP replacement.
I left off working on it for some time, but things may slow down a little for now so I will (hopefully) have time to package it up. I meant to do that more than a month ago for Thorkil, who has written a multi-precision integer library before and wanted to help.
[1] Simple Performance Test on (ghc-darwin-i386-6.6.1):
test k = (iterateT k (fromIntegral (maxBound ::Int))) :: Integer where iterateT 0 v = v; iterateT k v = v `seq` iterateT (k-1) (v+10000)
The haskell function (k was taken as 10M) triggers around k allocations and k reallocations by the gmp library.
The rough C equivalent, calling sequences of
malloc(3), mpz_init_set(gmp), mpz_add_ui(gmp), mpz_clear(gmp) and free(3), takes more than 2 times as long, with 25% of the time spend in allocating and freeing pointers to gmp integers (mpz_ptr) and 50% of the time spend in gmp allocator functions (i.e. resizing gmp integers = (re)allocating limbs).
Malloc is fast but not nearly as fast as the RTS alloc functions; one thing I have not started is integrating the replacement library with GHC, mostly because the replacement library (on par or faster than GMP) uses SIMD functions whenever possible and they require proper alignment.
I also performed the test with the datatype suggested by John Meacham (using a gmp library with renamed symbols),
data FInteger = FInteger Int# (!ForeignPtr Mpz) but it was around 8x slower, maybe due to the ForeignPtr and FFI overhead, or due to missing optimizations in the code.
That is quite an interesting result. Are these "safe" foreign imports? Cheers, Pete