
Hello Peter,

Sorry for the late reply. From your latest communication, which seems to be

Date: Sat, 12 Aug 2006 21:12:05 -0400
From: Peter Tanski
I hope I have answered a few of your points; some of them have given me more work but that is all right, I guess. ...
And you most certainly have, in much more detail than I could have hoped for. I hope that the additional work you mention is not something you considered a waste of time. I am also most happy to read that
Before I actually bind any library with GHC, I will thoroughly test it as a stand-alone library ...
since "correctness" of Integer computation is by far its most important property. (I would say that "performance" comes second, then what could be termed "interoperability" as third.) The support for Integer computation is a messy subject: On the one hand, even children out of the first few years of school are capable of understanding the basic functionality involved. On the other hand, providing efficient support of this functionality, especially if required in a wide selection of circumstances, requires a surprising amount of hard work and insight. I would say, if only really large integers were routinely required in actual, real-world computations, our CPUs would support these computations and there would be no need to consider the question in the, admittedly, fairly limited context of GHC. The situation seems to be that the functionality is perhaps considered obviously available, but the reality is that it can only be attained at significant cost (either real money or honest sweat). I am truly unable to tell what I would consider the ideal situation. On the one hand, I value greatly the freedom of choosing my circumstances without restraints. And I appreciate the desire of noble people like the GHC developers to be equally free of such restraints. This would point in the direction of basing Integer computations on a fairly generic implementation. Which insightful consideration would then force us to admit would probably cost a factor of, say, 2 or more in performance in some cases. On the other hand, the existence of libraries like GMP seems to offer an economic path out of this jungle. However, as the discussion over the past couple of weeks illustrates, the path may be unacceptably thorny. Let me quote some additional statements from this debate: On Thursday 10 August 2006 19:00, Simon Peyton-Jones wrote: ...
OK so you are talking of taking a copy of the BN code, changing the function names, and statically linking its allocator to GHC's RTS.
If someone wants to make a program that uses BN for some other purpose, they get the original BN, whose function names don't clash, and which uses malloc.
Sounds ok to me.
Simon ...
Two problems seem to be lurking here. The first is that, although taking a copy of an existing library and massaging it to work under presently available circumstances may be fine, what is really needed is something that keeps on working, even under changing and future circumstances. The second is that if I wish to mix Haskell and non-Haskell code that processes the same data, I may find myself in need of a way to convert between whatever representation is required by this copy of the existing library and the presently available, but potentially quite different, version of that same library. No offence is meant here: I know that I am simply re-iterating what I have said earlier. However, I believe this is sufficiently important to risk being ridiculed for pointing out the obvious. (I sketch one such conversion just below.)
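To make that conversion concern concrete, here is a minimal sketch of moving an Integer through a neutral, little-endian byte representation, the kind of bridge one would need between two differently built bignum libraries. This is purely my illustration, not code from GHC or GMP; real code would presumably use something like GMP's mpz_import/mpz_export on the foreign side and would also handle negative numbers:

    import Data.Bits (shiftL, shiftR, (.|.))
    import Data.Word (Word8)

    -- Non-negative Integer to little-endian bytes (illustration only).
    toBytesLE :: Integer -> [Word8]
    toBytesLE 0 = [0]
    toBytesLE n = go n
      where
        go 0 = []
        go m = fromIntegral (m `mod` 256) : go (m `shiftR` 8)

    -- And back again: fromBytesLE (toBytesLE n) == n for all n >= 0.
    fromBytesLE :: [Word8] -> Integer
    fromBytesLE = foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0

As long as both sides agree on such a neutral format, it does not matter that the renamed copy and the stock library disagree on names or on internal representation.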
Then you ask

... If you have the time, would you be able to find any programs you might have that displayed poor Integer performance or possibly bottlenecks in the current system?
Let me say first that I have not experienced any really surprisingly poor performance with the current system. To be sure, I have seen programs similar to my own, not written in Haskell, that performed better on comparable tasks. But I have not been able to say specifically that the poorer performance of my program was caused by some inadequacy in the Haskell (with GMP) implementation.

I had better be specific here: what I do is integer factorization (splitting integers into their prime factors). For example, I have implemented a version of ECM (the Elliptic Curve Method) in Haskell. Having done that (independently, and mainly to learn what it was about), I eventually compared its performance to that of GMP-ECM, which Paul Zimmermann wrote. I was initially disappointed to find that GMP-ECM was about 7 times faster than my ECM code. Fortunately, various sessions of "hand-waving" brought this factor down to about 1.5: whenever GMP-ECM used 2 minutes, my program would use 3 minutes.

I must admit that I am neither satisfied nor done with this result. At the outset, when I started using Haskell for these number-theoretic computations, I had hoped that the overhead of using Haskell, compared to using C with GMP calls, would be insignificant: something like 10 or at worst 20 percent. So 50 percent of overhead seems a bit excessive. On the other hand, there may be additional hands to wave that I haven't thought of. And again: I am not done working with this, just taking an extended break. Please indicate if you wish to see some detailed and hard results.

Some additional experience is this: to gain some potentially easier-to-compare measurements between Haskell-with-GMP and C-with-GMP, I implemented the GMP function mpz_powm (raising a number to a power modulo a third number) as a primitive operation in GHC and compared it to my own power-raising operation implemented in Haskell (roughly the shape sketched in the postscript below). Overall, the GMP implementation of the power operation was 2 times faster than my Haskell implementation, though, to be sure, it uses techniques that my code certainly doesn't. An interesting thing happened, however, for very large numbers: the GMP power operation started using a lot of memory, thrashing, and then, eventually, with sufficiently large operands, crashed for lack of memory, whereas my Haskell-implemented version continued to produce usable results. The explanation should undoubtedly be found in the method (some FFT-thing) that GMP uses for the power operation on very large numbers: my guess is that GMP, when using this method, issues a large number of allocate/free requests. And, again guessing, if each allocate request causes the GHC runtime to allocate, whereas each free only causes a "we'll handle this at the next GC", then such large computations may very well display this behaviour. An example where the "Very Good Thing" of using the GHC RTS memory management could be questioned.

I will not bother you further, except to express my sincere respect for the work you are doing.

Best regards
Thorkil
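P.S. For concreteness, here is roughly the shape of such a power-raising operation: a minimal square-and-multiply sketch of b^e mod m in Haskell. It is certainly not my exact code, but it shows what the comparison against mpz_powm was measuring:

    -- powMod b e m computes b^e modulo m (assumes e >= 0) by binary
    -- square-and-multiply, using O(log e) Integer multiplications.
    powMod :: Integer -> Integer -> Integer -> Integer
    powMod _ _ 1 = 0
    powMod b e m = go (b `mod` m) e 1
      where
        go _    0 acc = acc
        go base k acc
          | odd k     = go base' (k `div` 2) (acc * base `mod` m)
          | otherwise = go base' (k `div` 2) acc
          where base' = base * base `mod` m

For example, powMod 2 1000000 1000003 takes a few dozen multiplications rather than a million. GMP's mpz_powm improves on this with techniques such as Montgomery reduction, which is presumably part of why it wins at ordinary operand sizes.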