
Hello Peter,

Sorry for the late reply. From your latest communication, which seems to be

Date: Sat, 12 Aug 2006 21:12:05 -0400
From: Peter Tanski
I hope I have answered a few of your points; some of them have given me more work but that is all right, I guess. ...
And you most certainly have, in much more detail than I could have hoped for. I hope that the additional work you mention is not something you considered a waste of time. I am also most happy to read that
Before I actually bind any library with GHC, I will thoroughly test it as a stand-alone library ...
since "correctness" of Integer computation is by far its most important property. (I would say that "performance" comes second, then what could be termed "interoperability" as third.) The support for Integer computation is a messy subject: On the one hand, even children out of the first few years of school are capable of understanding the basic functionality involved. On the other hand, providing efficient support of this functionality, especially if required in a wide selection of circumstances, requires a surprising amount of hard work and insight. I would say, if only really large integers were routinely required in actual, real-world computations, our CPUs would support these computations and there would be no need to consider the question in the, admittedly, fairly limited context of GHC. The situation seems to be that the functionality is perhaps considered obviously available, but the reality is that it can only be attained at significant cost (either real money or honest sweat). I am truly unable to tell what I would consider the ideal situation. On the one hand, I value greatly the freedom of choosing my circumstances without restraints. And I appreciate the desire of noble people like the GHC developers to be equally free of such restraints. This would point in the direction of basing Integer computations on a fairly generic implementation. Which insightful consideration would then force us to admit would probably cost a factor of, say, 2 or more in performance in some cases. On the other hand, the existence of libraries like GMP seems to offer an economic path out of this jungle. However, as the discussion over the past couple of weeks illustrates, the path may be unacceptably thorny. Let me quote some additional statements from this debate: On Thursday 10 August 2006 19:00, Simon Peyton-Jones wrote: ...
OK so you are talking of taking a copy of the BN code, changing the function names, and statically linking its allocator to GHC's RTS.
If someone wants to make a program that uses BN for some other purpose, they get the original BN, whose function names don't clash, and which uses malloc.
Sounds ok to me.
Simon ...
Two problems seem to be lurking here. The first is that, although taking a copy of an existing library and massaging it to work under presently available circumstances may be fine, what is really needed is something that keeps on working, even under changing and future circumstances. The second is that if I wish to mix Haskell and non-Haskell code that processes the same data, I may find myself in need of a way to convert between whatever representation is required by this copy of the existing library and the presently available, but potentially quite different, version of that same library. No offence is meant here: I know that I am simply re-iterating what I have said earlier. However, I believe this is sufficiently important to risk being ridiculed for pointing out the obvious. (I sketch one such conversion just below.)
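To make that conversion concern concrete, here is a minimal sketch of moving an Integer through a neutral, little-endian byte representation, the kind of bridge one would need between two differently built bignum libraries. This is purely my illustration, not code from GHC or GMP; real code would presumably use something like GMP's mpz_import/mpz_export on the foreign side and would also handle negative numbers:

    import Data.Bits (shiftL, shiftR, (.|.))
    import Data.Word (Word8)

    -- Non-negative Integer to little-endian bytes (illustration only).
    toBytesLE :: Integer -> [Word8]
    toBytesLE 0 = [0]
    toBytesLE n = go n
      where
        go 0 = []
        go m = fromIntegral (m `mod` 256) : go (m `shiftR` 8)

    -- And back again: fromBytesLE (toBytesLE n) == n for all n >= 0.
    fromBytesLE :: [Word8] -> Integer
    fromBytesLE = foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0

As long as both sides agree on such a neutral format, it does not matter that the renamed copy and the stock library disagree on names or on internal representation.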
Then you ask

... If you have the time, would you be able to find any programs you might have that displayed poor Integer performance or possibly bottlenecks in the current system?
Let me say first that I have not experienced any really surprisingly poor performance with the current system. To be sure, I have seen programs similar to my own, not written in Haskell, that performed better on comparable tasks. But I have not been able to say specifically that the poorer performance of my program was caused by some inadequacy in the Haskell (with GMP) implementation.

I had better be specific here: what I do is integer factorization (splitting integers into their prime factors). For example, I have implemented a version of ECM (the Elliptic Curve Method) in Haskell. Having done that (independently, and mainly to learn what it was about), I eventually compared its performance to that of GMP-ECM, which Paul Zimmermann wrote. I was initially disappointed to find that GMP-ECM was about 7 times faster than my ECM code. Fortunately, various sessions of "hand-waving" brought this factor down to about 1.5: whenever GMP-ECM used 2 minutes, my program would use 3 minutes.

I must admit that I am neither satisfied nor done with this result. At the outset, when I started using Haskell for these number-theoretic computations, I had hoped that the overhead of using Haskell, compared to using C with GMP calls, would be insignificant: something like 10 or at worst 20 percent. So 50 percent of overhead seems a bit excessive. On the other hand, there may be additional hands to wave that I haven't thought of. And again: I am not done working with this, just taking an extended break. Please indicate if you wish to see some detailed and hard results.

Some additional experience is this: to gain some potentially easier-to-compare measurements between Haskell-with-GMP and C-with-GMP, I implemented the GMP function mpz_powm (raising a number to a power modulo a third number) as a primitive operation in GHC and compared it to my own power-raising operation implemented in Haskell (roughly the shape sketched in the postscript below). Overall, the GMP implementation of the power operation was 2 times faster than my Haskell implementation, though, to be sure, it uses techniques that my code certainly doesn't. An interesting thing happened, however, for very large numbers: the GMP power operation started using a lot of memory, thrashing, and then, eventually, with sufficiently large operands, crashed for lack of memory, whereas my Haskell-implemented version continued to produce usable results. The explanation should undoubtedly be found in the method (some FFT-thing) that GMP uses for the power operation on very large numbers: my guess is that GMP, when using this method, issues a large number of allocate/free requests. And, again guessing, if each allocate request causes the GHC runtime to allocate, whereas each free only causes a "we'll handle this at the next GC", then such large computations may very well display this behaviour. An example where the "Very Good Thing" of using the GHC RTS memory management could be questioned.

I will not bother you further, except to express my sincere respect for the work you are doing.

Best regards
Thorkil
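P.S. For concreteness, here is roughly the shape of such a power-raising operation: a minimal square-and-multiply sketch of b^e mod m in Haskell. It is certainly not my exact code, but it shows what the comparison against mpz_powm was measuring:

    -- powMod b e m computes b^e modulo m (assumes e >= 0) by binary
    -- square-and-multiply, using O(log e) Integer multiplications.
    powMod :: Integer -> Integer -> Integer -> Integer
    powMod _ _ 1 = 0
    powMod b e m = go (b `mod` m) e 1
      where
        go _    0 acc = acc
        go base k acc
          | odd k     = go base' (k `div` 2) (acc * base `mod` m)
          | otherwise = go base' (k `div` 2) acc
          where base' = base * base `mod` m

For example, powMod 2 1000000 1000003 takes a few dozen multiplications rather than a million. GMP's mpz_powm improves on this with techniques such as Montgomery reduction, which is presumably part of why it wins at ordinary operand sizes.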