
LICENSE: BSD3 (or also similar, like GHC, etc.)
INITIAL AUTHOR: Isaac Dupree

Hello Isaac, Friday, August 10, 2007, 7:56:25 PM, you wrote:
The internal format is (currently) a list([]) of Ints
my first thought is that using list of large chunks as in lazy bytestrings may improve efficiency in future -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

BTW: I estimate that it took me about two solid weeks of time, more than I had originally intended to spend :) I think using CPP will prove important, but makes it harder to test in Hugs... should I use Cabal if I want to use Hugs with haskell+cpp code? Bulat Ziganshin wrote:
Hello Isaac,
Friday, August 10, 2007, 7:56:25 PM, you wrote:
The internal format is (currently) a list([]) of Ints
my first thought is that using list of large chunks as in lazy bytestrings may improve efficiency in future
My first aim is portability, as Yhc demonstrated it could be helped by a portable Integer that is usually within the size of an Int. I hated the idea that Int was being hackily used in place of Integer :) For very large numbers, that (i.e. unboxed-array equivalents) would be more memory- as well as possibly speed efficient. Almost all the operations in my implementation are defined inductively, not knowing the exact size of the result in advance, so that would be somewhat different. One *interesting* aspect of the format I used is that unbalanced-size addition (and .&., .|.) are on average O(min(m,n)), because the tail of the list can stay the same except for carrying. e.g. 3 + 2789234879453889743578934589893459834589345898945378974358798973487... is about as fast as (3 + 4). (Note that this asymptotic behavior will only actually be true in my implementation when my version of assertions are turned off, which verify that each generated integer is valid list format. Also it does not work for the second argument of subtraction (nor negate, abs, xor) because negation really does have to change the whole list, given its current format (hmm... abs could short-circuit if it sees its argument is positive, I suppose).) Johm Meacham wrote:
Excellent! this has always been a project I wanted to do in the back of my mind. I will likely make it the default Integer implementation for jhc. I look forward to the fun compiler optimizations it will inspire to make it speed comparable to gmp
I can imaging speed near GMP for small-to-moderate size numbers, but GMP implements LOTS of algorithms as well as a number of sophisticated low-level foundations to make itself very fast. One "advantage" we have is if we're only trying to implement the basic Integer algorithms... Anyway I would be quite happy even with a Haskell implementation that only works very fast for up to, say a few-hundred-digit(base 10) numbers, which I can imagine we might be able to manage. (GMP 4.2.x in my experience starts getting very slow for 1000-10000 digit(base 10) numbers and up.) GMP development goes on and, well, *I'm* not that interested in duplicating it - I prefer the idea of one powerful bignum implementation to be perfected. (GMP's implementation code is not very readable to an outsider - I've tried a little - I'd prefer to win on understandability in Haskell, but as you say, compiler optimizations have the *potential* to really help the speed of a readable implementation.) Isaac

Isaac Dupree wrote:
John Meacham wrote:
Excellent! this has always been a project I wanted to do in the back of my mind. I will likely make it the default Integer implementation for jhc. I look forward to the fun compiler optimizations it will inspire to make it speed comparable to gmp
Actually we have a few advantages over GMP: We can do inlining, and analysis for static knowledge (e.g. small numbers or nonzero numbers), much more than in C, which knows nothing about its bignum type. Probably this would work best with some knowledge baked into the compiler about the meaning of Integer operations (like the way GHC has metadata along with the definitions of its primitives, at least whether they're symmetric operations...) (Unless some way to specify that with compiler pragmas is invented, but computing with Integers may be a bit of a special case.) I want to see compilers do more sophisticated constant folding (generally, computation at compile time), which has to watch out for non-pure, non-safe and non-total (nonterminating) functions as well as deciding whether some partial computation will reduce code size and improve speed, or just bloat it. We can make a higher-level library/type, that relies on any Integer implementation, and gives a type that does not evaluate immediately, so optimizations (runtime) can be done on series of operations. Er... the Integer type would probably have to provide operations that were more efficient and did weird things -- either a Haskell Integer implementation or GMP's mpn layer would probably be useful. (Haskell is a particularly good language for things like this, I think... if it is likely to be useful... It is useful for matrix operations in C++ :)) Another thing I was wondering is whether there's any point making it be able to work on the GMP-integer representation that GHC stores internally... probably not much. Isaac

On Sat, Aug 11, 2007 at 12:09:44PM -0300, Isaac Dupree wrote:
BTW: I estimate that it took me about two solid weeks of time, more than I had originally intended to spend :)
I think using CPP will prove important, but makes it harder to test in Hugs... should I use Cabal if I want to use Hugs with haskell+cpp code?
I think if you were willing to use what is provided by the FFI extension, you could make some signifigant improvements without resorting to CPP. mainly, you could use unsigned arithmetic and have access to bit operations. I was thinking a representation like data Integer = Integer !Bool Rest data Rest = Digit !Word Rest | End where the Bool indicates the sign, and the rest are the base-wordsize digits. It would also be possible to just use a sign bit in the number of course. that unboxed strict Word should make a big difference. (it is not clear to me whether being strict in the 'Rest' field will help or hinder, testing with both would be interesting) John -- John Meacham - ⑆repetae.net⑆john⑈

On Mon, Aug 13, 2007 at 03:21:54PM -0700, John Meacham wrote:
On Sat, Aug 11, 2007 at 12:09:44PM -0300, Isaac Dupree wrote:
BTW: I estimate that it took me about two solid weeks of time, more than I had originally intended to spend :)
I think using CPP will prove important, but makes it harder to test in Hugs... should I use Cabal if I want to use Hugs with haskell+cpp code?
I think if you were willing to use what is provided by the FFI extension, you could make some signifigant improvements without resorting to CPP. mainly, you could use unsigned arithmetic and have access to bit operations.
I was thinking a representation like
data Integer = Integer !Bool Rest data Rest = Digit !Word Rest | End
where the Bool indicates the sign, and the rest are the base-wordsize digits. It would also be possible to just use a sign bit in the number of course. that unboxed strict Word should make a big difference.
You might also try data Integer = Pos !Word | Neg !Word | Big !Word Integer since the incremental penalty of 3 constructors is quite small, and it eliminates a few indirections. (Note: You have to actually specify -funbox-strict-fields for the ! to do much good.) Stefan

Stefan O'Rear wrote:
On Mon, Aug 13, 2007 at 03:21:54PM -0700, John Meacham wrote:
On Sat, Aug 11, 2007 at 12:09:44PM -0300, Isaac Dupree wrote:
BTW: I estimate that it took me about two solid weeks of time, more than I had originally intended to spend :)
I think using CPP will prove important, but makes it harder to test in Hugs... should I use Cabal if I want to use Hugs with haskell+cpp code? I think if you were willing to use what is provided by the FFI extension, you could make some signifigant improvements without resorting to CPP. mainly, you could use unsigned arithmetic and have access to bit operations.
I was thinking a representation like
data Integer = Integer !Bool Rest data Rest = Digit !Word Rest | End
where the Bool indicates the sign, and the rest are the base-wordsize digits. It would also be possible to just use a sign bit in the number of course. that unboxed strict Word should make a big difference.
You might also try
data Integer = Pos !Word | Neg !Word | Big !Word Integer
since the incremental penalty of 3 constructors is quite small, and it eliminates a few indirections.
(Note: You have to actually specify -funbox-strict-fields for the ! to do much good.)
I use the {-# UNPACK #-} pragma where I want that effect. Simpler representation = simpler for a compiler to output. (although I could easily provide a Haskell module containing (Integer -> (that thing's shape) ) for compilers to include.) No-FFI is particularly important (this is NOT GMP), but I don't think I need FFI - the compiler just has to support it somehow - to use Data.Word. I do suspect that storing in two's complement style / with separate sign, might be more efficient for some operations. (I'm particularly concerned that small numbers should be fast, so that Int doesn't have to be used all over the place just for speed reasons. Since my format requires two "case"s to determine that an Integer is a small number, _if_ that is the most costly operation, it might be improved) And of course any of you can try the variant implementations you propose, if you want to put in the work on it! I'm not interested for a long time. The more interesting-looking part to me now is seeing what a good mechanism is for using these implementations in compilers is (which is not entirely obvious to me). I'm BTW. base-wordsize digits are not very good for addition speed, depending... Even GMP uses a somewhat smaller base than that, I believe. Look: * The compiler only has to output lists (it already must do string literals), and Ints, in compiled programs. * No types outside Haskell98 need to be implemented. * The Int does not need to be a certain large size, does not need to be two's-complement bits (except for the extension class Bits which seems to assume two's complement implicitly for signed numbers), and does not need to have any particular overflow behavior. I'm not the one with the energy to make compromises between performance and portability. Seeing something like this in Jhc should be interesting and there is where decent-good performance will be wanted. I know my implementation is quite inferior performance to GMP, in _GHC_ :) I still think GMP is the place to go for the highest, most well-rounded performance. (Although "We will release 4.2.2 in spring 2007." did not come to pass, and still appears on gmplib.org) I wonder if there's any way (Jhc) compiler optimizations could transform a strict inductive data into a UArray sort of representation, since that representation is probably more efficient for Integers... Isaac

Excellent! this has always been a project I wanted to do in the back of my mind. I will likely make it the default Integer implementation for jhc. I look forward to the fun compiler optimizations it will inspire to make it speed comparable to gmp :) John -- John Meacham - ⑆repetae.net⑆john⑈

a statistic I thought of: estimate of lines of non-comment code (after reformatting by ghc) : $ ghc IntegerInTermsOfInt.hs -ddump-parsed | wc -l 1084 (This is the only way I've come up with so far, to easily exclude the many and various comments, in a line-count.) ISaac

On Sun, Aug 12, 2007 at 08:39:23AM -0300, Isaac Dupree wrote:
a statistic I thought of: estimate of lines of non-comment code (after reformatting by ghc) : $ ghc IntegerInTermsOfInt.hs -ddump-parsed | wc -l 1084
$ sloccount IntegerInTermsOfInt.hs [...] SLOC Directory SLOC-by-Language (Sorted) 1010 top_dir haskell=1010 [...] Thanks Ian
participants (5)
-
Bulat Ziganshin
-
Ian Lynagh
-
Isaac Dupree
-
John Meacham
-
Stefan O'Rear