[I am not on the GHC list, so replies please either cc hugs-bugs or me.] I am chatting with the GMP (GNU Multi-Precision number library) people about "unboxed" (no pointer) representations. What do you experts on such implementations think would be a suitable low-level GMP implementation, that is, one that would make its use in, say, GHC easier? (As I think GHC uses GMP, I thought this might be the right place to ask for input.) Hans Aberg
[Message also sent to the newsgroup.]
Thu, 3 May 2001 20:48:11 +0200, Hans Aberg wrote:
I am chatting with the GMP (GNU Multi-Precision number library) people about "unboxed" (no pointer) representations.
Do you mean storing smaller values directly in the structure?
What do you experts on such implementations think would be a suitable low-level GMP implementation; that is, that would make the use in say GHC easier?
GHC does that optimization itself. Its Integer representation is thus:

    data Integer = S# Int#            -- small integers
                 | J# Int# ByteArray# -- large integers

It keeps integers in the small variant where possible, switching to the gmp variant on overflow.

It's essential that these two variants are visible to ghc, so it can often generate appropriate dispatching code instead of physically allocating these structures.

The Int# in the J# variant corresponds to _mp_size in __mpz_struct, and the ByteArray# holds the pointer to a Haskell heap block which contains:
* a header pointer used by the GC,
* _mp_alloc,
* all the limbs pointed to by _mp_d.

Changing the representation in gmp would require massive rewriting in ghc. I don't know what improvement could be made, by the way, so let me just describe what is currently going on.

Code for primitive versions of (+)::Integer->Integer->Integer etc. has a comment:

    /* ToDo: this is shockingly inefficient */

It creates MP_INT variables, fills them from the arguments of Haskell's J# objects (passed separately):

    arg1._mp_alloc = d1->words;
    arg1._mp_size  = (s1);
    arg1._mp_d     = (unsigned long int *) (BYTE_ARR_CTS(d1));
    arg2._mp_alloc = d2->words;
    arg2._mp_size  = (s2);
    arg2._mp_d     = (unsigned long int *) (BYTE_ARR_CTS(d2));

calls the gmp function, and returns the _mp_size of the result together with its _mp_d minus the size of two words on the Haskell stack.

ghc calls mp_set_memory_functions on startup to let gmp allocate ByteArray#s with an appropriate header on the Haskell heap.

ghc's Int# always has the same size as a pointer, either 32 or 64 bits. The runtime assumes that limbs have this size too; e.g. _mp_alloc here is just the number of words in the ByteArray#.

There is no provision for turning the J# representation back into S# if it gets small enough.
(If there were, it would not be enough to check abs(_mp_size) <= 1, because an mpz with abs(_mp_size) <= 1 can represent one bit more than a single Int#, since the sign is stored in _mp_size's sign.)

I think it's essential not to make the S# case slower, but the representation of larger numbers could be changed, if only somebody did the huge task of rewriting everything (with #ifdefs for older gmp). I'm not sure how to integrate gmp's view of optimizing small integers with ghc's view.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^    SYGNATURA ZASTĘPCZA
 QRCZAK
At 22:23 +0200 2001/05/03, Marcin 'Qrczak' Kowalczyk wrote:
I am chatting with the GMP (GNU Multi-Precision number library) people about "unboxed" (no pointer) representations.
Do you mean storing smaller values directly in the structure?
Yes, in one form or another. Let's focus on integers. Then GMP currently uses

    typedef struct {
      int _mp_alloc;
      int _mp_size;
      mp_limb_t *_mp_d;
    } __mpz_struct;

where _mp_d is the pointer to the dynamically allocated number (mp_limb_t is an integral type of suitable size for the platform). My idea is to somehow augment it, so that when numbers are small, they do not use dynamic allocation. Say

    typedef struct {
      union {
        mp_int _mp_n;
        mp_limb_t *_mp_d;
      };
      int unboxed;
    } __mpz_struct;

or some variation of it.

One problem though is the GC (Garbage Collector): My immediate concern is to write a C++ wrap, in which case the return of an integer is slow, as it will invoke the copy constructor, causing a dynamic allocation. One way around it is to use a reference count. Then, in order to avoid a dynamic allocation for the ref count as well, one wants the object pointed to by _mp_d above to contain all the data, say

    typedef struct {
      union {
        mp_int _mp_n;
        mp_limb_t *_mp_all; /* Pointer to all. */
      };
      int unboxed;
    } __mpz_struct;

    #define _mpz_alloc _mp_all[0]
    #define _mpz_count _mp_all[1]
    #define _mpz_size  _mp_all[2]
    #define _mpz_d     (_mp_all + 3)

But, after all, a ref count is just one primitive form of GC, used in C++ because it is easy to automate, and because it is difficult to implement a more advanced form of GC.

So here comes the question: if GMP should serve, say, the implementation of compilers such as GHC, what kind of low-level number representations should one use? One advantage of having it in GMP is that it supports low-level assembler code.
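The union idea can be sketched in plain C. Everything below (the type and field names, the fallback path) is hypothetical illustration, not actual GMP code; it assumes the gcc/clang __builtin_add_overflow intrinsic for the overflow test:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical tagged representation: a flag selects between an
 * immediate value and a pointer to heap-allocated limbs. */
typedef struct {
    union {
        long          small;   /* the value itself when unboxed */
        unsigned long *limbs;  /* heap data otherwise           */
    } u;
    int unboxed;               /* nonzero: u.small is the value */
} myz_t;

/* Add two small numbers; stay unboxed unless the machine add
 * overflows.  The boxed path just reserves a 2-limb block here;
 * filling it in properly is elided in this sketch. */
static myz_t myz_add(long a, long b) {
    myz_t r;
    long s;
    if (!__builtin_add_overflow(a, b, &s)) {  /* gcc/clang builtin */
        r.unboxed = 1;
        r.u.small = s;
    } else {
        r.unboxed = 0;
        r.u.limbs = malloc(2 * sizeof(unsigned long));
        r.u.limbs[0] = 0;
        r.u.limbs[1] = 0;
    }
    return r;
}
```

The point is that the common small-small case touches no heap at all; only the overflow branch allocates.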
GHC does that optimization itself. Its Integer representation is thus:
    data Integer = S# Int#            -- small integers
                 | J# Int# ByteArray# -- large integers
It keeps integers in the small variant where possible, switching to use the gmp variant on overflow.
It's essential that these two variants are visible to ghc, so it can often generate appropriate dispatching code instead of physical allocation of these structures.
So this suggests that it might be a disadvantage for GHC to have GMP merging the two number representations. Or is it that the dispatch code can only be generated if the input data is sufficiently static, in which case it would be a great advantage to have it in GMP when the data is dynamic?
The Int# in the J# variant corresponds to _mp_size in __mpz_struct, and the ByteArray# holds the pointer to a Haskell heap block which contains:
* a header pointer used by the GC,
* _mp_alloc,
* all the limbs pointed to by _mp_d.
Changing the representation in gmp would require massive rewriting in ghc. I don't know what improvement could be made by the way, so let me just describe what is currently going on.
You shouldn't worry that anything going on in GMP would not respect upwards compatibility. It seems that this type of GMP discussion has taken place from time to time; I am only the one bringing it up now. The GMP developers have so far judged that the effort does not outweigh the advantage of having a type which is faster for small numbers. Do you think that the speed-up for smaller numbers would be significant enough to justify such a rewrite?
Code for primitive versions of (+)::Integer->Integer->Integer etc. has a comment: /* ToDo: this is shockingly inefficient */
Yes, I think so too. This is one reason for moving it into GMP: at the assembler level, one can do more efficient overflow checks, etc.
It creates MP_INT variables, fills them from the arguments of Haskell's J# objects (passed separately):

    arg1._mp_alloc = d1->words;
    arg1._mp_size  = (s1);
    arg1._mp_d     = (unsigned long int *) (BYTE_ARR_CTS(d1));
    arg2._mp_alloc = d2->words;
    arg2._mp_size  = (s2);
    arg2._mp_d     = (unsigned long int *) (BYTE_ARR_CTS(d2));

calls the gmp function, and returns the _mp_size of the result together with its _mp_d minus the size of two words on the Haskell stack.
ghc calls mp_set_memory_functions on startup to let gmp allocate ByteArray#s with appropriate header on the Haskell heap.
ghc's Int# has always the same size as a pointer, either 32 or 64 bits. The runtime assumes that limbs have this size too, e.g. _mp_alloc here is just the number of words in the ByteArray#.
There is no provision for turning the J# representation back into S# if it gets small enough. (If there were, it would not be enough to check abs(_mp_size) <= 1, because an mpz with abs(_mp_size) <= 1 can represent one bit more than a single Int#, since the sign is stored in _mp_size's sign.)
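The demotion test hinted at here can be illustrated in C. The function name is made up; the point is the asymmetry: with the sign carried in _mp_size, a one-limb magnitude can hold one more bit than a signed machine word, and the negative range of a signed word reaches one further than the positive range:

```c
#include <assert.h>
#include <limits.h>

/* Sketch: given a 1-limb magnitude and a separately carried sign
 * (as in _mp_size's sign), decide whether the value fits a signed
 * word.  Note the asymmetry: -2^63 fits a 64-bit word, +2^63 does
 * not, so a naive abs(_mp_size) <= 1 test would be wrong. */
static int fits_small(unsigned long limb, int negative) {
    if (!negative)
        return limb <= (unsigned long)LONG_MAX;
    /* the magnitude of LONG_MIN is LONG_MAX + 1 */
    return limb <= (unsigned long)LONG_MAX + 1;
}
```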
I think it's essential to not make the S# case slower, but the representation of larger numbers could be changed if only somebody did the huge task of rewriting everything (with #ifdefs for older gmp). I'm not sure how to integrate gmp's view of optimizing small integers with ghc's view.
My guess is that the S# representation should be used in situations where a small static size can be detected. Even overflow detection slows down these primitives a lot (or so I am told). Then, when static detection cannot be used, one would have to work with the J# representation. Hans Aberg
On Fri, May 04, 2001 at 11:32:36AM +0200, Hans Aberg wrote:
Let's focus on integers.
ghc doesn't use other gmp types.
One problem though is the GC (Garbage Collector):
Indeed. ghc solves it by using gmp format only temporarily to perform operations, and keeping numbers in its own structures.
Then, in order to avoid a dynamic allocation for the ref count as well, one wants the object pointed to by _mp_d above contain all data,
Attaching data to _mp_d can be done without changing gmp, but only by giving up the ability to mix in other libraries using gmp in the same program: by providing appropriate allocation functions, which allocate more, fill the header and return a shifted pointer to gmp. This is what ghc does.

I think you would have to be careful not to use gmp functions which change numbers in place. In Haskell there is no problem because the interface is functional anyway, but a C++ user might want to have ++z performed in place. When the number can be shared, it doesn't work. So either do it functionally or copy by value; in either case you sometimes lose. Leaving gmp memory management to the programmer would make using it awful.
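The shifted-pointer scheme described here is easy to sketch in portable C. The header layout and names below are invented for illustration; the real GHC header is a closure header, not this struct:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative private header placed in front of each block. */
typedef struct {
    unsigned long gc_info;  /* stand-in for a GC/closure header */
    size_t        words;    /* payload size in words            */
} hdr_t;

/* The custom allocator reserves room for the header and hands the
 * library a pointer just past it, so the library never sees it. */
static void *alloc_with_header(size_t size) {
    hdr_t *h = malloc(sizeof(hdr_t) + size);
    h->gc_info = 0xdeadbeef;
    h->words   = size / sizeof(unsigned long);
    return h + 1;           /* shifted pointer given to the library */
}

/* The runtime recovers the header from the library's pointer. */
static hdr_t *header_of(void *p) { return (hdr_t *)p - 1; }
```

A second library in the same process that installs different allocation functions would clash with this, which is exactly the mixing problem mentioned above.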
So here comes the question: if GMP should serve, say, the implementation of compilers such as GHC, what kind of low-level number representations should one use?
I don't know how to do it better than currently: letting the program/library which uses gmp manage gmp's memory.
So this suggests that it might be a disadvantage for GHC to have GMP merging the two number representations.
It would be an unnecessary complication, but manageable. Since ghc creates mpz objects for each operation, it could use only the big representation of gmp on inputs, handling small numbers itself as currently. But it must be prepared to receive a small result. It would be silly to wrap it in allocated memory if gmp produced a small answer, so primops (implemented in C or assembler) should return either representation.

It would be a bit ugly. Ghc's primops use a restricted set of types and rarely allocate memory themselves. For example plusInteger# has the type

    Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)

where (# x, y #) means to return both values on the stack. It is wrapped for integer addition thus:

    plusInteger :: Integer -> Integer -> Integer
    plusInteger i1@(S# i) i2@(S# j) = case addIntC# i j of
        (# r, c #) -> if c ==# 0# then S# r else toBig i1 + toBig i2
    plusInteger i1@(J# _ _)  i2@(S# _)   = i1 + toBig i2
    plusInteger i1@(S# _)    i2@(J# _ _) = toBig i1 + i2
    plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
        (# s, d #) -> J# s d

Returning one of two variants from a primop requires expressing it similarly to a C struct, without type punning. So it could be this:

    plusInteger# :: Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, Int#, ByteArr# #)

    plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
        (# 0#, i, _ #) -> S# i
        (# _,  s, d #) -> J# s d

The third component in the S# case is ignored, but the primop must pass something: a pointer to a dummy Haskell object (this technique is used in some other primops).

This assumes that the condition for having an alternative representation is the same in ghc and gmp: fitting into a single signed integer of the limb size (i.e. pointer size).

Note that the J# representation currently relies on the fact that an mpz can be reconstructed from an integer plus an array of words of an explicit size.
It's not possible to change the representation of mpz and keep the J# representation in ghc.
Or is it that the dispatch code can only be generated if the input data is sufficiently static, in which case it would be a great advantage to have it in GMP when the data is dynamic?
It's dynamic too. An advantage of making the variants visible to the compiler is that a function returning an Integer is compiled to code which takes the addresses of two continuations as an argument, and enters the continuation corresponding to the constructor returned, passing it the constructor arguments as arguments. It doesn't necessarily allocate an S# or J# node on the heap if it's to be evaluated right away. This happens to all algebraic types in general (perhaps there are size constraints, I don't know). So I guess that dispatching is best done by using an algebraic type.

But I'm not sure. The ghc documentation says that ghc loves single-constructor types. It was written a long time ago, but perhaps it's still true. So how to distinguish representations in another way? Well, perhaps

    data Integer = J# Int# Int# ByteArray#

with a dummy object for the ByteArray# in the short case would work. I don't know how well: it's ugly but maybe fast, and it seems easier to integrate with a variant representation in gmp.

In any case, better judging of the performance of ghc for various representations should be done by somebody with more knowledge than me. I don't know what the exact overheads are in various cases.
You shouldn't worry that anything going on in GMP would not respect upwards compatibility.
ghc abuses gmp by using its internals directly. When the representation of mpz changes, ghc must be changed, even if the C interface remains compatible. Even renaming some gmp functions and providing the old names as macros (or something like this) in gmp3 caused compatibility problems for ghc, and ghc-4.08 requires gmp2, because assembler code calls C functions directly and cpp couldn't translate the names.
The GMP developers have so far judged that the effort does not outweigh the advantage of having a type which is faster for small numbers. Do you think that the speed-up for smaller numbers would be significant enough to justify such a rewrite?
It is significant. I don't know any numbers, but ghc's handling of Integers was said to get a big speedup after introducing the separate representation for small integers. But this might be partially because calling gmp functions involves a lot of indirections, construction of mpz objects etc., so the speedup could result from avoiding calling gmp at all. Allocation in ghc is fast.

Perhaps in your C++ wrappers you can distinguish the variants above gmp, and use gmp only for large numbers? C++ doesn't have such indirections and you would surely keep the original mpz objects, but it might be easier than changing gmp.

Note that I'm not against changing gmp. It's just not obvious what the consequences would be...
At 19:06 +0200 2001/05/04, Marcin 'Qrczak' Kowalczyk wrote:
Let's focus on integers.
ghc doesn't use other gmp types.
GMP has a very interesting multi-precision floating-point number type as well, but it has the same problem as the integers: it does not use native doubles for small floats, so it probably becomes slow.
One problem though is the GC (Garbage Collector):
Indeed. ghc solves it by using gmp format only temporarily to perform operations, and keeping numbers in its own structures.
Does that mean that you are getting extra memory allocations, or are you simply handing over a suitable mpz, and then picking up the _mp_d pointer in your own format? The latter sounds interesting; perhaps one can use it in a C++ wrap.
Then, in order to avoid a dynamic allocation for the ref count as well, one wants the object pointed to by _mp_d above contain all data,
Attaching data to _mp_d can be done without changing gmp, but only by giving up the ability to mix in other libraries using gmp in the same program: by providing appropriate allocation functions, which allocate more, fill the header and return a shifted pointer to gmp. This is what ghc does.
Yes, this is the approach I suggested, but then as a C library the one with the different allocation functions will become binary incompatible with the regular library. Perhaps I thought in too simplistic patterns; if I put the pointer shifting in the C++ library, I might automate it. But then the C++ library becomes incompatible with the C library. Perhaps it does not matter.
I think you would have to be careful to not use gmp functions which change numbers in place. In Haskell there is no problem because the interface is functional anyway, but a C++ user might want to have ++z performed in place. When the number can be shared, it doesn't work. So either do it functionally or copy by value. In either case sometimes you lose. Leaving gmp memory management to the programmer would make using it awful.
I do not think that the standard C++ library containers give such guarantees (say for std::string); one merely builds an interface. (As for the ref count, I use a function detach() which removes any object about to be mutated from the reference cluster before mutation.) So it should not be a problem. But it could be interesting for optimization to have GMP functions that operate in place. One can use some interesting mixtures, say sharing the pointer but copying the sign, in order to make sign changes fast.
So here comes the question: if GMP should serve, say, the implementation of compilers such as GHC, what kind of low-level number representations should one use?
I don't know how to do it better than currently: letting the program/library which uses gmp manage gmp's memory.
So this suggests that it might be a disadvantage for GHC to have GMP merging the two number representations.
It would be an unnecessary complication, but manageable. Since ghc creates mpz objects for each operation, it could use only the big representation of gmp on inputs, handling small numbers itself as currently. But it must be prepared to receive a small result. It would be silly to wrap it in allocated memory if gmp produced a small answer, so primops (implemented in C or assembler) should return either representation.
Perhaps GMP should provide small-number arithmetic, with a fast way to determine whether the answer fits in a small number representation. For example mpz_si_si, taking two (long) ints, but with the result in a pointer. If one hands over an allocation at least two limbs long, this function does not need to make an allocation, and the idea is that one should be able to look at the result to quickly determine whether it fits into one int.
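A sketch of what such an entry point might look like, under the assumptions that a limb is a 64-bit word and that gcc/clang's __int128 is available. The name add_si_si and the calling convention are hypothetical, mirroring the mpz_si_si idea above:

```c
#include <assert.h>
#include <limits.h>

/* Add two longs into a caller-supplied 2-limb buffer (magnitude,
 * little-endian limb order) with the sign reported separately, and
 * return 1 if the result still fits a single signed word.  The
 * caller can thus keep small results unboxed without gmp ever
 * allocating. */
static int add_si_si(long a, long b, unsigned long limbs[2], int *neg) {
    __int128 r = (__int128)a + b;       /* cannot overflow 128 bits */
    *neg = r < 0;
    unsigned __int128 m =
        *neg ? -(unsigned __int128)r : (unsigned __int128)r;
    limbs[0] = (unsigned long)m;
    limbs[1] = (unsigned long)(m >> 64);  /* assumes 64-bit limbs */
    return r >= LONG_MIN && r <= LONG_MAX;
}
```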
It would be a bit ugly. Ghc's primops use a restricted set of types and rarely allocate memory themselves. For example plusInteger# has the type

    Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)

where (# x, y #) means to return both values on the stack. It is wrapped for integer addition thus:
    plusInteger :: Integer -> Integer -> Integer
    plusInteger i1@(S# i) i2@(S# j) = case addIntC# i j of
        (# r, c #) -> if c ==# 0# then S# r else toBig i1 + toBig i2
    plusInteger i1@(J# _ _)  i2@(S# _)   = i1 + toBig i2
    plusInteger i1@(S# _)    i2@(J# _ _) = toBig i1 + i2
    plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
        (# s, d #) -> J# s d
Returning one of two variants from a primop requires expressing it similarly to a C struct, without type punning. So it could be this:

    plusInteger# :: Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, Int#, ByteArr# #)

    plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
        (# 0#, i, _ #) -> S# i
        (# _,  s, d #) -> J# s d
The third component in the S# case is ignored, but the primop must pass something - a pointer to a dummy Haskell object (this technique is used in some other primops).
All this checking probably slows down the small format as well.
This assumes that the condition for having an alternative representation is the same in ghc and gmp: fitting into a single signed integer of the limb size (i.e. pointer size).
So you are not using GMP as an interface. One should really try to get a better GMP interface, so that programs like GHC can use that instead.
Note that the J# representation currently relies on the fact that mpz can be reconstructed from an integer + an array of words of an explicit size. It's not possible to change the representation of mpz and keep the J# representation in ghc.
Or is it that the dispatch code can only be generated if the input data is sufficiently static, in which case it would be a great advantage to have it in GMP when the data is dynamic?
It's dynamic too. An advantage of making the variants visible to the compiler is that a function returning an Integer is compiled to a code which takes an address of two continuations as an argument, and enters the continuation corresponding to the constructor returned, passing it constructor arguments as arguments. It doesn't necessarily allocate S# or J# node on the heap if it's to be evaluated right away. This happens to all algebraic types in general (perhaps there are size constraints, I don't know). So I guess that dispatching is best done by using an algebraic type.
But I'm not sure. The ghc documentation says that ghc loves single-constructor types. It was written a long time ago, but perhaps it's still true. So how to distinguish representations in another way? Well, perhaps data Integer = J# Int# Int# ByteArray# with a dummy object for the ByteArray# in the short case would work. I don't know how well: it's ugly but maybe fast, and it seems easier to integrate with a variant representation in gmp.
In any case, better judging of the performance of ghc for various representations should be done by somebody with more knowledge than me. I don't know what the exact overheads are in various cases.
You shouldn't worry that anything going on in GMP would not respect upwards compatibility.
ghc abuses gmp by using its internals directly. When the representation of mpz changes, ghc must be changed, even if the C interface remains compatible.
Even renaming some gmp functions and providing the old names as macros (or something like this) in gmp3 caused compatibility problems for ghc, and ghc-4.08 requires gmp2, because assembler code calls C functions directly and cpp couldn't translate the names.
So this, in the end, suggests that one should perhaps get a better GMP interface for both small and large number representations. But it should then be such that GHC could use that interface, rather than abusing its internals.
The GMP developers have so far judged that the effort does not outweigh the advantage of having a type which is faster for small numbers. Do you think that the speed-up for smaller numbers would be significant enough to justify such a rewrite?
It is significant. I don't know any numbers, but ghc's handling of Integers was said to get a big speedup after introducing the separate representation for small integers.
But this might be partially because calling gmp functions has a lot of indirections, construction of mpz objects etc., so the speedup could result from avoiding calling gmp at all. Allocation in ghc is fast.
What I was told a long time ago is that as soon as one leaves the simple builtin types, one must perform a lot of checks: CPUs are simply not built for handling multiprecision. Those checks steal a lot of cycles, in addition to memory allocation.
Perhaps in your C++ wrappers you can distinguish the variant above gmp, and use gmp at all only for large numbers? C++ doesn't have such indirections, you would surely keep original mpz objects, but it might be easier than changing gmp.
There is already a C++ GMP wrap library using small number representations and a ref count for large numbers, on top of the _mp_d allocation. But the overflow checks are slow at the C/C++ language level, and better moved into assembler where available.
Note that I'm not against changing gmp. It's just not obvious what the consequences would be...
One must have a good idea of how to go about it before starting to write something like that. It is an interesting question, though. Hans Aberg
On Fri, May 04, 2001 at 11:08:59PM +0200, Hans Aberg wrote:
GMP has a very interesting multi-precision floating-point number type as well, but it has the same problem as the integers: it does not use native doubles for small floats, so it probably becomes slow.
I think that it's easier to check a machine-size int for overflow than to check a double for overflow or loss of precision, so it's impractical to use native doubles and keep predictable precision.
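For comparison, detecting loss of precision in a double addition is possible but costs several extra floating-point operations per add (Knuth's TwoSum error term), versus a single flag test for integer overflow. A sketch, assuming IEEE double arithmetic without x87 excess precision:

```c
#include <assert.h>

/* Knuth's TwoSum recovers the rounding error of a double addition;
 * the sum was exact iff the recovered error term is zero.  Six
 * flops per check, which is the overhead an exact-small-float
 * scheme would pay on every operation. */
static int add_was_exact(double a, double b) {
    double s   = a + b;
    double bv  = s - a;            /* b as actually absorbed      */
    double av  = s - bv;           /* a as actually absorbed      */
    double err = (a - av) + (b - bv);  /* exact rounding error    */
    return err == 0.0;
}
```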
Indeed. ghc solves it by using gmp format only temporarily to perform operations, and keeping numbers in its own structures.
Does that mean that you are getting extra memory allocations,
No. When an mpz is constructed to perform an operation, _mp_d is set to point to already allocated memory on the ghc heap. Ghc's heap addresses can generally move during GC; gmp is called from blocks of C or assembler code which don't cause GC in the middle. Using persistent mpz objects would require allocation of _mp_d on the C heap, which is slower, and a finalization hook for each Integer (which would be an additional overhead, given the way custom finalization hooks are implemented).
Perhaps GMP should provide small number arithmetic, with a fast way to determine if the answer fits in a small number representation.
Indeed. Ghc has those primops, used for Integer arithmetic, with tricky implementations in the case the code is generated via C:

    /* -----------------------------------------------------------------------------
     * Int operations with carry.
     * -------------------------------------------------------------------------- */

    /* With some bit-twiddling, we can define int{Add,Sub}Czh portably in
     * C, and without needing any comparisons.  This may not be the
     * fastest way to do it - if you have better code, please send it! --SDM
     *
     * Return : r = a + b,  c = 0 if no overflow, 1 on overflow.
     *
     * We currently don't make use of the r value if c is != 0 (i.e.
     * overflow), we just convert to big integers and try again.  This
     * could be improved by making r and c the correct values for
     * plugging into a new J#.
     */
    #define addIntCzh(r,c,a,b)                      \
    { r = a + b;                                    \
      c = ((StgWord)(~(a^b) & (a^r)))               \
          >> (BITS_IN (I_) - 1);                    \
    }

    #define subIntCzh(r,c,a,b)                      \
    { r = a - b;                                    \
      c = ((StgWord)((a^b) & (a^r)))                \
          >> (BITS_IN (I_) - 1);                    \
    }

    /* Multiply with overflow checking.
     *
     * This is slightly more tricky - the usual sign rules for add/subtract
     * don't apply.
     *
     * On x86 hardware we use a hand-crafted assembly fragment to do the job.
     *
     * On other 32-bit machines we use gcc's 'long long' types, finding
     * overflow with some careful bit-twiddling.
     *
     * On 64-bit machines where gcc's 'long long' type is also 64-bits,
     * we use a crude approximation, testing whether either operand is
     * larger than 32-bits; if neither is, then we go ahead with the
     * multiplication.
     */

    #if i386_TARGET_ARCH
    #define mulIntCzh(r,c,a,b)                      \
    {                                               \
      __asm__("xorl %1,%1\n\t                       \
               imull %2,%3\n\t                      \
               jno 1f\n\t                           \
               movl $1,%1\n\t                       \
               1:"                                  \
              : "=r" (r), "=&r" (c) : "r" (a), "0" (b)); \
    }
    #elif SIZEOF_VOID_P == 4

    #ifdef WORDS_BIGENDIAN
    #define C 0
    #define R 1
    #else
    #define C 1
    #define R 0
    #endif

    typedef union { StgInt64 l; StgInt32 i[2]; } long_long_u;

    #define mulIntCzh(r,c,a,b)                      \
    {                                               \
      long_long_u z;                                \
      z.l = (StgInt64)a * (StgInt64)b;              \
      r = z.i[R];                                   \
      c = z.i[C];                                   \
      if (c == 0 || c == -1) {                      \
        c = ((StgWord)((a^b) ^ r))                  \
            >> (BITS_IN (I_) - 1);                  \
      }                                             \
    }
    /* Careful: the carry calculation above is extremely delicate.  Make
     * sure you test it thoroughly after changing it.
     */
    #else
    #define HALF_INT (1 << (BITS_IN (I_) / 2))
    #define stg_abs(a) ((a) < 0 ? -(a) : (a))
    #define mulIntCzh(r,c,a,b)                      \
    {                                               \
      if (stg_abs(a) >= HALF_INT ||                 \
          stg_abs(b) >= HALF_INT) {                 \
        c = 1;                                      \
      } else {                                      \
        r = a * b;                                  \
        c = 0;                                      \
      }                                             \
    }
    #endif
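As an aside, on compilers that provide them, the gcc/clang overflow builtins (which postdate this code) express the same operation without bit-twiddling or inline assembly, and compile to the hardware overflow flag where available. A sketch of mulIntCzh in that style:

```c
#include <assert.h>
#include <limits.h>

/* r = a * b; c = 0 if no overflow, 1 on overflow.  On overflow the
 * low word is still stored in *r, matching the "plug into a new J#"
 * improvement suggested in the comment above. */
static void mul_with_carry(long a, long b, long *r, long *c) {
    *c = __builtin_mul_overflow(a, b, r) ? 1 : 0;  /* gcc >= 5, clang */
}
```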
So this, in the end, suggests that one should perhaps get a better GMP interface for both small and large number representations. But it should then be such that GHC could use that interface, rather than abusing its internals.
Yes, except that I'm not sure how much harder it would be for ghc to use a GMP interface than to do it itself.

BTW, it would be nice to have a better way of writing large integer literals. Ghc used to convert them from a decimal string, and now it builds them from pieces by * and + in base 2^31-1 or 2^63-1. Both approaches are a bit ugly. But large integer literals are probably not common enough for it to matter much.
At 18:45 +0200 2001/05/07, Marcin 'Qrczak' Kowalczyk wrote:
GMP has a very interesting multi-precision floating-point number type as well, but it has the same problem as the integers: it does not use native doubles for small floats, so it probably becomes slow.
I think that it's easier to check a machine-size int for overflow than to check a double for overflow or loss of precision, so it's impractical to use native doubles and keep predictable precision.
It is easier, right. But for floats one can in most cases make an overflow estimate that takes care of things. For the remaining cases, one can simply carry out the computation and check whether it resulted in an infinity or NaN: if the CPU has an FPU, floats compute quickly, so this will probably still be much faster than using multiprecision.
Perhaps GMP should provide small number arithmetic, with a fast way to determine if the answer fits in a small number representation.
Indeed. Ghc has those primops, used for Integer arithmetic, with tricky implementations in the case the code is generated via C: ... [Thanks for code snippet.] ..
So this, in the end, suggests that one should perhaps get a better GMP interface for both small and large number representations. But it should then be such that GHC could use that interface, rather than abusing its internals.
Yes, except that I'm not sure how much harder it would be for ghc to use a GMP interface than to do it itself.
I arrived at a suggestion for GMP:

    typedef struct {
      _mp_int _mp_s;    /* Small number representation. */
      mp_limb_t *_mp_d; /* Pointer to the limbs and all other data. */
    } __mpz_struct;

where _mp_d is simply set to NULL if one is using the small number representation. (Strictly speaking, this might be a new type with a different name, in order to keep GMP upwards compatible; let's skip that aspect here.)

I find it interesting for several reasons: First, the small number format is wholly untampered with, so all one is down to is overflow checks. If a CPU were to support instructions for +, -, *:

    1-register x 1-register -> 2-register

then the small number representations could be made very fast. (But I am told that many CPUs do not support that.)

The conversions back and forth between native integral types and GMP types are also easy: a signed integral type one merely puts into the _mp_s field, setting _mp_d to NULL. For an unsigned integral type one checks the most significant bit; if it is 0, one converts it as signed, and if it is 1, one uses the one-limb format.

GMP would only need to define the basic arithmetic operations involving __mpz_struct; the ones involving native integral types could be macroed on top of them.

Can you tell me the pros and cons of this suggestion relative to GHC?
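The proposal can be mocked up in a few lines of C to show the dispatch and the signed/unsigned conversion rules described above. All names here are stand-ins for the proposed ones, and the big-number side is reduced to a caller-supplied buffer whose first word plays the role of the bookkeeping kept under _mp_d:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mock-up of the proposed layout: mp_d == NULL selects the small
 * representation. */
typedef struct {
    long           mp_s;   /* the value when mp_d == NULL        */
    unsigned long *mp_d;   /* limbs plus bookkeeping otherwise   */
} zprop_t;

/* Every operation starts with one pointer test to dispatch. */
static int is_small(const zprop_t *z) { return z->mp_d == NULL; }

/* Signed conversion: store the value, clear the pointer. */
static zprop_t from_long(long v) {
    zprop_t z = { v, NULL };
    return z;
}

/* Unsigned conversion, per the rule above: MSB clear converts as
 * signed; MSB set takes the one-limb format.  store[0] stands in
 * for the size kept under _mp_d, store[1] is the limb. */
static zprop_t from_ulong(unsigned long v, unsigned long store[2]) {
    zprop_t z;
    if (v <= (unsigned long)LONG_MAX) {
        z.mp_s = (long)v;
        z.mp_d = NULL;
    } else {
        store[0] = 1;
        store[1] = v;
        z.mp_s = 0;
        z.mp_d = store;
    }
    return z;
}
```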
BTW, it would be nice to have a better way of writing large integer literals. Ghc used to convert them from a decimal string, and now it builds them from pieces by * and + in base 2^31-1 or 2^63-1. Both approaches are a bit ugly. But large integer literals are probably not common enough for it to matter much.
I am not sure what you mean here; in what context should these integer literals be entered? Hans Aberg
On Mon, May 07, 2001 at 07:45:59PM +0200, Hans Aberg wrote:
    typedef struct {
      _mp_int _mp_s;    /* Small number representation. */
      mp_limb_t *_mp_d; /* Pointer to the limbs and all other data. */
    } __mpz_struct;

where _mp_d is simply set to NULL if one is using the small number representation.
Does _mp_s have a meaning when _mp_d is not NULL? Or do you keep the used size and allocated size under _mp_d? The latter is good in that it's not necessary to copy these sizes into gmp structures separately; they live inside the ByteArray#.

There are no null ByteArray#s. The GC currently aborts if a null pointer is found where a Haskell object is expected (C pointers are tagged like Int# etc., i.e. as nonpointers, and ByteArray#s are tagged like Haskell objects, even though they are not exactly normal Haskell objects). This could probably be changed (unless the benefit of having this ASSERT for debugging the GC is large; I doubt it).

Depending on this I see two choices for the Integer representation in ghc, with the second looking really good:

1. data Integer = S# Int#
                | J# ByteArray#  -- Size no longer needed here.

   Primops have a hard case for returning the appropriate data. This is ugly:

       case primSomething# args of
           (# 0#, s, _ #) -> S# s
           (# _,  _, d #) -> J# d

   and I'm not sure that using a dummy ByteArray# and checking pointer equality would work well.

2. Let the GC ignore null pointers. Introduce a nullByteArray# primop. Mirror the gmp representation exactly and let gmp perform arithmetic on all numbers:

       data Integer = J# Int# ByteArray#

   It's still best if _mp_int has the size of a pointer.
I am not sure what you mean here; in what context should these integer literals be entered?
When the programmer writes for example:

    f :: Integer -> Integer
    f x = 54321453245328452134 - x

the compiler must generate code which makes an mpz constant of the right value. Currently it generates something like this:

    lvl = S# 2147483647#
    lit = ((((S# 11# `timesInteger` lvl) `plusInteger` S# 1673077741#)
            `timesInteger` lvl) `plusInteger` S# 914624008#)
    f x = lit `minusInteger` x

It used to generate something like this:

    lit = case addr2Integer# "54321453245328452134"# of
              (# s, d #) -> J# s d
    f x = lit `minusInteger` x
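As a sanity check, the two literal schemes do agree on this example. The following sketch compares the base-(2^31 - 1) chunk construction against parsing the decimal string, using gcc/clang's unsigned __int128 in place of Integer, since the value exceeds 64 bits:

```c
#include <assert.h>

/* Returns 1 if building 54321453245328452134 from base-(2^31 - 1)
 * chunks, as in the generated code above, yields the same value as
 * parsing its decimal string, as the old scheme did. */
int literal_schemes_agree(void) {
    const unsigned __int128 B = 2147483647u;  /* 2^31 - 1 */
    unsigned __int128 chunks =
        ((11 * B + 1673077741u) * B + 914624008u);

    unsigned __int128 parsed = 0;
    const char *s = "54321453245328452134";
    for (; *s; s++)
        parsed = parsed * 10 + (unsigned)(*s - '0');

    return chunks == parsed;
}
```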
Could this discussion be moved to a different mailing list please? Hugs doesn't use the GMP library and if it ever did it would restrict its use to the most portable subset. It seems like this discussion is only relevant to GHC and, perhaps, HBC which are willing to pay almost any cost for a few cycles more. -- Alastair Reid
At 13:11 -0600 2001/05/07, Alastair Reid wrote:
Could this discussion be moved to a different mailing list please?
Your wish is granted.
Hugs doesn't use the GMP library and if it ever did it would restrict its use to the most portable subset. It seems like this discussion is only relevant to GHC and, perhaps, HBC which are willing to pay almost any cost for a few cycles more.
Actually, there is a generic GMP library that only uses C and no assembler (incidentally, the version that I happen to use), and the things we are discussing are perfectly valid from that viewpoint alone. The assembler stuff is then only an optimization add-on. Hans Aberg
[Message also sent to the newsgroup.] (The reply went to hugs-bugs and glasgow-haskell-bugs, but in separate mails, because I'm reading glasgow-haskell-bugs through my mail<->news gateway. Please Cc: replies to a ghc list or me; I'm not subscribed to hugs-bugs.)
participants (3)
- Alastair Reid
- Hans Aberg
- Marcin 'Qrczak' Kowalczyk