
Hi Peter, Peter Tanski wrote:
In the July thread, (Repost) Replacement for GMP as Bignum: ARPREC? Haskell?; OS X and OpenSSL, you wrote:
In past, I tried to get rid of GMP by replacing it with libtommath http://math.libtomcrypt.com/ But I have given up for now - because of related and unrelated problems.
Since I had no prior experience with LibTomMath I decided to take a look at it. The most recent release version of LibTomMath seems to be 0.39. Were some of the "related problems" you ran into due to bugs in an earlier version of LibTomMath? Maybe it is premature for me to go looking at replacing Integer when GHC has already moved so far ahead of itself I have to service the source code just to get it to build on OS X, but I figure that while I have the hood up I might as well take a look at the rest of the engine... If you have any old notes on the problems you encountered, I would greatly appreciate it.
What I have written here might not be the most useful guide to start with, but maybe it is of help for other interested souls. My first and foremost problem was that I had certain amount time I was going to use on it and that I use Windows, where ghc build breaks pretty often - and it was very badly broken at the time. I simply ran out of time and motivation. The resistance I met in community (irc mainly) for this step was also annoying. I also went wrong way when I started and added configure-option, but Simon Marlow (iirc) asked me not to add configure option, but to obliterate gmp alltogether. The big steps, to get things done, in this matter are: * Understanding how memory handling between math lib and GC works * Understanding some C-- * Understanding autotools and makefile hackery If you are new to ghc source, you should read (some of) http://hackage.haskell.org/trac/ghc/ and http://darcs.haskell.org/ghc/HACKING Most of the code for this task is fairly easy, actually. The bad thing is that it cannot be easily split into many small steps. Most of the task has very small scope: Most of the backend doesn't care, we just use exising bits, frontend and base-package don't see much or any diffrence. I'll add here some notes that might be helpful (but as I type these out of memory, they might not be 100% accurate): * The memory handling: The idea on most bignum libs is that they have c-structure kinda like this: struct Big { word size, used; digits* payload; bool sign; } ; Now, the size and used tell how much memory is allocated for paylod and how much of it is used. sign is the sign (plus/minus). payload is a pointer to memory that contains Integer decoded. Normally, in c, you would reuse these variables, but with immutability, we can't do that. This is handled in GHC so that payload (and sign) are stored in a bytearray, which is native garbage collectable type. It natively contains size. Before we call math-lib, we put together a temporary structure with correct pointers. As for target variable, we have hooked the mathlibs memory allocation functions to allocate correctly. Upon returning Integer, we just take payload, write sign on correct place and return the payload-pointer (possibly adjusted). In pseudo C digits* add(digits* din) { Big in, out; in.size=getLength(din); in.used=getLength(din); in.payload=din; in.sign=getSign(din); math_lib_init(out); math_lib_add(out, in); writeSign(out.payload, out.sign); return out.payload; } There are tricky parts for 64bit-stuff in 32bit systems and some floating point decoding uses bad configure-stuff that depends on math lib stuff, but mostly it's very boring hundreds of lines of C-- (but you can make the job much easier by using preprocessor). This is why APPREC might be hard - you need to know the internal representation. * C-- and related RTS stuff C-- is pretty easy, you should read C-- spec at cminusminus.org. GHC C-- unfortunately is not really near the C-- spec, it doesn't first of all implement it all - but that doesn't matter for this task - and then it has some extensions for casting and structure reading, I think. When removing GMP, it is easy to spot places that needs corresponding declrations/definitions for new math-lib instead of GMP. Reading old GMP primops, where most of the interesting c-- code is at, should also make it clearer - after a while, atleast. * Autotools and makefiles What a mess. Well, it always is for big projects. The annoying thing here for LibTomMath was that I found out, I'd need to add autotool configure-hackery into ghc and make internal libtommath dependant on it, so that digit-size and such were the size of native "word" on each machine. There's also a lot of DLL-related stuff in makefiles, which, I think, can safely be removed - some of it is wrong and outdated, so when/if it gets waken up, it needs to be rewritten anyway. GMP has it's own configure-script and it's invoked from the makefiles. To take on this task, I wouldn't say huge knowledge on ghc internals is required, but some knowledge on memory handling and low-level (near asm) programming is useful. Haskell knowledge for most part is not required. Fast machine is useful, but tweaking build options and having a good book/movies/other machine gets it done. It'd be very very useful to have 64-bit platform (in practice 64bit linux ia32_64) for testing. I'd say OS X is not a good platform to do this devel on, but I might be wrong on that. Linux being best tested OS, it is the most safe bet. I'd need to get rid of GMP, and if I can help (short of putting weeks worth of hours into) just shout. HTH, --Esa