renamed GMP symbols in GHC

Dear all,
Several issues related to the way GMP is included in GHC were publicly discussed in the past with the goal of replacing GMP. As summarised in this wiki by Peter Tanski http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes, the main issues were:
(1) Licensing
(2) Memory Structure; Simultaneous Access to GMP by Foreign (C) code in the Same Binary.
My immediate concern is (2) as I develop programs that link with MPFR, which uses GMP. So far I got around the issue by compiling GHC with integer-simple. This is fine for me but I feel the need to recompile ghc may be hindering some people who would like to try my software that links with MPFR. I have therefore been looking for a way to overcome (2) that could make it to the default build of GHC.
As suggested by Simon PJ in this email http://www.haskell.org/pipermail/glasgow-haskell-users/2006-August/010676.ht..., a simple way to deal with issue (2) is to "*copy* GMP, changing all the function names". I recently found some time and had a go at implementing this suggestion and it turned out to be quite easy. I described my recipe to do such a "copy" here http://code.google.com/p/hmpfr/wiki/GHCWithRenamedGMP, but so far I have been able to test it only on x86 Linux with GHC 7.2.1.
I checked that the renaming makes a difference by running all QuickCheck tests of my AERN-Real-MPFR package. Some of these tests persistently fail when compiled with a standard GHC. All tests succeed with GHC/integer-simple and also with GHC 7.2.1 using the renamed GMP.
I was wondering whether such a change (more polished and tested on all common platforms) could make it to the future official GHC releases. If the GHC developers would support this change, I would be happy to put more work into it.
I think the following concrete changes would be required in the GHC distribution:
(a) make a ghc build always use the bundled GMP
(b) apply a renaming script onto the GMP tar before adding it to the GHC source bundle
(c) rename symbols analogously in integer-gmp/cbits/*
An alternative to (b) is to apply the renaming to the GMP sources just before building it.
One thing I am not clear about is the impact such a change would have on licensing. My understanding is that GHC sources include GMP sources and, in the absence of an installed GMP library, they are statically linked into the integer-gmp package. (Does it mean that the integer-gmp package should be LGPL licensed?) The suggested changes would mean that this kind of linking would happen always, not only when no other GMP library is available on the system. In either case, it seems to me that using integer-gmp as a shared library would still be OK for producing non-LGPL code.
I would be grateful for your views and guidance.
Best regards,
Michal
--
|o| Michal Konecny

Ok, as I understand it this would be fine from a licensing perspective: we would be modifying the source, but distributing the modifications (either as a patch or a script, it doesn't matter).
One potential problem is that some Linux distributions really don't like it if you bundle modified versions of external libraries. However, I just don't see a way around this: GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it. If these Linux distributions still kick up a fuss, then we would have to back off and not bundle the modified GMP, but then users of GHC on those distros would have to install their own local copy of GHC in order to use MPFR or some other GMP client.
Can anyone involved with packaging for Debian or Fedora comment?
Cheers,
Simon
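To make the "global state" point concrete, here is a minimal sketch in C. It assumes the state in question is the process-wide allocation hook (in real GMP that hook is installed with mp_set_memory_functions); the thread itself only raises that reading later as a question. Every toy_*, runtime_* and client_b_* name below is hypothetical and stands in for GMP, the GHC runtime and a library such as MPFR respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a library with ONE process-wide allocation hook. */
typedef void *(*toy_alloc_fn)(size_t);

static toy_alloc_fn toy_alloc_hook = malloc;   /* the global state */

/* Installing a hook affects every user of the library in the process. */
void toy_set_alloc(toy_alloc_fn fn) { toy_alloc_hook = fn; }

/* All numbers are allocated through whatever hook is current. */
void *toy_new_number(size_t limbs) {
    return toy_alloc_hook(limbs * sizeof(unsigned long));
}

/* Client A (think: a language runtime) wants its own allocator. */
static int runtime_allocs = 0;
static void *runtime_alloc(size_t n) { runtime_allocs++; return malloc(n); }

/* Client B (think: a C library built on top) expects plain malloc. */
void *client_b_make_constant(void) { return toy_new_number(4); }

/* Returns how many of client B's allocations went through A's hook. */
int demo_clash(void) {
    runtime_allocs = 0;
    toy_set_alloc(runtime_alloc);        /* A installs its hook          */
    void *p = client_b_make_constant();  /* B is silently redirected     */
    free(p);                             /* fine here: hook used malloc  */
    toy_set_alloc(malloc);               /* restore the default          */
    return runtime_allocs;
}
```

In the sketch the redirected allocation is harmless because the hook still calls malloc. If the hook handed out memory managed by a garbage collector instead, client B would end up holding pointers it does not own, which is the kind of clash that a second, renamed copy of the library avoids.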

[Fullquote for the benefit of those on the Debian Haskell list.]
Dear Simon,
On Wednesday, 04.01.2012, at 12:21 +0000, Simon Marlow wrote:
On 22/12/2011 22:58, Michal Konečný wrote:
Several issues related to the way GMP is included in GHC were publicly discussed in the past with the goal of replacing GMP. As summarised in this wiki by Peter Tanski http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes, the main issues were:
(1) Licensing
(2) Memory Structure; Simultaneous Access to GMP by Foreign (C) code in the Same Binary.
My immediate concern is (2) as I develop programs that link with MPFR, which uses GMP. So far I got around the issue by compiling GHC with integer-simple. This is fine for me but I feel the need to recompile ghc may be hindering some people who would like to try my software that links with MPFR. I have therefore been looking for a way to overcome (2) that could make it to the default build of GHC.
As suggested by Simon PJ in this email http://www.haskell.org/pipermail/glasgow-haskell-users/2006-August/010676.ht..., a simple way to deal with issue (2) is to "*copy* GMP, changing all the function names". I recently found some time and had a go at implementing this suggestion and it turned out to be quite easy. I described my recipe to do such a "copy" here http://code.google.com/p/hmpfr/wiki/GHCWithRenamedGMP but currently I have been able to test it only on x86 Linux and GHC 7.2.1.
I checked that the renaming makes a difference by running all QuickCheck tests of my AERN-Real-MPFR package. Some of these tests persistently fail when compiled with a standard GHC. All tests succeed with GHC/integer-simple and also with GHC 7.2.1 using the renamed GMP.
I was wondering whether such a change (more polished and tested on all common platforms) could make it to the future official GHC releases. If the GHC developers would support this change, I would be happy to put more work into it.
I think the following concrete changes would be required in the GHC distribution:
(a) make a ghc build always use the bundled GMP
(b) apply a renaming script onto the GMP tar before adding it to the GHC source bundle
(c) rename symbols analogously in integer-gmp/cbits/*
An alternative to (b) is to apply the renaming to the GMP sources just before building it.
One thing I am not clear about is the impact such a change would have on licensing. My understanding is that GHC sources include GMP sources and, in the absence of an installed GMP library, they are statically linked into the integer-gmp package. (Does it mean that the integer-gmp package should be LGPL licensed?) The suggested changes would mean that this kind of linking would happen always, not only when no other GMP library is available on the system. In either case, it seems to me that using integer-gmp as a shared library would still be OK for producing non-LGPL code.
I would be grateful for your views and guidance.
Ok, as I understand it this would be fine from a licensing perspective: we would be modifying the source, but distributing the modifications (either as a patch or a script, it doesn't matter).
One potential problem is that some Linux distributions really don't like it if you bundle modified versions of external libraries. However, I just don't see a way around this: GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it. If these Linux distributions still kick up a fuss, then we would have to back off and not bundle the modified GMP, but then users of GHC on those distros would have to install their own local copy of GHC in order to use MPFR or some other GMP client.
Can anyone involved with packaging for Debian or Fedora comment?
I guess this means me...
Indeed, Debian has a policy of avoiding modified bundled libraries if at all possible. For example, we patch the build system to use the system-provided libffi.
If you say that GMP is broken, maybe there is a way to fix it properly? E.g. patch GMP (upstream) to provide an API where the state is explicitly passed as a function argument, so that GHC can keep its GMP state separate from other clients, which could continue to use the default global state?
Is there at least an upstream bug report about this issue? Maybe upstream can help us out here.
Greetings,
Joachim
--
Joachim "nomeata" Breitner
Debian Developer
nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata

Dear Joachim and Simon,
Thank you for your responses.
On Wednesday 04 January 2012 12:31:23 Joachim Breitner wrote:
I guess this means me... Indeed Debian has the policy to avoid modified bundled libraries, if somehow possible. For example, we patch the build system to use the system-provided libffi.
I am curious about the precise definition of "bundled libraries". It can be arranged that the GMP source is modified at GHC build time, so the _source_ package contains the original unmodified tar of GMP (except without documentation). Nevertheless, the _binary_ GHC package will contain integer-gmp library files that contain a binary copy of GMP whose symbols have been renamed. Does this count as a "modified bundled library"? (I am guessing yes.)
If such binary bundling is not permissible, would it be OK to have a separate Debian package called e.g. libghcgmp3c2, which is equal to libgmp3c2 except that the exported symbols are renamed as expected by a new integer-gmp and the files are suitably renamed to avoid any conflict with libgmp3c2?
On Wednesday 04 January 2012 12:21:13 Simon Marlow wrote:
GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it.
If this could be fixed that would be fantastic. Nevertheless, I am currently unaware of how hard this might be to pursue, technically or politically. (My gut feeling is that it is not straightforward.)
Kind regards,
Michal

2012/1/4 Michal Konečný
On Wednesday 04 January 2012 12:21:13 Simon Marlow wrote:
GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it.
If this could be fixed that would be fantastic. Nevertheless, I am currently unaware of how hard this might be to pursue, technically or politically. (My gut feeling is that it is not straightforward.)
My understanding is that they rejected the idea of moving away from global state for performance reasons, and are not inclined to reconsider.
--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

Dear Michal,
On Wednesday, 04.01.2012, at 16:33 +0000, Michal Konečný wrote:
On Wednesday 04 January 2012 12:31:23 Joachim Breitner wrote:
I guess this means me... Indeed Debian has the policy to avoid modified bundled libraries, if somehow possible. For example, we patch the build system to use the system-provided libffi.
I am curious about the precise definition of "bundled libraries". It can be arranged that the GMP source is modified at GHC build time, so the _source_ package contains the original unmodified tar of GMP (except without documentation). Nevertheless, the _binary_ GHC package will contain integer-gmp library files that contain a binary copy of GMP whose symbols have been renamed. Does this count as a "modified bundled library"? (I am guessing yes.)
If such binary bundling is not permissible, would it be OK to have a separate Debian package called e.g. libghcgmp3c2, which is equal to libgmp3c2 except that the exported symbols are renamed as expected by a new integer-gmp and the files are suitably renamed to avoid any conflict with libgmp3c2?
Both would be no better than having a modified copy in the ghc tarball. This is not a formal requirement but rather a guideline, with the rationale that code should be shared, not copied. The most prominent reason is security fixes: if code is copied and a security hole is found, the security team needs to hunt down all copies. With a single shared library, this is not a problem (zlib has repeatedly been a “good” example of this problem).
Now you might argue that gmp will never be the source of security problems (although I wouldn’t be too convinced about that). But even then, regular bug fixes and arch-specific fixes (which were required once for s390) in the main gmp library would not reach GHC automatically.
The guideline is in place in Debian also because we think it is the right thing to do, even if it sometimes means more work, for a better and healthier ecosystem.
So in conclusion: if you just cannot use the regular GMP library, then just copy it and live with the bad effects. You do not have to put effort in to make it look “nicer” (such as putting it in a separate library package). But preferably, try hard to avoid this issue, also for your own benefit.
BTW, is there a way to get the linker to create two independent copies of a library in one program space? Maybe if it is compiled as PIC (random name dropping here)? That would seem to be an elegant solution, as it makes the distro packers happy and you would not have to maintain a code copy.
On Wednesday 04 January 2012 12:21:13 Simon Marlow wrote:
GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it.
If this could be fixed that would be fantastic. Nevertheless, I am currently unaware of how hard this might be to pursue, technically or politically. (My gut feeling is that it is not straightforward.)
Someone (I know, not a helpful way to start a sentence :-)) should ask upstream before we make guesses.
Greetings,
Joachim

On Wed, Jan 4, 2012 at 11:50, Joachim Breitner
Now you might argue that gmp will never be the source of security problems (although I wouldn’t be too convinced about that). But even then
There's actually a patch for a (claimed to be minor potential) security issue referenced on the releases page at the moment.
--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

Hi all, On 04.01.2012, at 17:50, Joachim Breitner wrote:
BTW, Is there a way to get the linker to create two independent copies of a library in one program space? Maybe if it is compiled as PIC (random name dropping here)? That would seem to be an elegant solution, as it makes the distro packers happy and you would not have to maintain a code copy.
In the past, I've linked a C++ library that used gmpxx against a Haskell program by renaming all symbols starting with gmp to mygmp using objcopy. Unfortunately, this is not portable and completely broke down on Mac OS when Apple moved to fat binaries (Intel and PowerPC) since their objcopy version was crippled (there doesn't even seem to be an objcopy anymore on later OS X versions). Thus, renaming symbols after compilation is non-portable and sometimes not possible without writing your own tool.
So, I propose to revert to renaming the symbols in the source code, which could probably be done automatically using a lot of CPP #defines, starting from some sort of source code tarball of gmp. This would also make it possible to always use the latest gmp sources without too much hassle.
My 2p,
Axel
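The CPP-based renaming could look roughly like the following sketch. It is a toy under stated assumptions: the functions toy_add and toy_mul and the prefix ghc_ are made up for illustration, and a real rename of GMP would need one such #define per exported symbol, presumably generated by a script. The STR helper is only there to show which name the preprocessor actually produces.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical renaming header: one #define per public symbol.
 * It must be seen before the library source and before any caller. */
#define toy_add ghc_toy_add
#define toy_mul ghc_toy_mul

/* Library source, textually unchanged. After preprocessing, these
 * definitions create the symbols ghc_toy_add and ghc_toy_mul. */
long toy_add(long a, long b) { return a + b; }
long toy_mul(long a, long b) { return a * b; }

/* Two-step stringification so the macro is expanded before quoting. */
#define STR2(x) #x
#define STR(x) STR2(x)

/* Reports the name the compiler sees for toy_add. */
const char *linked_name_of_toy_add(void) { return STR(toy_add); }
```

Callers that include the same renaming header keep writing toy_add and are transparently redirected to ghc_toy_add, so an unrenamed copy of the library can be linked into the same program without a symbol clash. As far as I recall, gmp.h itself already maps its public names onto prefixed symbols with #defines of this kind, which is part of why the approach seems plausible.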

Hi,
On Wednesday, 04.01.2012, at 20:50 +0100, Axel Simon wrote:
On 04.01.2012, at 17:50, Joachim Breitner wrote:
BTW, Is there a way to get the linker to create two independent copies of a library in one program space? Maybe if it is compiled as PIC (random name dropping here)? That would seem to be an elegant solution, as it makes the distro packers happy and you would not have to maintain a code copy.
In the past, I've linked a C++ library that used gmpxx against a Haskell program by renaming all symbols starting with gmp to mygmp using objcopy. Unfortunately, this is not portable and completely broke down on Mac OS when Apple moved to fat binaries (Intel and PowerPC) since their objcopy version was crippled (there doesn't even seem to be an objcopy anymore on later OS X versions). Thus, renaming symbols after compilation is non-portable and sometimes not possible without writing your own tool.
So, I propose to revert to renaming the symbols in the source code, which could probably be done automatically using a lot of CPP #defines, starting from some sort of source code tarball of gmp. This would also make it possible to always use the latest gmp sources without too much hassle.
Just two more random ideas that can probably be easily dismissed by more knowledgeable people: Would linking gmp statically help? E.g. is there a way to link libgmp into the RTS that the symbols are not visible to the linker any more? And would dlopen make a difference? RTLD_LOCAL sounds interesting...
Greetings,
Joachim

On 04/01/2012 21:00, Joachim Breitner wrote:
Hi,
On Wednesday, 04.01.2012, at 20:50 +0100, Axel Simon wrote:
On 04.01.2012, at 17:50, Joachim Breitner wrote:
BTW, Is there a way to get the linker to create two independent copies of a library in one program space? Maybe if it is compiled as PIC (random name dropping here)? That would seem to be an elegant solution, as it makes the distro packers happy and you would not have to maintain a code copy.
In the past, I've linked a C++ library that used gmpxx against a Haskell program by renaming all symbols starting with gmp to mygmp using objcopy. Unfortunately, this is not portable and completely broke down on Mac OS when Apple moved to fat binaries (Intel and PowerPC) since their objcopy version was crippled (there doesn't even seem to be an objcopy anymore on later OS X versions). Thus, renaming symbols after compilation is non-portable and sometimes not possible without writing your own tool.
So, I propose to revert to renaming the symbols in the source code, which could probably be done automatically using a lot of CPP #defines, starting from some sort of source code tarball of gmp. This would also make it possible to always use the latest gmp sources without too much hassle.
Just two more random ideas that can probably be easily dismissed by more knowledgeable people:
Would linking gmp statically help? E.g. is there a way to link libgmp into the RTS that the symbols are not visible to the linker any more?
Linking a static copy of GMP into the integer-gmp package is an interesting idea that I hadn't considered before. It ought to be possible, but I haven't played around with it. You would probably have to resolve all the symbols between the integer-gmp code and GMP itself when building the library, and not expose any GMP symbols in the resulting .a file. That might mean making a big .o file containing GMP and the integer-gmp code that refers to it.
I expect there would be problems with shared libraries though. You can't link the static GMP into the shared integer-gmp, because the static GMP isn't built with -fPIC.
And would dlopen make a difference? RTLD_LOCAL sounds interesting...
Maybe, I haven't looked into that.
Cheers,
Simon

On Thu, Jan 5, 2012 at 09:37, Simon Marlow
On 04/01/2012 21:00, Joachim Breitner wrote:
And would dlopen make a difference? RTLD_LOCAL sounds interesting...
Maybe, I haven't looked into that.
Beware of platform issues; IIRC RTLD_LOCAL doesn't do what one expects on Alphas.
--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

On 04/01/2012 21:00, Joachim Breitner wrote:
Would linking gmp statically help? E.g. is there a way to link libgmp into the RTS that the symbols are not visible to the linker any more?
It has slightly more licensing complications: GMP is LGPL, which requires that the user of a program that includes it be able to replace GMP. Shared libraries do that easily. Static linking would probably require someone who compiles and distributes a program that uses both Haskell and GMP/MPFR to distribute their .o files (or alternatively their source code), so that they can be re-linked to a different GMP. (Their program still does *not* have to be LGPLed.)
The parties I've heard publicly concerned about duplicate packages are Linux/BSD distros that primarily support Free Software. So that complication might not even be a problem (provided the GHC user would get a choice whether to statically link, and probably default to dynamic linking).
~Isaac

Hi,
On Wednesday, 04.01.2012, at 22:00 +0100, Joachim Breitner wrote:
And would dlopen make a difference? RTLD_LOCAL sounds interesting...
it seems that some OSs provide a RTLD_PRIVATE which does exactly what we need: http://uw714doc.sco.com/en/man/html.3C/dlopen.3C.html
But unfortunately, glibc does not seem to support it. But still, RTLD_LOCAL might be enough:
    RTLD_LOCAL
        This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.
Greetings,
Joachim

Hi,
On Wednesday, 04.01.2012, at 20:50 +0100, Axel Simon wrote:
On 04.01.2012, at 17:50, Joachim Breitner wrote:
BTW, Is there a way to get the linker to create two independent copies of a library in one program space? Maybe if it is compiled as PIC (random name dropping here)? That would seem to be an elegant solution, as it makes the distro packers happy and you would not have to maintain a code copy.
In the past, I've linked a C++ library that used gmpxx against a Haskell program by renaming all symbols starting with gmp to mygmp using objcopy. Unfortunately, this is not portable and completely broke down on Mac OS when Apple moved to fat binaries (Intel and PowerPC) since their objcopy version was crippled (there doesn't even seem to be an objcopy anymore on later OS X versions). Thus, renaming symbols after compilation is non-portable and sometimes not possible without writing your own tool.
let me pick up this idea again. It was pointed out that it is mostly Linux and BSD based distributions that dislike code copies. For these, the objcopy way might work. It does not work on Mac OS, but that target does not have the code copy requirements. So would it be possible to
 * use objcopy at GHC build-time to take the system libgmp shared library, rename the symbols, and install the modified library with ghc on architectures that support it,
 * have a code copy for those that don’t?
For the distros this has the nice effect that if there is a bugfix in libgmp, they just have to rebuild ghc without any changes to the sources to benefit from it.
(But I am really wondering why the linker cannot do something that has the same effect as objcopy --prefix-symbols, but on the fly.)
Greetings,
Joachim

On Thu, Jan 5, 2012 at 13:53, Joachim Breitner
(But I am really wondering why the linker cannot do something that has the same effect as objcopy --prefix-symbols, but on the fly.)
Some of them can; notably the binutils ld, which comes from the same source as and uses the same mechanism as objcopy. The linker on OS X doesn't have this particular mechanism (although I think it has other features that could be used toward the same end, if not in the same way), and binutils ld does not work properly on OS X (nor does objcopy, again because it's the same mechanisms; the BFD library doesn't support Mach-O properly. This is probably why Apple at first shipped a crippled version and later removed it entirely. You can install a full set from MacPorts, but it breaks a *lot* of stuff...). And Solaris's ld has still another way to do this kind of thing, IIRC.
--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

Hi,
On Thursday, 05.01.2012, at 14:14 -0500, Brandon Allbery wrote:
On Thu, Jan 5, 2012 at 13:53, Joachim Breitner wrote:
(But I am really wondering why the linker cannot do something that has the same effect as objcopy --prefix-symbols, but on the fly.)
Some of them can; notably the binutils ld, which comes from the same source as and uses the same mechanism as objcopy.
They do? I couldn’t find any flag in that direction.
I just tried a few combinations with a simple test program (git repo attached, every commit is one state). Unfortunately, using dlopen does not help if some other shared library loads gmp regularly; there is still only one global state. And statically linking gmp into the shared object does not work, as Simon predicted, because the static gmp library is not compiled with PIC. The promising ld option "-Bgroup" does not seem to have the desired effect.
If only dlopen had RTLD_PRIVATE support...
Greetings,
Joachim

On Thu, Jan 5, 2012 at 16:15, Joachim Breitner
On Thursday, 05.01.2012, at 14:14 -0500, Brandon Allbery wrote:
On Thu, Jan 5, 2012 at 13:53, Joachim Breitner wrote:
(But I am really wondering why the linker cannot do something that has the same effect as objcopy --prefix-symbols, but on the fly.)
Some of them can; notably the binutils ld, which comes from the same source as and uses the same mechanism as objcopy.
They do? I couldn’t find any flag in that direction.
Hrm. I thought it did. Possibly it requires an ld-script, or I'm confusing it with Solaris ld.
In any case, I am starting to approach the point of "so will Debian allow ghc to remain compatible with non-Linux?", since so far I'm getting the distinct impression that solutions that work on Linux are all that matter.
--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

Hi,
On Thursday, 05.01.2012, at 16:21 -0500, Brandon Allbery wrote:
In any case, I am starting to approach the point of "so will Debian allow ghc to remain compatible with non-Linux?", since so far I'm getting the distinct impression that solutions that work on Linux are all that matter.
No, not at all. But I was under the impression that code copies will be fine for systems besides Linux and BSDs, and we are looking for a different solution especially for those.
Anyways, I’m just brainstorming a bit, although it seems I’m only turning up ideas that have been dismissed already...
Greetings,
Joachim

On Thu, Jan 5, 2012 at 5:44 PM, Joachim Breitner
Anyways, I’m just brainstorming a bit, although it seems I’m only turning up ideas that have been dismissed already...
It is good to hash through these things. =)
On a related note, which doesn't solve the general problem, I have been working with Dan Peebles on a nice set of MPFR bindings for my own purposes using custom foreign prims. I unpacked the main MPFR structures and let the ghc garbage collector move everything around, like it does with GMP itself.
We've been able to get a version that ALMOST works as expected. That is to say it works perfectly unless you need to access a function that pulls from its built-in constant cache. (MPFR internally caches the result of computing the first n digits of pi or log of 2, etc., growing the cache as you demand longer numbers.)
Dan was able to swap out the ghc gmp allocation hook for a slightly slower one that checks to see if it is being called from the MPFR cache management function, and diverts the allocation back to malloc. Ideally we would swap the handler in an initializer when the library loads, and this works perfectly in ghc, but not ghci, which apparently links in libraries in a way where C++-style initializers don't get invoked?
A 'replaceAllocator' IO action that swaps the gmp allocation hook isn't a very Haskelly solution either, because I'd prefer to just have it look like another numeric type.
Dan is currently investigating including a patched copy of MPFR in the haskell package, which doesn't try to use the built-in allocator for the constant cache. This is where it becomes relevant to the discussion at hand, because it would effectively involve linking in our own copy of MPFR, making distributions unhappy.
But another option would be to unsafePerformIO that initializer, which would add a bit of overhead, going through an indirection to make sure that replaceAllocator had been forced. Perhaps it wouldn't be too bad. Something like:

    replaceAllocator :: ()
    replaceAllocator = unsafePerformIO replaceAllocatorIO
    {-# NOINLINE replaceAllocator #-}

    instance (Rounding r, Precision p) => Floating (Fixed r p) where
      pi = replaceAllocator `pseq` mpfr_pi
      ...

This doesn't address general purpose use of third party libraries that happen to internally rely upon GMP, however. Or rather, if it does, the methodology it would lead to would be one of bringing over all of their internals into Haskell. ;)
-Edward Kmett
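The dispatching allocation hook described above might be sketched in C as follows. This is not Dan's actual code: how the real hook detects that it is being called from MPFR's cache management is not stated in the thread, so the sketch simply assumes a flag (in_foreign_cache) that is set around the cache code. The gc_heap_alloc function is a stand-in that uses malloc so the example stays self-contained, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: cache-management code sets this flag while it runs. */
static int in_foreign_cache = 0;

/* Counters so the routing decision can be observed. */
static int gc_heap_allocs = 0;
static int malloc_allocs = 0;

/* Stand-in for the runtime's heap allocator (really just malloc here). */
static void *gc_heap_alloc(size_t n) { gc_heap_allocs++; return malloc(n); }

/* The hook: one extra check per allocation, then route accordingly. */
void *dispatching_alloc(size_t n) {
    if (in_foreign_cache) {          /* long-lived cache data: plain malloc */
        malloc_allocs++;
        return malloc(n);
    }
    return gc_heap_alloc(n);         /* everything else: runtime heap */
}

/* Stand-in for a cache-management routine that allocates. */
void *alloc_for_cache(size_t n) {
    in_foreign_cache = 1;
    void *p = dispatching_alloc(n);
    in_foreign_cache = 0;
    return p;
}
```

The extra branch on every allocation is the "slightly slower" cost mentioned above; the benefit is that cached constants never live in memory the garbage collector may move.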

On Wed, Jan 04, 2012 at 12:31:23PM +0000, Joachim Breitner wrote:
One potential problem is that some Linux distributions really don't like it if you bundle modified versions of external libraries. However, I just don't see a way around this: [...] [...] I guess this means me... Indeed Debian has the policy to avoid modified bundled libraries, if somehow possible. For example, we patch the build system to use the system-provided libffi.
This policy isn't even specific to linux distributions ;-)
I don't know about the package building infrastructure for debian or fedora, but for openbsd (where i'm doing a lot of haskell stuff), it would be enough if the ghc sources would include not only a (patched or unpatched) gmp source tree but also the ghc-specific patches to gmp.
The rationale behind this policy (for openbsd, i can't speak for debian): if there are 42 packages where the source distribution files contain their own (probably patched) version of gmp, and suddenly a critical patch has to be applied to gmp, we would have to apply it 43 times (for gmp itself and for all the 42 packages using a bundled gmp).
If the source distribution files contained diffs for gmp, we could (at least try to) extract our patched gmp and apply the diff on top of it. => less work, any openbsd-specific patch automatically will be applied to all 42 packages.
Ciao,
Kili

* Simon Marlow:
One potential problem is that some Linux distributions really don't like it if you bundle modified versions of external libraries. However, I just don't see a way around this: GMP is inherently broken because it has global state, so if you want to use it from two clients in the same program, you need two copies of it.
Is this about the allocation functions? You could use the mpn functions (instead of the mpz variants), where you need to supply your own memory regions. It's not entirely straightforward because you need to calculate the expected lengths, manage the sign bit, and ensure that a few preconditions hold. The only thing which appears to be really missing is modular exponentiation (mpz_powm and mpz_powm_ui), but GHC doesn't seem to export them (huh?). Sure, it's quite a bit of work, but I expect it's more portable than hairy linker tricks.
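To illustrate what "supply your own memory regions" means, here is a toy addition routine in the style of the mpn layer. It is a sketch, not GMP code: the name toy_add_n and the fixed 32-bit limb are assumptions made to keep the example self-contained, though the argument order mirrors GMP's mpn_add_n (result pointer, two source pointers, limb count, carry returned). The point is that the routine never allocates, so no global allocation hook is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two n-limb numbers (least significant limb first) into rp.
 * The caller owns all three buffers; nothing is allocated here.
 * Returns the carry out of the most significant limb (0 or 1). */
uint32_t toy_add_n(uint32_t *rp, const uint32_t *ap,
                   const uint32_t *bp, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the limb sum plus carry cannot overflow. */
        uint64_t sum = (uint64_t)ap[i] + bp[i] + carry;
        rp[i] = (uint32_t)sum;            /* low 32 bits are the result limb */
        carry = (uint32_t)(sum >> 32);    /* high bits are the carry         */
    }
    return carry;
}
```

A caller such as a runtime system would pass buffers from its own heap and would have to size the result itself (n limbs plus room for the carry), and track the sign separately, which is the extra bookkeeping mentioned above.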
participants (9): Axel Simon, Brandon Allbery, Edward Kmett, Florian Weimer, Isaac Dupree, Joachim Breitner, Matthias Kilian, Michal Konečný, Simon Marlow