[GHC] #8647: Reduce allocations in `integer-gmp`

#8647: Reduce allocations in `integer-gmp` ------------------------------+-------------------------------------------- Reporter: hvr | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries | Version: 7.6.3 (other) | Operating System: Unknown/Multiple Keywords: | Type of failure: Runtime performance bug Architecture: x86_64 | Test Case: (amd64) | Blocking: Difficulty: Unknown | Blocked By: | Related Tickets: #8638 | ------------------------------+-------------------------------------------- I've added `printf(3)`s to `integer-gmp`s GMP allocation/reallocation functions, and noticed there to a significant amount of allocations going on. This is due to the fact that the CMM wrappers call `gmpz_init()` to initialize the result, and this already allocates a 1-limb `StgArray`. But the actual GMP operations perform a reallocate-call (which by contract also needs `memcpy` the untouched limb over to the newly allocated structure) based on the type of operation and the involved operands, causing the optimistically allocated 1-limb structure to become garbage right away w/o even being written to. I've been able to get rid of most such superfluous allocations by replacing {{{#!c ccall __gmpz_init(mp_result1 "ptr"); }}} by {{{#!c MP_INT__mp_alloc(mp_result1) = 0; MP_INT__mp_size(mp_result1) = 0; MP_INT__mp_d(mp_result1) = 0; // (1) }}} which is handled properly by all GMP operations we make use of currently. This elegantly lets the very first allocation be performed within the GMP operation (which has better knowledge with respect to how many limbs the result will require) I've got working proof-of-concept code, which reduces the allocations the big-num heavy `nofib` programs (I've omitted the benchmarks which had a 0% allocation change): {{{ -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- ... bernouilli -0.1% -5.2% 0.12 0.12 +0.0% kahan -0.1% -10.5% 0.17 0.17 +0.0% primetest -0.1% -12.8% 0.07 0.07 +0.0% rsa -0.1% -13.8% 0.02 0.02 +0.0% symalg -0.1% -2.3% 0.01 0.01 +0.0% ... -------------------------------------------------------------------------------- Min -0.1% -13.8% -3.1% -3.1% -6.5% Max -0.0% +0.4% +4.0% +4.0% +1.5% Geometric Mean -0.1% -0.5% +0.5% +0.4% -0.1% }}} `(1)` This should actually point to a proper static closure I didn't need this for the proof-of-concept ---- Another place which helps avoiding temporarily allocating 1-limb `StgArrays` which live only for the duration of GMP FFI calls are those caused by the `toBig`-casts, which I'm also experimenting by making use of GMP's big-int/small-int add/sub/mul primitives (the deltas shown above are actually measured on top of this optimization), here's what to expect from such an optimization by itself (i.e. w/o the realloc-optimization from above): {{{ bernouilli +0.1% -4.2% 0.12 0.12 +0.0% kahan +0.1% -12.6% 0.17 0.17 +0.0% power +0.1% -2.7% +2.2% +2.2% +0.0% rsa +0.1% -4.1% 0.02 0.02 +0.0% scs +0.1% -2.6% -1.6% -1.6% +0.0% }}} Thus, the `kahan` benchmark benefits from this optimization by -12.6%, and on top of that another -10.5% by also avoiding the GMP-reallocations. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Changes (by hvr): * keywords: => integer-gmp -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Comment (by simonpj): That looks like a very worthwhile improvement. Please do make sure that you document the new code carefully. By definition, it wasn't obvious to whoever first wrote it! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: new
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: new
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Comment (by hvr): The two commits above implement the 2nd of the two proposed optimizations in the ticket description, the avoid-reallocations-optimization is a bit more difficult to get implemented properly (but I'm still working on it). Btw, this optimization helps also in cases, when several small-ints are added/multiplied to an accumulator as in e.g.: {{{#!hs sum (toInteger (maxBound :: Int) : [1..10000]) -- or product [1..100] }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Changes (by hvr): * status: new => patch Comment: I attached the patch instead of pushing to Git, as I'm not satisfied yet with it. It's works stable here (passes `validate` & `nofib`), but I'm not sure about the `unsafeCoerce#`. In my first version of this patch, I used a 3-tuple `(# Int#, ByteArray#, Word# #)` as return value, but that wasted 1 return register per `Integer` result, and still had to return a NULL pointer for `ByteArray#` for the case it wasn't used. I'm wondering if it would be feasible and sensible to return a properly constructed `Integer` value from the Cmm wrappers (i.e. to fold `tupToInteger` into gmp-`wrappers.cmm`), but I haven't figured out yet how to implement that in Cmm. Moreover, I'm also still wondering how to properly allocate stack-space in "high-level" Cmm procedures (see [http://www.haskell.org/pipermail/ghc- devs/2014-January/003616.html posting on ghc-devs@]), as the current code seems to work properly only as long the Cmm procedures don't force the code generator to save registers to the stack. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I attached the patch instead of pushing to Git, as I'm not satisfied yet with it. It's works stable here (passes `validate` & `nofib`), but I'm not sure about the `unsafeCoerce#`.
In my first version of this patch, I used a 3-tuple `(# Int#, ByteArray#, Word# #)` as return value, but that wasted 1 return register
#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:6 hvr]: per `Integer` result, and still had to return a NULL pointer for `ByteArray#` for the case it wasn't used. After a conversation with Duncan, I'm now convinced that the `(# Int#, ByteArray# #)` + `unsafeCoerce#` hack is really bad, as the GC could try to follow the `ByteArray#` pointer even though it was really just a `Word#` in disguise. Otoh, passing a 0-pointer for the 3-tuple variant has the same danger. So now I'm left wondering if I can somehow statically allocate a dummy `ByteArray#` I can return in Cmm when there's no need to allocate a proper `ByteArray#` in order to avoid confusing the GC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:7 hvr]:
Otoh, passing a 0-pointer for the 3-tuple variant has the same danger.
So now I'm left wondering if I can somehow statically allocate a dummy `ByteArray#` I can return in Cmm when there's no need to allocate a proper `ByteArray#` in order to avoid confusing the GC.
I've workarounded this by returning a pointer to one of the statically allocated INTLIKE `Int`s, (which as far as I understand it should correspond to `unsafeCoerce#`ing an `Int` into a `ByteArray#`) which should be somewhat more polite to the garbage collector than coercing a non-pointer `0## :: Word#` into a `ByteArray#`. The latest patch for review with more details in its commit message can be found at [20d7bfdd29917f5a8b8937fba9b724f7e71cd8dd/integer-gmp]. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Comment (by simonpj): That looks better but would need COPIOUS comments. The ice is thin whenever you use unsafeCoerce. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp`
--------------------------------------------+------------------------------
Reporter: hvr | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.6.3
Resolution: | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture: x86_64
Test Case: | (amd64)
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8638
--------------------------------------------+------------------------------
Comment (by Herbert Valerio Riedel

#8647: Reduce allocations in `integer-gmp` --------------------------------------------+------------------------------ Reporter: hvr | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.6.3 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: x86_64 Test Case: | (amd64) Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8638 --------------------------------------------+------------------------------ Changes (by thoughtpolice): * status: patch => new Comment: This work is basically completed for 7.8, but Herbert will do more. At the moment this doesn't need to be in patch status. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` -------------------------------------+------------------------------------- Reporter: hvr | Owner: ekmett Type: task | Status: closed Priority: normal | Milestone: 7.10.1 Component: Core | Version: 7.6.3 Libraries | Keywords: integer-gmp Resolution: fixed | Architecture: x86_64 (amd64) Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: #8638 performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by thomie): * cc: core-libraries-committee@… (added) * status: new => closed * resolution: => fixed Comment: Since we're using `integer-gmp2` now (#9281), I'm assuming this can be closed. Please re-open if I'm mistaken, of course. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8647: Reduce allocations in `integer-gmp` -------------------------------------+------------------------------------- Reporter: hvr | Owner: ekmett Type: task | Status: closed Priority: normal | Milestone: 7.8.1 Component: Core | Version: 7.6.3 Libraries | Keywords: integer-gmp Resolution: fixed | Architecture: x86_64 (amd64) Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: #8638 performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by hvr): * milestone: 7.10.1 => 7.8.1 Comment: Yeah, this has been effectively superseded by #9281 (which ironically took two steps forward and one step back... but that's still being worked on) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8647#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC