[GHC] #8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#)

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) ------------------------------+-------------------------------------------- Reporter: hvr | Owner: hvr Type: feature | Status: new request | Milestone: 7.8.1 Priority: normal | Version: 7.7 Component: libraries | Operating System: Unknown/Multiple (other) | Type of failure: Runtime performance bug Keywords: integer- | Test Case: gmp | Blocking: Architecture: | Unknown/Multiple | Difficulty: Unknown | Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- In the course of a recent [discussion on reddit](http://www.reddit.com/r/haskell/comments/1twtvm/the_problem_with_integer/) it was highlighted, that `integer-gmp` doesn't try to demote `J#` to the more efficient `S#` if even though they would fit. The attached proof-of-concept patch introduces a "smart" `J#` constructor which constructs a `S#` instead if possible: {{{ #!hs -- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'. toSmall :: Integer -> Integer toSmall i@(S# _) = i toSmall (J# 0# _) = S# 0# toSmall (J# 1# mb#) | isTrue# (v ># 0#) = S# v where v = indexIntArray# mb# 0# toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v where v = negateInt# (indexIntArray# mb# 0#) toSmall i = i -- | Smart 'J#' constructor which tries to construct 'S#' if possible smartJ# :: Int# -> ByteArray# -> Integer smartJ# s# mb# = toSmall (J# s# mb#) }}} And replaces a couple of `J#`-constructions which are likely to produce a `S#`-fitting `Integer`. A `nofib` comparison for vanilla GHC HEAD vs patched GHC HEAD is attached for further discussion. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Description changed by hvr: Old description:
In the course of a recent [discussion on reddit](http://www.reddit.com/r/haskell/comments/1twtvm/the_problem_with_integer/) it was highlighted, that `integer-gmp` doesn't try to demote `J#` to the more efficient `S#` if even though they would fit.
The attached proof-of-concept patch introduces a "smart" `J#` constructor which constructs a `S#` instead if possible:
{{{ #!hs -- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'. toSmall :: Integer -> Integer toSmall i@(S# _) = i toSmall (J# 0# _) = S# 0# toSmall (J# 1# mb#) | isTrue# (v ># 0#) = S# v where v = indexIntArray# mb# 0# toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v where v = negateInt# (indexIntArray# mb# 0#) toSmall i = i
-- | Smart 'J#' constructor which tries to construct 'S#' if possible smartJ# :: Int# -> ByteArray# -> Integer smartJ# s# mb# = toSmall (J# s# mb#) }}}
And replaces a couple of `J#`-constructions which are likely to produce a `S#`-fitting `Integer`. A `nofib` comparison for vanilla GHC HEAD vs patched GHC HEAD is attached for further discussion.
New description: In the course of a recent [[http://www.reddit.com/r/haskell/comments/1twtvm/the_problem_with_integer/|recent reddit discussion]] it was highlighted, that `integer-gmp` doesn't try to demote `J#` result-values to the more efficient `S#` even though they would fit into a machine word. The attached proof-of-concept patch introduces a "smart" `J#` constructor which constructs a `S#` value instead (if possible): {{{ #!hs -- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'. toSmall :: Integer -> Integer toSmall i@(S# _) = i toSmall (J# 0# _) = S# 0# toSmall (J# 1# mb#) | isTrue# (v ># 0#) = S# v where v = indexIntArray# mb# 0# toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v where v = negateInt# (indexIntArray# mb# 0#) toSmall i = i -- | Smart 'J#' constructor which tries to construct 'S#' if possible smartJ# :: Int# -> ByteArray# -> Integer smartJ# s# mb# = toSmall (J# s# mb#) }}} The patch replaces a couple of `J#`-invocations which are likely to produce a `S#`-fitting `Integer`. A `nofib` comparison for vanilla GHC HEAD vs. patched GHC HEAD is attached for further discussion. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by hvr): * status: new => patch -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: infoneeded Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by simonpj): * status: patch => infoneeded Comment: Interesting. In general it seems like a good idea. Although making the test has a cost, if you get a win, it's a big win. I'd first like to know why `mandel` started allocating nearly 50% more store. (I use `-ticky` for answering this kind of question.) Maybe investigating that will show how the patch can be improved further. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: infoneeded Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:3 simonpj]:
I'd first like to know why `mandel` started allocating nearly 50% more store. (I use `-ticky` for answering this kind of question.) Maybe investigating that will show how the patch can be improved further.
I'm not sure yet why `mandel` allocates more as I couldn't see find any significant use of `Integer`s in the implementation. After some profiling I found out that swapping out the implementation of `magnitude` inside `Mandel.diverge` with a more naive one had a ''huge'' effect on the allocation. Now the report reads: {{{ -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- ... maillist +0.0% +0.0% 0.04 0.04 +2.2% mandel -0.2% -45.4% 0.02 0.02 +0.0% mandel2 +0.0% +0.0% 0.00 0.00 +0.0% ... -------------------------------------------------------------------------------- Min -0.2% -45.4% -32.3% -32.3% -20.0% Max +0.1% +7.2% +1.6% +1.6% +23.1% Geometric Mean +0.1% -1.3% -1.7% -1.8% +0.0% }}} The modification I made was simply {{{ #!diff diff --git a/spectral/mandel/Mandel.lhs b/spectral/mandel/Mandel.lhs index 3f460ce..cc601f5 100644 --- a/spectral/mandel/Mandel.lhs +++ b/spectral/mandel/Mandel.lhs @@ -109,6 +109,8 @@ is the prelude function that calculates the euclidean norm diverge::Complex Double -> Double -> Bool diverge cmplx radius = magnitude cmplx > radius + where + magnitude (x :+ y) = sqrt (x*x + y*y) \end{code} }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: infoneeded Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:4 hvr]:
I'm not sure yet why `mandel` allocates more as I couldn't see find any significant use of `Integer`s in the implementation. After some profiling I found out that swapping out the implementation of `magnitude` inside `Mandel.diverge` with a more naive one had a ''huge'' effect on the allocation.
Ok, found the culprit; `magnitude` uses `scaleFloat` which in turn uses `decodeFloat`/`encodeFloat`. I removed the `smartJ#` I put into `decodeFloat` and the allocation-delta for the `mandel`-benchmark went back to more or less 0. As the main purpose of `decodeFloat` seems to be to be used in combination with `encodeFloat` I guess there's actually little benefit trying to demote the significant to `S#` anyway... I'll attach the new nofib report shortly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by hvr): * status: infoneeded => patch -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): Great. Now, what's happening in `symalg` and `kahan` (+4.7 and 6.7% resp)? Working on these small deltas seems tiresome, but what is a small delta in one program can be a massive one in another. It's not that we must have uniform improvement; just that we like to know ''why'' something gets worse, and deem it an acceptable compromise. What is puzzling to me is that I don't think `smartJ#` does ''any'' allocation (apart from its result) so I don't see why allocation should ever increase. Thanks Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: infoneeded Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by simonpj): * status: patch => infoneeded -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: infoneeded Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:7 simonpj]:
Great. Now, what's happening in `symalg` and `kahan` (+4.7 and 6.7% resp)?
Thunks are happening... (see below)
What is puzzling to me is that I don't think `smartJ#` does ''any'' allocation (apart from its result) so I don't see why allocation should ever increase.
...because I missed the fact that `(# smartJ# ... , smartJ# ... #)` creates a thunk (there was even a source code comment about that pitfall in `quotRemInteger#`) However, I replaced all `smartJ#`-in-`(#,#)` occurences (incl. `decodeDouble`) by `let !x = smartJ# ... in (# x, ... #)` constructs, and the results look better now: {{{ Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- ... gamteb +0.1% -19.0% 0.03 0.03 +0.0% ... kahan +0.2% -1.2% 0.17 0.17 +0.0% ... mandel +0.1% -7.7% 0.05 0.05 +0.0% ... power +0.1% -40.8% -32.5% -32.5% +0.0% ... symalg +0.2% -0.5% 0.01 0.01 +0.0% ... -------------------------------------------------------------------------------- Min +0.0% -40.8% -32.5% -32.5% -5.1% Max +0.2% +0.1% +2.0% +2.0% +0.0% Geometric Mean +0.1% -1.0% -2.5% -2.5% -0.1% }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by hvr): * status: infoneeded => patch -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): Terrific, thanks. I'll commit. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: patch Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8647 --------------------------------------------+------------------------------ Changes (by hvr): * related: => #8647 Comment: I've investigated some more how to optimize `integer-gmp`, see also #8647 The scary part is, that the `Integer` is involved in seemingly unrelated operations, such as operating on `Double` or `show`ing `Double` values, causing several avoidable allocations. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#)
--------------------------------------------+------------------------------
Reporter: hvr | Owner: hvr
Type: feature request | Status: closed
Priority: normal | Milestone: 7.8.1
Component: libraries (other) | Version: 7.7
Resolution: fixed | Keywords: integer-
Operating System: Unknown/Multiple | gmp
Type of failure: Runtime performance bug | Architecture:
Test Case: | Unknown/Multiple
Blocking: | Difficulty: Unknown
| Blocked By:
| Related Tickets: #8647
--------------------------------------------+------------------------------
Changes (by simonpj):
* status: patch => closed
* resolution: => fixed
Comment:
Thanks. I committed two patches below. I'll close this one since you
have opened a new ticket for #8647.
Simon
{{{
commit 301269aef0fb331bf272de4f6592eb71471a3b16
Author: Herbert Valerio Riedel
---------------------------------------------------------------
301269aef0fb331bf272de4f6592eb71471a3b16
GHC/Integer/Type.lhs | 92
++++++++++++++++++++++++++++++++------------------
1 file changed, 59 insertions(+), 33 deletions(-)
}}}
and
{{{
commit 3c93d7f61821345f29b9ee8a99346fa464d708a4
Author: Simon Peyton Jones
---------------------------------------------------------------
3c93d7f61821345f29b9ee8a99346fa464d708a4 GHC/Integer/Type.lhs | 50 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 41 insertions(+), 9 deletions(-) diff --git a/GHC/Integer/Type.lhs b/GHC/Integer/Type.lhs index c206462..77d529a 100644 --- a/GHC/Integer/Type.lhs +++ b/GHC/Integer/Type.lhs @@ -152,21 +152,52 @@ toBig i@(J# _ _) = i -- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'. toSmall :: Integer -> Integer -toSmall i@(S# _) = i -toSmall (J# 0# _) = S# 0# -toSmall (J# 1# mb#) | isTrue# (v ># 0#) = S# v +toSmall i@(S# _) = i +toSmall (J# s# mb#) = smartJ# s# mb# + + +-- | Smart 'J#' constructor which tries to construct 'S#' if possible +smartJ# :: Int# -> ByteArray# -> Integer smartJ# 0# _ = S# 0# smartJ# +1# mb# | isTrue# (v ># 0#) = S# v where v = indexIntArray# mb# 0# -toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v +smartJ# (-1#) mb# | isTrue# (v <# 0#) = S# v where v = negateInt# (indexIntArray# mb# 0#) -toSmall i = i - --- | Smart 'J#' constructor which tries to construct 'S#' if possible -smartJ# :: Int# -> ByteArray# -> Integer -smartJ# s# mb# = toSmall (J# s# mb#) +smartJ# s# mb# = J# s# mb# \end{code} +Note [Use S# if possible] +~~~~~~~~~~~~~~~~~~~~~~~~~ +It's a big win to use S#, rather than J#, whenever possible. Not only +does it take less space, but (probably more important) subsequent +operations are more efficient. See Trac #8638. + +'smartJ#' is the smart constructor for J# that performs the necessary +tests. When returning a nested result, we always use smartJ# strictly, +thus + let !r = smartJ# a b in (# r, somthing_else #) to avoid creating +a thunk that is subsequently evaluated to a J#. +smartJ# itself does a pretty small amount of work, so it's not worth +thunking it. + +We call 'smartJ#' in places like quotRemInteger where a big input might +produce a small output. + +Just using smartJ# in this way has good results: + + Program Size Allocs Runtime Elapsed TotalMem +-------------------------------------------------------------------------------- + gamteb +0.1% -19.0% 0.03 0.03 +0.0% + kahan +0.2% -1.2% 0.17 0.17 +0.0% + mandel +0.1% -7.7% 0.05 0.05 +0.0% + power +0.1% -40.8% -32.5% -32.5% +0.0% + symalg +0.2% -0.5% 0.01 0.01 +0.0% +-------------------------------------------------------------------------------- + Min +0.0% -40.8% -32.5% -32.5% -5.1% + Max +0.2% +0.1% +2.0% +2.0% +0.0% + Geometric Mean +0.1% -1.0% -2.5% -2.5% -0.1% %********************************************************* %* * @@ -200,6 +231,7 @@ quotRemInteger (J# s1 d1) (J# s2 d2) (# s3, d3, s4, d4 #) -> let !q = smartJ# s3 d3 !r = smartJ# s4 d4 in (# q, r #) + -- See Note [Use S# if possible] {-# NOINLINE divModInteger #-} divModInteger :: Integer -> Integer -> (# Integer, Integer #) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8638: Optimize by demoting "denormalized" Integers (i.e. J# -> S#) --------------------------------------------+------------------------------ Reporter: hvr | Owner: hvr Type: feature request | Status: closed Priority: normal | Milestone: 7.8.1 Component: libraries (other) | Version: 7.7 Resolution: fixed | Keywords: integer- Operating System: Unknown/Multiple | gmp Type of failure: Runtime performance bug | Architecture: Test Case: | Unknown/Multiple Blocking: | Difficulty: Unknown | Blocked By: | Related Tickets: #8647 --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:13 simonpj]:
Thanks. I committed two patches below.
For convenience, here are Trac links for the two commits: * [301269aef0fb331bf272de4f6592eb71471a3b16/integer-gmp] * [3c93d7f61821345f29b9ee8a99346fa464d708a4/integer-gmp] -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8638#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC