possible tuple elimination optimization?

The following higher order function, sumh, seems to be 3 to 14 times slower than the equivalent recursive function, sumr: sumh :: Double -> Double sumh n = snd $ until ((>= n) . fst) as' (1, 0) where as' (i,s) = (i + 2, s + (-1) / i + 1 / (i + 1)) sumr :: Double -> Double sumr n = as' 1 0 where as' i s | i >= n = s | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1)) This is true in 7.10.3 as well as 8.0.1 so this is not a regression. From the size usage my guess is that this is due to the allocation of tuples in sumh. Maybe there is a straightforward way to optimize sumh but I couldn't find it. Adding a Strict pragma didn't help nor did use of -funbox-strict-fields -flate-dmd-anal. Have I missed something or should I file a bug? Timings from 8.0.1 rc2: ghc --version The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204 bash-3.2$ ghc -O2 -dynamic sum.hs ghc -O2 -dynamic sum.hs [1 of 1] Compiling Main ( sum.hs, sum.o ) Linking sum ... bash-3.2$ ghci Prelude> :load sum Ok, modules loaded: Main. (0.05 secs,) Prelude Main> *sumh* (10^6) -0.6931466805602525 it :: Double (*0.14 secs, 40,708,016 bytes*) Prelude Main> sumr (10^6) -0.6931466805602525 it :: Double (*0.01 secs, 92,000 bytes)* Thanks George

The extra allocation/time isn't caused by allocating pairs (neither
version does this). It is caused by allocating Doubles, because GHC
doesn't figure out that the second part of your pairs is always
demanded (I think), so the corresponding argument is left boxed. sumh
essentially turns into:
sumh (D# d#) = sumh1 d#
sumh1 d# = case loop1 1# 0# of
(# _, ans #) -> ans
where
loop1 :: Double# -> Double# -> (# Double, Double #)
loop1 i# s#
| i# >= d# = (# D# i#, D# s# #)
| otherwise = loop2 (i# + 2#) (D# ...)
loop2 :: Double# -> Double -> (# Double, Double #)
loop2 i# s
| i# >= d# = (# D# i#, s #)
| otherwise = loop2 (i# + 2#) (...)
So, it starts as a loop on unboxed arguments for one step, but then
switches over to boxing the second argument after the first iteration.
If you consider loop2 in isolation (and the hypothetical loop that
both of these came from), it is not strict in s. So there is no reason
to keep s evaluated except by considering the calling context, which I
guess GHC is not doing.
One way to fix this is by modifying the predicate to make is stricter.
Instead of `((>=n) . fst)` you can use:
p (i, s) = s `seq` (i >= n)
Then it will be obvious to GHC that it shouldn't delay computing `s`
at each step.
This may be a similar failure to the old 'constructed product result
inside unboxed tuples' bug, where a function with an `ST s Int` result
could internally be optimized to return an unboxed integer, but it
isn't. Your example isn't exactly the same, but it has a similar
flavor. A call site is strict in part of the result of a function, and
generating code for that strict case would enable better optimization
of the callee, but the optimization would have to act on individual
parts of an unboxed tuple (which in your case comes from optimizing a
tuple), and that isn't done.
-- Dan
On Sat, Mar 26, 2016 at 1:50 PM, George Colpitts
The following higher order function, sumh, seems to be 3 to 14 times slower than the equivalent recursive function, sumr:
sumh :: Double -> Double sumh n = snd $ until ((>= n) . fst) as' (1, 0) where as' (i,s) = (i + 2, s + (-1) / i + 1 / (i + 1))
sumr :: Double -> Double sumr n = as' 1 0 where as' i s | i >= n = s | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
This is true in 7.10.3 as well as 8.0.1 so this is not a regression. From the size usage my guess is that this is due to the allocation of tuples in sumh. Maybe there is a straightforward way to optimize sumh but I couldn't find it. Adding a Strict pragma didn't help nor did use of -funbox-strict-fields -flate-dmd-anal. Have I missed something or should I file a bug?
Timings from 8.0.1 rc2:
ghc --version The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204 bash-3.2$ ghc -O2 -dynamic sum.hs ghc -O2 -dynamic sum.hs [1 of 1] Compiling Main ( sum.hs, sum.o ) Linking sum ... bash-3.2$ ghci Prelude> :load sum Ok, modules loaded: Main. (0.05 secs,) Prelude Main> sumh (10^6) -0.6931466805602525 it :: Double (0.14 secs, 40,708,016 bytes) Prelude Main> sumr (10^6) -0.6931466805602525 it :: Double (0.01 secs, 92,000 bytes)
Thanks George
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

By the way, in case this helps your mental model, if you modify sumr to be:
sumr n = snd $ as' 1 0
where
as' i s
| i >= n = (i, s)
| otherwise = ...
Then it has the same problem as sumh. Your original as' for sumr is
strict in s, but this modified one isn't.
This shows another way to fix sumh, too. Create a version of until
that separates out the part of the state that is only for testing.
Then the until loop will be strict in the result part of the state,
and the desired optimizations will happen (in this case):
until' p step = go
where
go t r
| p t = r
| otherwise = uncurry go $ step (t, r)
-- Dan
On Sat, Mar 26, 2016 at 1:50 PM, George Colpitts
The following higher order function, sumh, seems to be 3 to 14 times slower than the equivalent recursive function, sumr:
sumh :: Double -> Double sumh n = snd $ until ((>= n) . fst) as' (1, 0) where as' (i,s) = (i + 2, s + (-1) / i + 1 / (i + 1))
sumr :: Double -> Double sumr n = as' 1 0 where as' i s | i >= n = s | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
This is true in 7.10.3 as well as 8.0.1 so this is not a regression. From the size usage my guess is that this is due to the allocation of tuples in sumh. Maybe there is a straightforward way to optimize sumh but I couldn't find it. Adding a Strict pragma didn't help nor did use of -funbox-strict-fields -flate-dmd-anal. Have I missed something or should I file a bug?
Timings from 8.0.1 rc2:
ghc --version The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204 bash-3.2$ ghc -O2 -dynamic sum.hs ghc -O2 -dynamic sum.hs [1 of 1] Compiling Main ( sum.hs, sum.o ) Linking sum ... bash-3.2$ ghci Prelude> :load sum Ok, modules loaded: Main. (0.05 secs,) Prelude Main> sumh (10^6) -0.6931466805602525 it :: Double (0.14 secs, 40,708,016 bytes) Prelude Main> sumr (10^6) -0.6931466805602525 it :: Double (0.01 secs, 92,000 bytes)
Thanks George
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Yes: see https://ghc.haskell.org/trac/ghc/wiki/NestedCPR
Simon
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Dan Doel
| Sent: 26 March 2016 20:41
| To: George Colpitts
participants (3)
-
Dan Doel
-
George Colpitts
-
Simon Peyton Jones