Re: until, laziness, and performance

Dan, I really appreciate that you responded so quickly and took the time
to explain this. Unfortunately I still don't understand.
I did think of laziness as a potential source of the bad performance, but to me
both internal functions looked lazy in their second argument. Since I couldn't
explain why both weren't equally slow, I dismissed laziness as a possible
source of the problem instead of questioning my own understanding of it.
I still don't understand why
as' i s
  | i >= n    = s
  | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
is not lazy in its second arg, as it only has to evaluate it when i >= n. This
seems exactly analogous to (||), which only needs to evaluate its second arg if
its first arg is False, and (||) is definitely lazy in its second argument. Can
you explain why as' is strict in its second arg, s? Is it due to the following,
from https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand ?
For functions, it is assumed, that the result of the function is used
strictly.
I guess I should have checked my intuition against the compiler's strictness
analysis. I modified sum.hs to contain only the definition of sumr, commenting
out sumh, and tried the various tools.
The user's guide,
https://downloads.haskell.org/~ghc/8.0.1-rc2/docs/html/users_guide/sooner.ht...,
recommends:
Look for your function in the interface file, then for the third field in
the pragma; it should say Strictness: ⟨string⟩. The ⟨string⟩ gives the
strictness of the function’s arguments: see the GHC Commentary for a
description of the strictness notation.
It doesn't mention that the interface file is binary and that you need to
use ghc --show-iface, e.g.
ghc --show-iface sum.hi > sum.hir
Doing this seems to give the wrong answer, i.e. the second arg is lazy.
$was' :: Double# -> Double# -> Double#
{- Arity: 2, Strictness: <*L*,U>, Inline: [0] -}
Is this a bug?
ghc -ddump-stranal -O sum.hs gives the right answer:
as'_s25G [Occ=LoopBreaker] :: Double -> Double -> Double
[LclId,
Arity=2,
CallArity=2,
Str=DmdType <*S*,1*U(U)>m {avM->},
Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [20 20] 108 0}]
Unfortunately the simplest way to find out doesn't give information on
internal functions. I will file an ER for this:
ghc -ddump-strsigs -O sum.hs
[1 of 1] Compiling Main ( sum.hs, sum.o )
==================== Strictness signatures ====================
:Main.main:
Main.$trModule: m
Main.main: x
Main.sumr: m
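Since -ddump-strsigs only reports top-level bindings, one workaround while investigating (a suggestion of this write-up, not something the tools require) is to temporarily lift the helper to the top level with n as an explicit argument, so its signature shows up in that dump:

-- Sketch: a top-level copy of as', with n passed explicitly, purely so that
-- ghc -ddump-strsigs -O reports a strictness signature for it.
asTop :: Double -> Double -> Double -> Double
asTop n i s
  | i >= n    = s
  | otherwise = asTop n (i + 2) (s + (-1) / i + 1 / (i + 1))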
I really like your definition of until' as the solution to this. To me it seems
analogous to foldl', in that it provides a way to avoid an optimization problem
that may be too subtle for many. I guess it is not general enough to be added
to a library. The following would be general enough, but it would probably be
overkill:
import Control.DeepSeq (NFData, ($!!))

until' :: NFData a => (a -> Bool) -> (a -> a) -> a -> a
until' p f = go
  where
    go x | p $!! x   = x
         | otherwise = go (f x)
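As an illustration (not part of the original email), sumh could then be written against this NFData-based until' as follows, reusing the step function from the original sumh:

-- Sketch only: sumh on top of the NFData-based until' above.
-- Forcing the whole (i, s) pair to normal form each step keeps s evaluated,
-- so no long chain of thunks can build up in the accumulator.
sumhNF :: Double -> Double
sumhNF n = snd $ until' ((>= n) . fst) as' (1, 0)
  where
    as' (i, s) = (i + 2, s + (-1) / i + 1 / (i + 1))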
Thanks again for all your help
George
On Sat, Mar 26, 2016 at 5:40 PM Dan Doel
By the way, in case this helps your mental model, if you modify sumr to be:
sumr n = snd $ as' 1 0
  where as' i s | i >= n    = (i, s)
                | otherwise = ...
Then it has the same problem as sumh. Your original as' for sumr is strict in s, but this modified one isn't.
This shows another way to fix sumh, too. Create a version of until that separates out the part of the state that is only for testing. Then the until loop will be strict in the result part of the state, and the desired optimizations will happen (in this case):
until' p step = go
  where go t r | p t       = r
               | otherwise = uncurry go $ step (t, r)
-- Dan
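For illustration (an addition to this write-up, not something from Dan's message), sumh might be rewritten against this test/result-splitting until' roughly as follows:

-- Sketch only: sumh with the counter i as the test part of the state and the
-- accumulator s as the result part, so the loop can be strict in s.
sumhSplit :: Double -> Double
sumhSplit n = until' (>= n) step 1 0
  where
    step (i, s) = (i + 2, s + (-1) / i + 1 / (i + 1))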
On Sat, Mar 26, 2016 at 1:50 PM, George Colpitts wrote:
The following higher order function, sumh, seems to be 3 to 14 times slower than the equivalent recursive function, sumr:
sumh :: Double -> Double
sumh n = snd $ until ((>= n) . fst) as' (1, 0)
  where as' (i, s) = (i + 2, s + (-1) / i + 1 / (i + 1))

sumr :: Double -> Double
sumr n = as' 1 0
  where as' i s | i >= n    = s
                | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
This is true in 7.10.3 as well as 8.0.1, so this is not a regression. From the memory usage, my guess is that this is due to the allocation of tuples in sumh. Maybe there is a straightforward way to optimize sumh, but I couldn't find it. Adding a Strict pragma didn't help, nor did -funbox-strict-fields or -flate-dmd-anal. Have I missed something, or should I file a bug?
Timings from 8.0.1 rc2:
ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204
bash-3.2$ ghc -O2 -dynamic sum.hs
[1 of 1] Compiling Main             ( sum.hs, sum.o )
Linking sum ...
bash-3.2$ ghci
Prelude> :load sum
Ok, modules loaded: Main. (0.05 secs,)
Prelude Main> sumh (10^6)
-0.6931466805602525
it :: Double
(0.14 secs, 40,708,016 bytes)
Prelude Main> sumr (10^6)
-0.6931466805602525
it :: Double
(0.01 secs, 92,000 bytes)
Thanks George

On Sun, Mar 27, 2016 at 4:42 PM, George Colpitts
I still don't understand why
as' i s | i >= n    = s
        | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
is not lazy in its second arg as it only has to evaluate it if i >= n. This seems exactly analogous to or, ||, which only needs to evaluate its second arg if its first arg is False. || is definitely lazy in its second argument. Can you explain why as' is strict in its second arg, s? Is it due to the following from, https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand ?
The simplest way to see why as' is strict in s is to use the actual definition of strictness, and not think about operational semantics. The definition is that a function f is strict if `f undefined = undefined`. So, what is `as' i undefined`?
If `i >= n` then `as' i undefined = undefined`
if not `i >= n` then `as' i undefined = as' (i+2) undefined`
The second case holds because Double arithmetic is strict, so the complex expression will be undefined because s is.
From these cases we can reason that the result will always be undefined. Either the value of i will grow such that the first case succeeds, or if that will never happen (if n is NaN for instance), the loop will continue forever, in which case the result is equivalent to undefined semantically.
Because of this denotational analysis, we are at liberty to make the operation of as' such that it always evaluates s immediately, because it won't change the answer. And that is the optimization that saves time/space.
By contrast, my alternate function:
as' i s | i >= n    = (i, s)
        | otherwise = as' (i+2) (...)
is easily not strict in s, because:
as' i undefined = (i, undefined)
when `i >= n` is True, and (i, undefined) is not undefined. So it is invalid to perform the same optimization, because it would change the answer on some inputs. We happen to know that all the places the function is used, those two answers would be equivalent (because we just project out the second item), but that's somewhat difficult, non-local reasoning.
Note that GHC does do some optimization. The loop in until is (in this case) a loop on a pair, and it figures out that the loop is strict in the pair, and eliminates the construction of the pair (by passing two separate arguments, and returning an unboxed pair from the loop). Optimizing the pieces of the pair would have to be a sort of speculative operation (without doing a global analysis), where 2^k different versions of the loop are generated (where k is the number of non-strict pieces), and the call sites are optimized to call the best of those versions for them. But 2^k isn't a good looking number for this sort of thing.
-- Dan
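The two cases above can also be checked directly in GHCi (an illustrative sketch, not part of the original messages; n is fixed at 5 and the helpers are given standalone names):

-- Sketch: probing the two cases with undefined, with n fixed at 5.
asStrict :: Double -> Double -> Double
asStrict i s | i >= 5    = s
             | otherwise = asStrict (i + 2) (s + (-1) / i + 1 / (i + 1))

asPair :: Double -> Double -> (Double, Double)
asPair i s | i >= 5    = (i, s)
           | otherwise = asPair (i + 2) (s + (-1) / i + 1 / (i + 1))

-- ghci> asStrict 1 undefined      -- throws once printed: the result is undefined,
--                                 -- so asStrict is strict in s
-- ghci> fst (asPair 1 undefined)  -- 5.0: the pair version is not strict in s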

Got it! Thanks!!