Dan, I really appreciate that you responded so quickly and  took the time to explain this. Unfortunately I still don't understand.

I did think of laziness as a potential source of bad performance but to me both internal functions were lazy in the second argument. Since I couldn't explain why both weren't equally slow due to laziness, I dismissed laziness as a possible source of the problem instead of questioning my understanding of laziness. 

I still don't understand why 

as'
​ ​
i s
     | i >= n    = s
     | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))

is not lazy in its second arg as it only has to evaluate it if i >= n. This seems exactly analogous to or, ||, which only needs to evaluate its second arg if its first arg is False. || is definitely lazy in its second argument. Can you explain why as' is strict in its second arg, s? Is it due to the following from, https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand ?

For functions, it is assumed, that the result of the function is used strictly.

I guess  I should have checked my intuition against the compiler strictness analysis. I modified sum.hs to only have a definition of sumr, commenting out sumh and tried the various tools.

The user's guide, https://downloads.haskell.org/~ghc/8.0.1-rc2/docs/html/users_guide/sooner.html#faster-producing-a-program-that-runs-quicker, recommends:

Look for your function in the interface file, then for the third field in the pragma; it should say Strictness: ⟨string⟩. The ⟨string⟩ gives the strictness of the function’s arguments: see the GHC Commentary for a description of the strictness notation.

It doesn't mention that the interface file is binary and that you need to use ghc --show-iface, e.g.

ghc --show-iface sum.hi > sum.hir

Doing this seems to give the wrong answer, i.e. the second arg is lazy.

                     $was' :: Double# -> Double# -> Double#
          {- Arity: 2, Strictness: <S,U><L,U>, Inline: [0] -}


Is this a bug?

ghc -ddump-stranal -O sum.hs gives the right answer:

as'_s25G [Occ=LoopBreaker] :: Double -> Double -> Double
      [LclId,
       Arity=2,
       CallArity=2,
       Str=DmdType <S(S),1*U(U)><S,1*U(U)>m {avM-><S(S),U(U)>},
       Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
       WorkFree=True, Expandable=True, Guidance=IF_ARGS [20 20] 108 0}]

Unfortunately the simplest way to find out doesn't give information on internal functions. I will file an ER for this:

 ghc -ddump-strsigs -O sum.hs
[1 of 1] Compiling Main             ( sum.hs, sum.o )
==================== Strictness signatures ====================
:Main.main:
Main.$trModule: m
Main.main: x
Main.sumr: <S(S),U(U)>m

I really like your definition of until' as the solution to this. To me it seems analogous to foldl', in that it provides a way to avoid an optimization problem that may be subtle to many. I guess it is not general enough to be added to a library. The following world be general enough but it would probably be overkill:

until' :: NFData a => (a -> Bool) -> (a -> a) -> a -> a
until' p f = go
  where
    go x | p $!! x   = x
         | otherwise = go (f x)


Thanks again for all your help
George


On Sat, Mar 26, 2016 at 5:40 PM Dan Doel <dan.doel@gmail.com> wrote:
By the way, in case this helps your mental model, if you modify sumr to be:

    sumr n = snd $ as' 1 0
      where
      as' i s
        | i >= n = (i, s)
        | otherwise = ...

Then it has the same problem as sumh. Your original as' for sumr is
strict in s, but this modified one isn't.

This shows another way to fix sumh, too. Create a version of until
that separates out the part of the state that is only for testing.
Then the until loop will be strict in the result part of the state,
and the desired optimizations will happen (in this case):

    until' p step = go
     where
     go t r
       | p t = r
       | otherwise = uncurry go $ step (t, r)

-- Dan

On Sat, Mar 26, 2016 at 1:50 PM, George Colpitts
<george.colpitts@gmail.com> wrote:
> The following higher order function, sumh, seems to be 3 to 14 times slower
> than the equivalent recursive function, sumr:
>
> sumh :: Double -> Double
> sumh n =
>     snd $ until ((>= n) . fst) as' (1, 0)
>     where
>       as' (i,s) =
>           (i + 2, s + (-1) / i + 1 / (i + 1))
>
> sumr :: Double -> Double
> sumr n =
>     as' 1 0
>     where
>       as' i  s
>           | i >= n    = s
>           | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
>
> This is true in 7.10.3 as well as 8.0.1 so this is not a regression. From
> the size usage my guess is that this is due to the allocation of tuples in
> sumh. Maybe there is a straightforward way to optimize sumh but I couldn't
> find it. Adding a Strict pragma didn't help nor did use of
> -funbox-strict-fields -flate-dmd-anal. Have I missed something or should I
> file a bug?
>
> Timings from 8.0.1 rc2:
>
> ghc --version
> The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204
> bash-3.2$ ghc -O2 -dynamic sum.hs
> ghc -O2 -dynamic sum.hs
> [1 of 1] Compiling Main             ( sum.hs, sum.o )
> Linking sum ...
> bash-3.2$ ghci
> Prelude> :load sum
> Ok, modules loaded: Main.
> (0.05 secs,)
> Prelude Main> sumh (10^6)
> -0.6931466805602525
> it :: Double
> (0.14 secs, 40,708,016 bytes)
> Prelude Main> sumr (10^6)
> -0.6931466805602525
> it :: Double
> (0.01 secs, 92,000 bytes)
>
> Thanks
> George
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>