
Henning Thielemann wrote:
> I can't see it. If I consider (x ++ y) but I do not evaluate any element of (x ++ y), or only the first element, then this will need constant time. If I evaluate the first n elements, I need n computation time units. How is (.) on difference lists faster than (++) here?
That's a very good question. Basically, the problem is: how to specify the time complexity of an operation under lazy evaluation? You argue that (++) is constant time in the sense that evaluating (x ++ y) to WHNF is O(1) when x and y are already in WHNF. Same for (.). This is indeed correct but apparently fails to explain why (.) is any better than (++). Help!

Of course, this very paradox shows that just looking at WHNF is not enough. The next best description is to pretend that our language is strict and to consider full normal forms:

  x in NF, y in NF  ==>  (x ++ y) evaluates to NF in O(length x) time

Even when x and y are not in normal form, we know that evaluating the expression (x ++ y) takes

  O(x ++ y) ~ O(length x) + O(x) + O(y)

time to evaluate to NF. Here, O(e) is the time needed to bring the expression e into NF first. This approach now explains that (++) takes quadratic time when used left-associatively:

  O((x ++ y) ++ z) ~ O(length x + length y) + O(length x) + O(x) + O(y) + O(z)

instead of the expected

  O((x ++ y) ++ z) ~ O(x) + O(y) + O(z)

or something like that (only up to constant factors and stuff, but you get the idea).

Note that considering NFs is still only an approximation, since

  O(head (qsort xs)) ~ O(n) + O(xs)  where n = length xs

instead of the expected

  O(head (qsort xs)) ~ O(qsort xs) ~ O(n log n) + O(xs)  where n = length xs

thanks to lazy evaluation. Also note that despite considering full normal forms, we can express some laziness with this by giving timings for an expression in different contexts, like

  O(take n ys)
  O(head ys)

instead of only O(ys). Same for parameters, with something like

  O(const x) ~ O(1)

instead of the anticipated

  O(const x) ~ O(x)

(For lazy data structures, there are better ways to take laziness into account.)
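To make the contrast concrete, here is a minimal sketch of difference lists next to plain left-associated concatenation. The names DList, fromList, toList, naiveConcat and dlistConcat are my own, not from any library:

```haskell
-- A difference list represents a list as a function that prepends it.
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)          -- O(1) to build; the (++) cost is deferred

toList :: DList a -> [a]
toList f = f []

-- Left-associated (++): ((x ++ y) ++ z) re-traverses x for every further
-- append, which is the quadratic behaviour described above.
naiveConcat :: [[a]] -> [a]
naiveConcat = foldl (++) []

-- Left-associated (.): applying the composed function to [] re-associates
-- all the deferred (++)s to the right, so each element is copied once.
dlistConcat :: [[a]] -> [a]
dlistConcat = toList . foldl (.) id . map fromList
```

Both produce the same list; the point is that ((id . f) . g) . h applied to [] reduces to f (g (h [])), i.e. the appends end up right-nested no matter how the compositions were nested.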
> With difference lists I write
>
>   shows L . (shows T . shows R)
>
>   (shows LL . (shows LT . shows LR)) . (shows T . shows R)
>
>   ((shows LLL . (shows LLT . shows LLR)) . (shows LT . shows LR)) . (shows T . shows R)
>
> I still need to resolve three (.) until I get to the first character of the result string, but for the subsequent characters I do not need to resolve those dots. In the end, resolution of all (.) may need some time, but then the concatenation is performed entirely right-associatively. It seems that this is the trick ...
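This is exactly the Prelude's ShowS idiom. A minimal sketch, where the Tree type and the L/T/R-style labels are illustrative, mirroring the expansion quoted above:

```haskell
-- ShowS = String -> String is the Prelude's difference-list type for strings;
-- showString s = (s ++) builds one.
data Tree = Tip String | Node Tree String Tree

-- Emit the left subtree, then the node label, then the right subtree.
-- The composition tree mirrors the data tree, but once the composed
-- function is applied to a final String, all the underlying (++)s
-- associate to the right.
showsTree :: Tree -> ShowS
showsTree (Tip s)      = showString s
showsTree (Node l t r) = showsTree l . showString t . showsTree r
```

For example, showsTree (Node (Tip "L") "T" (Tip "R")) "" reduces to "L" ++ ("T" ++ ("R" ++ "")), i.e. "LTR".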
So far so good, but the problem now is that analyzing (.) with full normal forms is doomed, since that would mean evaluating things under the lambda, which may take less time than doing call-by-need reductions. Still, we somehow have

  O(x . y) ~ O(x) + O(y)

which is better than O(x ++ y), but I'm not quite sure how to make this exact yet. In the end, the above O(e)s are just random doodles conjured out of the hat; I don't know a formalism for easy reasoning about time in a lazy language. Anyone have any pointers?

Note that the problem is already present for difference lists in strict languages.

Regards,
apfelmus