
A conversation on #haskell just showed that it's quite hard to explain (at least it is for me) to people attached to tail recursion why that is a red herring in Haskell.

I had a poke around the wiki and couldn't see a page which explains it clearly. In fact http://www.haskell.org/haskellwiki/Performance/Accumulating_parameter is actually slightly misleading, since it suggests that the performance gain is from the tail recursion, whereas really it's from the change from a function which accumulates an enormous thunk to a function which can do the work as it goes along.

If someone has the energy and the clarity to explain this clearly and make a wiki page about it, that would be great. If someone finds an existing wiki page I'm too unperceptive to notice, even better!

Jules
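(A minimal sketch of that distinction, not from the original messages: both functions below are tail recursive, but only the strict one does the work as it goes. The standard-library way to get the strict behaviour is Data.List.foldl'.)

    {-# LANGUAGE BangPatterns #-}

    -- Tail recursive, but the accumulator is a growing thunk:
    -- unoptimised, sumLazy builds (((0+1)+2)+...) before any addition runs.
    sumLazy :: [Integer] -> Integer
    sumLazy = go 0
      where
        go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

    -- Same tail call; forcing the accumulator at each step is the actual
    -- fix, letting the additions happen as the list is consumed.
    sumStrict :: [Integer] -> Integer
    sumStrict = go 0
      where
        go !acc []     = acc
        go !acc (x:xs) = go (acc + x) xs

    main :: IO ()
    main = print (sumStrict [1 .. 1000000])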

This page [1] has useful info on it. Having done a lot of Lisp/OCaml
before coming to Haskell I was also very attached to tail recursion;
it took a long time to realise this was wrong. This topic definitely
needs more prominence on the wiki.
- Joe
[1] http://www.haskell.org/haskellwiki/Stack_overflow
On 18/05/07, Jules Bean
A conversation on #haskell just showed that it's quite hard to explain (at least it is for me) to people attached to tail recursion why that is a red herring in Haskell.
I had a poke around the wiki and couldn't see a page which explains it clearly. In fact http://www.haskell.org/haskellwiki/Performance/Accumulating_parameter is actually slightly misleading, since it suggests that the performance gain is from the tail recursion, whereas really it's from the change from a function which accumulates an enormous thunk to a function which can do the work as it goes along.
If someone has the energy and the clarity to explain this clearly and make a wiki page about it, that would be great. If someone finds an existing wiki page I'm too unperceptive to notice, even better!
Jules

On Fri, 18 May 2007 01:52:11 -0700, Jules Bean
A conversation on #haskell just showed that it's quite hard to explain (at least it is for me) to people attached to tail recursion why that is a red herring in Haskell.
If someone has the energy and the clarity to explain this clearly and make a wiki page about it, that would be great. If someone finds an existing wiki page I'm too unperceptive to notice, even better!
Here's an explanation that I put together a while ago for a friend (and forgive my pseudo-coding):

-------------------------------------------------

I guess the thing with Haskell is that the complication is in the details and behavior of your program. It's very easy to shoot yourself in the foot without even knowing. The basics of parsers are very easy to understand. For instance:

    hello = char 'h' >> char 'e' >> char 'l' >> char 'l' >> char 'o'

The "do" notation is just syntactic sugar for that, which sometimes makes those more readable:

    hello = do char 'h'
               char 'e'
               char 'l'
               char 'l'
               char 'o'

is a very simple parser that is easy to understand. I mean... any monad can be seen as just a way to enforce order of evaluation (so that in this case the letters are recognized in the proper order) and to hide some of the parameters to the functions (so that you don't have to explicitly pass the input string in those parsers). That's really all there is to it.

Then there's the concept of combinators: functions that take parsers as arguments and generate a new parser. ">>" is part of the monad definition, but it is also a combinator in its own right (take two parsers and generate a new parser which is the sequence of them). Also the alternative (call it "<|>") is a combinator:

    many1hellos = hello >> (many1hellos <|> return ())

"return a" is also part of the definition of any monad. It is an operation that always succeeds and just returns the value, in this case the Haskell equivalent to "void". So if "many1hellos" fails because the next "hello" couldn't be matched, then it calls it good anyway. That's the backtracking you find in all these parsers.

The "<|>" combinator knows about the guts of the monad (about the internal state and the guts), of course, so it can do the very interesting things it does. It's a primitive combinator. Something kinda like this (hope you're already comfortable with the "do" notation):

    a <|> b = do oldState <- getCurrentParserState
                 resultFromA <- try a
                 if failed resultFromA
                   then do -- Try the other one
                           setCurrentParserState oldState
                           b
                   else do -- oldState gets forgotten;
                           -- it will be garbage-collected
                           return (extractResultValue resultFromA)

Once you have enough of those primitives, you can build your own combinators, which work in exactly the same way. That's where things start to go south, though, and the foot-shooting begins. Take for instance:

    many1 p = p >> (many1 p <|> return ())

a combinator that takes any parser (just like, for instance, "hello") and parses one or more of them (it exists in the standard library). That works. So far so good. But then maybe we want one that matches zero or more:

    many0 p = (p <|> return ()) >> (many0 p <|> return ())

That's a silly example that is totally wrong, obviously (let me know if you don't see it). This parser will never finish parsing. It's an infinite loop. All grammar languages have rough corners like this one, of course, but with Haskell they can be very hard to track down.

    many0 p = (p >> (many0 p <|> return ())) <|> return ()

This one is also an infinite loop, although it is harder to figure out.

    many0 p = (p >> (many1 p <|> return ())) <|> return ()
    many0 p = many1 p <|> return ()

Those two are both identical and correct; the last one is just more concise. But their behavior is poor. I mean, if you try to parse a 1 GB file full of 'X' using "many0 (char 'X')", the "many1 p" part will take a long time to parse, and in the meantime the "oldState" variable that I showed you above in the definition of "<|>", which is required in some form in order to do the backtracking, is fully alive (cannot be garbage-collected) even though after the first 'X' is parsed _we_ know it can never fail.

That is a bad thing, but much worse is the behavior of "many1". In this case, the recursion is not good for tail recursion, so you'll end up running out of stack. Basically, you end up with a nested sequence of "<|>" calls. That sort of problem is endemic to backtracking parsers, and all monadic parsers I know are backtracking parsers. The solution in this case is to never put anything expensive on the left side of a "<|>" combinator. We can do that like so:

    many0 p = do result <- try p
                 if failed result
                   then return ()
                   else many0 p

This way, there is no backtracking, and many1 can be efficiently defined as so:

    many1 p = p >> many0 p

That is all somewhat similar to the issues you'd find in a recursive-descent parser written in any imperative language. But then again... monads are very much like somewhat-restricted imperative programs.

The point where it all gets really tricky is that you can find similar sorts of problems in non-monadic code, and those are really hard to find. For instance, take the following function from the standard library (I hope you're somewhat acquainted with Haskell lists):

    replicate 0 v = []
    replicate n v = v : replicate (n-1) v

That's a neat little function which builds a list by repeating the "v" element "n" times. Very simple. But it is not tail-recursive! The imperative way of thinking about it is that, after calling "replicate (n-1) v", it takes the result and prepends "v" to it. Clearly, that prevents the tail recursion from happening. A fully tail-recursive version is this:

    replicate n v = replicateHelper n v []

    replicateHelper 0 v result = result
    replicateHelper n v result = replicateHelper (n-1) v (v : result)

To the imperative eye, that looks much better. The problem here is that the tail-recursive version is less efficient in Haskell. This is because of the laziness inherent in the language. In the first version, the expression "replicate (n-1) v" is a very compact thing. It'll be evaluated some day, but until then it occupies very little space. But in the tail-recursive version, the recursion _must_ happen all the way to the end before you can do anything with the list. As soon as you try to do anything with the list (like compare it with "[]" to see if it's empty), it will evaluate all the calls to "replicateHelper". It won't evaluate all the ":" operations, for instance, but still it will create an expression tree such as "v : (v : (v : (v : ...)))" which is arbitrarily large. It has no choice, as the outermost "v : ..." node of the expression tree is generated by the innermost call to "replicateHelper".

So in the first case, the comparison "replicate 1000000 'X' == []" returns "False" immediately. But in the second case it probably will blow up due to lack of memory.

The proper "Haskell" way to read that function (or any other function) is... "how quickly can you get to the outermost node of the actual expression generated by this function?" That will tell you how efficient it is, much better than anything else.

-------------------------------------------------

The moral is... tail recursion is an optimization for strict code only. Lazy code will do best to stay as far away from it as possible.

Hope there aren't too many typos, and also hope that helps someone.

JCAB
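(A runnable sketch of JCAB's replicate contrast; the names guardedReplicate and tailReplicate are illustrative, not from the post.)

    -- Guarded recursion: the outermost cons exists immediately, and the
    -- rest of the list stays a small unevaluated thunk.
    guardedReplicate :: Int -> a -> [a]
    guardedReplicate 0 _ = []
    guardedReplicate n v = v : guardedReplicate (n - 1) v

    -- Accumulator-style tail recursion: no cons is visible until the
    -- recursion has run all the way down to zero.
    tailReplicate :: Int -> a -> [a]
    tailReplicate n0 v = go n0 []
      where
        go 0 acc = acc
        go n acc = go (n - 1) (v : acc)

    main :: IO ()
    main = do
      print (null (guardedReplicate 100000000 'X'))  -- False, immediately
      print (null (tailReplicate 100000000 'X'))     -- False, but only after
                                                     -- building 10^8 conses;
                                                     -- this may exhaust memory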

Jules Bean wrote:
A conversation on #haskell just showed that it's quite hard to explain (at least it is for me) to people attached to tail recursion why that is a red herring in Haskell.
What is tail recursion? Suppose

    (1) x = f y x

A participant of the said conversation classified this as not tail recursion. Suppose further

    (0) f y x = x

Is (1) a tail recursion now? Eager evaluation says: to evaluate x, call x and y first; then there is still f to call, so x is hardly the tail call. Lazy evaluation says: to evaluate x, call f first; then x is the last call.

I have a problem with calling this not tail recursion, but I seem to be alone.

Define

    (2) cs = 'c' : cs

Is (2) a tail recursion? Use it with

    (3) putStr cs

Lazy evaluation, together with putStr's forcing order, says: to evaluate cs, get to the cons cell first, then evaluate 'c', then cs is the last call. Or use it with

    (4) putStr [cs !! 5, cs !! 3]

Then cs is not the last call: for some n, the nth recursive call to cs happens before evaluating the nth 'c'.

I no longer understand what other people mean by tail recursion. Am I attached to tail recursion? I am certainly attached to some recursion, which I consider to be tail recursion, but other people say it is not, and under some exotic usage it is indeed not. Perhaps I am attached to what they consider init recursion or middle recursion.

Is tail recursion a red herring? "Tail recursion considered a red herring"? "(Tail recursion considered a red herring) considered a red herring"?
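(Albert's definition (2) can be run directly; a small sketch, assuming only the Prelude: consuming cs in order, as putStr does, needs constant space, because each cons can be collected once printed.)

    cs :: String
    cs = 'c' : cs

    main :: IO ()
    main = putStr (take 20 cs)  -- prints "cccccccccccccccccccc";
                                -- "putStr cs" would likewise run in
                                -- constant space, just forever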

On 18/05/07, Albert Y. C. Lai
Lazy evaluation says, to evaluate x, call f first, then x is the last call.
x may be the last to be evaluated, but you still have to pass through f's call context 'on the way out'. For example, suppose in the expression 'f x y' you have f = (+), x = 2 and y = 3; then lazy evaluation looks something like the following:

    f x y
    (+) x y
    (+) 2 3
    5

Notice that you still have to evaluate x and y before you can perform the addition. In other words, although you may perform the manipulations that the definition of f describes on thunks, and only later force those thunks, you still have the call context of f to pass through: you're not evaluating those thunks for no particular reason, you're evaluating them so that f may return.

Your argument also seems to be using the symmetry of the = operation, which doesn't really hold at the binding level. To exemplify, set f = (:) and y = 1, so that you have:

    (0) x = (:) 1 x
    (1) (:) 1 x = x

The first sets up a binding for 'x', that when evaluated will yield the infinite list [1, 1, 1, ...]. The second sets up a binding for the function '(:)', which when given two arguments, the first of which is 1, will evaluate to its second. I don't really see these as being the same expression.
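(One way to watch that call context in action; a sketch of mine using Debug.Trace. The relative order of the two trace messages is not guaranteed.)

    import Debug.Trace (trace)

    f :: Int -> Int -> Int
    f = (+)

    main :: IO ()
    main = do
      let x = trace "forcing x" 2
          y = trace "forcing y" 3
          r = f x y              -- builds a thunk; nothing printed yet
      putStrLn "thunk built"
      print r                    -- forcing r forces x and y so f can return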
Define
(2) cs = 'c' : cs
Is (2) a tail recursion?
No, for the same reason discussed above: you still have to pass through the call context of (:) once cs is evaluated.

What I'm trying to do is disassociate the concept of tail recursion from evaluation order. To say something is tail recursive is to say that a call to itself appears at the top of the call stack; to say it's the "last thing you do" is helpful intuitively but starts to fall apart once you move away from the traditional strict evaluation order semantics.

Let's have one more example to explain what I mean by 'call stack'. The classic tail recursive function is foldl, contrasting with the non-tail recursive function foldr. Equations from both these functions are shown below, with their right-hand sides transformed into a tree (ASCII art: don't use a proportional font):

    foldl f z (x:xs) = foldl f (f z x) xs

    foldl-+--+-----+
          |  |     |
          f  f--+  xs
             |  |
             z  x

    foldr f z (x:xs) = f x (foldr f z xs)

    f-+--+
      |  |
      x  foldr-+--+--+
               |  |  |
               f  z  xs

Tail recursion, then, expresses the idea that a function appears at the top of its call stack, or at the top of the tree of its right-hand side. It's got nothing to do with evaluation order.

--
-David House, dmhouse@gmail.com
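(The two equations as plain Haskell, with the position of the recursive call marked; myFoldl and myFoldr follow the Prelude definitions, renamed to avoid clashes.)

    -- Recursive call at the root of the right-hand side:
    myFoldl :: (b -> a -> b) -> b -> [a] -> b
    myFoldl _ z []     = z
    myFoldl f z (x:xs) = myFoldl f (f z x) xs

    -- Recursive call one level down, under the pending call to f:
    myFoldr :: (a -> b -> b) -> b -> [a] -> b
    myFoldr _ z []     = z
    myFoldr f z (x:xs) = f x (myFoldr f z xs)

    main :: IO ()
    main = print (myFoldl (-) 0 [1, 2, 3], myFoldr (-) 0 [1, 2, 3])
    -- (((0-1)-2)-3, 1-(2-(3-0))) = (-6, 2)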

"David House"
On 18/05/07, Albert Y. C. Lai wrote:
Lazy evaluation says, to evaluate x, call f first, then x is the last call.
x may be the last to be evaluated, but you still have to pass through f's call context 'on the way out'.
Not exactly, or, only if it has some...
Let's have one more example to explain what I mean by 'call stack'. The classic tail recursive function is foldl, contrasting with the non-tail recursive function foldr. Equations from both these functions are shown below, with their right-hand sides transformed into a tree (ASCII art: don't use a proportional font):
foldl f z (x:xs) = foldl f (f z x) xs
    foldl-+--+-----+
          |  |     |
          f  f--+  xs
             |  |
             z  x
foldr f z (x:xs) = f x (foldr f z xs)
    f-+--+
      |  |
      x  foldr-+--+--+
               |  |  |
               f  z  xs
Tail recursion, then, expresses the idea that a function appears at the top of its call stack, or at the top of the tree of its right-hand side. It's got nothing to do with evaluation order.
Well, I'd rather have it a different way. The simplest view of tail recursion is that the recursive call is the head function of the expression (the recursion is a/the tail call) -- so in
foldl f z (x:xs) = foldl f (f z x) xs
it's easy to see that foldl is a tail call and therefore tail recursion, because it's the leftmost outermost function. Whether or not (f z x) is evaluated strictly, the body of foldl doesn't have to do anything after the call to foldl. In strict evaluation you would evaluate (f z x) before calling foldl, and in lazy evaluation you just make a representation of (f z x), but in either case you pass this as an argument to the tail function and just go.

Some of the confusion here is that the left-hand side of the definition actually does some work, so the notation handily obscures the fact that foldl isn't such a straightforward tail call. It's actually something like:
    foldl f z l = case l of
                    (x:xs) -> foldl f (f z x) xs
                    []     -> z
which reveals that foldl in the RHS isn't really the leftmost outermost function, but case is -- the tail call is to case. It happens that case is a kind of function that makes a choice -- it returns one of its arguments but doesn't do anything more to it. In a strict language only some predefined operators like case and if have this property, but lazy languages allow the definition of new ones, which is why we need more care when thinking about tail calls.

--
Jón Fairbairn                              Jon.Fairbairn@cl.cam.ac.uk
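(A sketch of such a user-defined choice function, with hypothetical names myIf and countDown; the operational claim holds for GHC with optimisation, which inlines myIf into a case.)

    -- An ordinary function behaving like built-in if: it returns one of
    -- its arguments and does nothing further to it.
    myIf :: Bool -> a -> a -> a
    myIf True  t _ = t
    myIf False _ e = e

    -- The recursive call sits under myIf rather than under case, yet
    -- myIf merely selects it; optimised, this compiles to a genuine
    -- tail call. Unoptimised, forcing the result walks a thunk chain.
    countDown :: Int -> Int
    countDown n = myIf (n <= 0) n (countDown (n - 1))

    main :: IO ()
    main = print (countDown 1000000)  -- 0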

On Sat, May 19, 2007 at 04:10:29PM +0100, Jon Fairbairn wrote: [...]
    foldl f z l = case l of
                    (x:xs) -> foldl f (f z x) xs
                    []     -> z
which reveals that foldl in the RHS isn't really the leftmost outermost function, but case is -- the tail call is to case. It happens that case is a kind of function that makes a choice -- it returns one of its arguments but doesn't do anything more to it. In a strict language only some predefined operators like case and if have this property, but lazy languages allow the definition of new ones, which is why we need more care when thinking about tail calls.
By definition, a tail-recursive function is a recursive function in which the value returned from a recursive call is immediately returned, without modification, as the value of the top-level invocation -- which is true for foldl as defined above.

Ilya Tsindlekht
On Sat, May 19, 2007 at 04:10:29PM +0100, Jon Fairbairn wrote: [...]
    foldl f z l = case l of
                    (x:xs) -> foldl f (f z x) xs
                    []     -> z
which reveals that foldl in the RHS isn't really the leftmost outermost function, but case is -- the tail call is to case. It happens that case is a kind of function that makes a choice -- it returns one of its arguments but doesn't do anything more to it. In a strict language only some predefined operators like case and if have this property, but lazy languages allow the definition of new ones, which is why we need more care when thinking about tail calls.
By definition, a tail-recursive function is a recursive function in which the value returned from a recursive call is immediately returned, without modification, as the value of the top-level invocation -- which is true for foldl as defined above.
Sure. Did I say otherwise?

--
Jón Fairbairn                              Jon.Fairbairn@cl.cam.ac.uk

On Sat, May 19, 2007 at 09:16:46PM +0100, Jon Fairbairn wrote:
Ilya Tsindlekht writes:
By definition, a tail-recursive function is a recursive function in which the value returned from a recursive call is immediately returned, without modification, as the value of the top-level invocation -- which is true for foldl as defined above.
Sure. Did I say otherwise?
Sorry, I thought you were saying foldl is not tail-recursive. I must not have read your message carefully enough; mea culpa.

Ilya Tsindlekht
On Sat, May 19, 2007 at 09:16:46PM +0100, Jon Fairbairn wrote:
Ilya Tsindlekht writes:
By definition, a tail-recursive function is a recursive function in which the value returned from a recursive call is immediately returned, without modification, as the value of the top-level invocation -- which is true for foldl as defined above.
Sure. Did I say otherwise?
Sorry, I thought you were saying foldl is not tail-recursive. I must not have read your message carefully enough; mea culpa.
No worries. I was, in fact, saying that in a lazy language more things are tail recursive than you might expect, owing to the possibility of defining choice functions other than if and case. One might even argue that the question of whether something is tail recursive (in effect) depends on the context:
    down 0 = []
    down n = n : down (n-1)
is clearly not tail recursive: the tail call is to (:), but if the context only ever applies either head or tail, (:) is effectively a choice...
    silly [] = []
    silly (x:xs) = silly xs
clearly is. Now silly (down n) only ever applies tail to the conses, so no state builds up, and the combined effect of the two functions is as if they were tail recursive.

--
Jón Fairbairn                              Jon.Fairbairn@cl.cam.ac.uk
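(Jon's pair is easy to check; a sketch with type signatures added: down is not tail recursive, but silly consumes each cons as soon as down produces it, so the composition runs in constant space.)

    down :: Int -> [Int]
    down 0 = []
    down n = n : down (n - 1)

    silly :: [a] -> [a]
    silly []     = []
    silly (_:xs) = silly xs

    main :: IO ()
    main = print (silly (down 10000000))  -- prints []; the ten million
                                          -- intermediate conses never
                                          -- coexist in memory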
participants (7)
- Albert Y. C. Lai
- David House
- Ilya Tsindlekht
- Joe Thornber
- Jon Fairbairn
- Juan Carlos Arevalo Baeza
- Jules Bean