
On Tue, 2007-02-13 at 15:12 -0600, Creighton Hogg wrote:
On 2/13/07, Duncan Coutts
wrote: On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote: > Hi, I am running the following code against a 210 MB file in an attempt to > determine whether I should use alex or whether, since my needs are very > performance oriented, I should write a lexer of my own. I thought that > everything I'd written here was tail-recursive Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space?
There are two kinds of space use that you have to consider here. One is the stack space and the other is the space required by whatever it is that your recursive function is doing (in particular if your recursive function constructs a list then you need space for that list).
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n') You'll just be building a bunch of unevaluated thunks until you hit the termination condition?
Actually yes, though with a slight modification we can fix that and make it run in constant space: facTail !1 !n' = n' facTail !n !n' = facTail (n-1) (n*n') however the original example, even if we did something like the above it still has major problems. Yes it is tail recursive and so it's not taking any stack space, it is a true loop, but it's a loop that's allocating a massive list! Let's look at the code again: pass1 :: String -> String -> String pass1 left [] = left pass1 left ('<':right) = pass1 left (stripTagOrComment right) pass1 left (' ':right) = pass1 left right pass1 left (c:right) | Set.member c punct = pass1 (' ':c:' ':left) right | otherwise = pass1 (c:left) right This may well be a perfect tail recursive loop but each iteration it's allocating a cons cell. It doesn't return until it has consumed the entire input and built the entire output. So if you run it on a 2TB file then it's going to pull the whole lot into memory before returning anything. So as I said originally, this is a case where it pays to be lazy. Duncan