Strange memory consumption problems in something that should be tail-recursive

Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive, but after compiling this with GHC 2.4.6, and running it, I eat up 2GB of RAM in less than a second. So far, I have tried token and character oriented Parsec parsers and alex and alex is winning by a factor of 2. I would like to be able to tokenize the entirety of a 1TB collection in less than 36 hours on my current machine, which is where alex has gotten me so far. Thanks in advance! -- Jeff --- module Main where import qualified FileReader import qualified Data.Set as Set punct = foldl (flip Set.insert) Set.empty "<,>.?/:;\"'{[}]|\\_-+=) (*&^%$##@!~`" stripTagOrComment [] = [] stripTagOrComment ('>':rest) = rest stripTagOrCOmment (c:rest) = stripTagOrComment rest pass1 :: String -> String -> String pass1 left [] = left pass1 left ('<':right) = pass1 left (stripTagOrComment right) pass1 left (' ':right) = pass1 left right pass1 left (c:right) | Set.member c punct = pass1 (' ':c:' ':left) right | otherwise = pass1 (c:left) right pass2 :: [String] -> String -> Char -> String -> [String] pass2 left word ' ' [] = word:left pass2 left word c [] = (c:word):left pass2 left word ' ' (' ':right) = pass2 left word ' ' right pass2 left word ' ' (c:right) = pass2 (word:left) "" c right pass2 left word l (c:right) = pass2 left (l:word) c right tokenize = (pass2 [] "" ' ') . (pass1 []) main = do file <- do FileReader.trecReadFile "trecfile" print (tokenize (head (tail file))) -- print (length (map (runParser tokenizeDoc [] "") file))

On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything. If performance is really important to you then you may also want to investigate lexing from a lazy ByteString. Alex can now do that (darcs version) or you can do it by hand as you're trying now. Duncan

On Tuesday 13 February 2007 15:59, Duncan Coutts wrote:
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
If performance is really important to you then you may also want to investigate lexing from a lazy ByteString. Alex can now do that (darcs version) or you can do it by hand as you're trying now.
Duncan
Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?

On 2/13/07, Jefferson Heard
Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?
There really should be one! (Although there may be something on the wiki already.) Profiling can help, though. Cheers, Kirsten -- Kirsten Chevalier* chevalier@alum.wellesley.edu *Often in error, never in doubt "Are you aware that rushing toward a goal is a sublimated death wish? It's no coincidence we call them 'deadlines'." -- Tom Robbins

On Feb 13, 2007, at 16:07 , Kirsten Chevalier wrote:
On 2/13/07, Jefferson Heard
wrote: Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?
There really should be one! (Although there may be something on the wiki already.) Profiling can help, though.
http://haskell.org/haskellwiki/Category:Performance ? -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Jefferson Heard wrote:
Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?
I found this page to be helpful: http://haskell.org/haskellwiki/Stack_overflow

Jefferson Heard wrote:
Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?
Besides, proper tail recursion in Haskell needs strictness annotations, but the best way is to forget the two words "tail recursive" altogether :) It always helps to do a rough calculation of how much time you have to expect it to run. Processing 1TB with a 1GHz processor and 16=2^4 machine instruction in the inner loop (must be quite short, the loop) takes 2^40 / (2^30 / 16) = 2^14 seconds ~ 4.5 hours Of course, these 4.5 hours are quite sensitive to the 2^4 factor and might well be 3 or 9 hours. Assuming that you ran alex on a String, the reported 36 hours are entirely reasonable, in the sense of alex not being overly slow. Regards, apfelmus

Didn't think it was overly slow, just that I could do better :-). On Tuesday 13 February 2007 16:30, apfelmus@quantentunnel.de wrote:
Jefferson Heard wrote:
Argh, bitten by the scheme bug! Right -- NO tail recursion... So that leaves me with some rather non-intuitive strategies for achieving execution time efficiency. Anyone care to point me in the direction of a document on efficiency in Haskell?
Besides, proper tail recursion in Haskell needs strictness annotations, but the best way is to forget the two words "tail recursive" altogether :)
It always helps to do a rough calculation of how much time you have to expect it to run. Processing 1TB with a 1GHz processor and 16=2^4 machine instruction in the inner loop (must be quite short, the loop) takes
2^40 / (2^30 / 16) = 2^14 seconds ~ 4.5 hours
Of course, these 4.5 hours are quite sensitive to the 2^4 factor and might well be 3 or 9 hours. Assuming that you ran alex on a String, the reported 36 hours are entirely reasonable, in the sense of alex not being overly slow.
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 2/13/07, Duncan Coutts
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space? So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n') You'll just be building a bunch of unevaluated thunks until you hit the termination condition?

Creighton Hogg wrote:
On 2/13/07, *Duncan Coutts*
mailto:duncan.coutts@worc.ox.ac.uk> wrote: On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote: > Hi, I am running the following code against a 210 MB file in an attempt to > determine whether I should use alex or whether, since my needs are very > performance oriented, I should write a lexer of my own. I thought that > everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space?
It is an important point, and a classic space bug (see foldl in the Prelude). It it not the fault of tail recursion per se, in fact tail recursion is often important in Haskell too.
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n')
The problem with this example is that it will build up an expression of the form: (n1 * n2 * n3 .....) in the second argument. It's size will be proportional to the number of recursive calls made (n).
You'll just be building a bunch of unevaluated thunks until you hit the termination condition?
To fix it you will want the function to evaluate its second argument eagerly: facTail n n' = facTail (n-1) $! (n*n') Cheers, Bernie.

On 2/13/07, Bernie Pope
Creighton Hogg wrote:
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space?
It is an important point, and a classic space bug (see foldl in the Prelude).
It it not the fault of tail recursion per se, in fact tail recursion is often important in Haskell too.
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n')
The problem with this example is that it will build up an expression of the form:
(n1 * n2 * n3 .....)
in the second argument. It's size will be proportional to the number of recursive calls made (n).
You'll just be building a bunch of unevaluated thunks until you hit the termination condition?
To fix it you will want the function to evaluate its second argument eagerly:
facTail n n' = facTail (n-1) $! (n*n')
Awesome. For a long time now I've been interested in Haskell, and studied it from the math side, but haven't actually really written anything. This mailing list, the wiki, and #haskell are proving to be a great resource.

Ha! You're right! I didn't think about the laziness aspect of it. Anyway, the non tail-recursive version fixed the problem. Thanks! On Tuesday 13 February 2007 16:32, Bernie Pope wrote:
Creighton Hogg wrote:
On 2/13/07, *Duncan Coutts*
mailto:duncan.coutts@worc.ox.ac.uk> wrote: On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote: > Hi, I am running the following code against a 210 MB file in an
attempt to
> determine whether I should use alex or whether, since my needs
are very
> performance oriented, I should write a lexer of my own. I
thought that
> everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space?
It is an important point, and a classic space bug (see foldl in the Prelude).
It it not the fault of tail recursion per se, in fact tail recursion is often important in Haskell too.
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n')
The problem with this example is that it will build up an expression of the form:
(n1 * n2 * n3 .....)
in the second argument. It's size will be proportional to the number of recursive calls made (n).
You'll just be building a bunch of unevaluated thunks until you hit the termination condition?
To fix it you will want the function to evaluate its second argument eagerly:
facTail n n' = facTail (n-1) $! (n*n') Cheers, Bernie.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Tuesday 13 February 2007 22:32, Bernie Pope wrote:
Creighton Hogg wrote: [...]
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n')
The problem with this example is that it will build up an expression of the form:
(n1 * n2 * n3 .....) [...]
This is not true if one takes strictness analysis into account: facTail is strict in both arguments, and any decent strictness analyser will detect this, e.g. GHC with -O. Strict arguments can be evaluated before the function call, so in the example above no stack space will be consumed and the above will basically be a good old loop. For the brave: Use "ghc -v4 -O" on the example above to see what really happens. GHC is even clever enough to factor out (un-)boxing (at least for Int and friends), so there is a rather tight loop with unboxed Ints. Cheers, S.

On Tue, 2007-02-13 at 15:12 -0600, Creighton Hogg wrote:
On 2/13/07, Duncan Coutts
wrote: On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote: > Hi, I am running the following code against a 210 MB file in an attempt to > determine whether I should use alex or whether, since my needs are very > performance oriented, I should write a lexer of my own. I thought that > everything I'd written here was tail-recursive Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
This may be silly of me, but I feel like this is an important point: so you're saying that tail recursion, without strictness, doesn't run in constant space?
There are two kinds of space use that you have to consider here. One is the stack space and the other is the space required by whatever it is that your recursive function is doing (in particular if your recursive function constructs a list then you need space for that list).
So for example in the case of, facTail 1 n' = n' facTail n n' = facTail (n-1) (n*n') You'll just be building a bunch of unevaluated thunks until you hit the termination condition?
Actually yes, though with a slight modification we can fix that and make it run in constant space: facTail !1 !n' = n' facTail !n !n' = facTail (n-1) (n*n') however the original example, even if we did something like the above it still has major problems. Yes it is tail recursive and so it's not taking any stack space, it is a true loop, but it's a loop that's allocating a massive list! Let's look at the code again: pass1 :: String -> String -> String pass1 left [] = left pass1 left ('<':right) = pass1 left (stripTagOrComment right) pass1 left (' ':right) = pass1 left right pass1 left (c:right) | Set.member c punct = pass1 (' ':c:' ':left) right | otherwise = pass1 (c:left) right This may well be a perfect tail recursive loop but each iteration it's allocating a cons cell. It doesn't return until it has consumed the entire input and built the entire output. So if you run it on a 2TB file then it's going to pull the whole lot into memory before returning anything. So as I said originally, this is a case where it pays to be lazy. Duncan

Duncan Coutts wrote:
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
Duncan is right, and I will just elaborate a little bit. Consider the pass1 function: pass1 :: String -> String -> String pass1 left [] = left pass1 left ('<':right) = pass1 left (stripTagOrComment right) pass1 left (' ':right) = pass1 left right pass1 left (c:right) | Set.member c punct = pass1 (' ':c:' ':left) right | otherwise = pass1 (c:left) right It accumulates its result in the "left" parameter. So it chomps down the "right" string building up a bigger and bigger solution until it reaches the base case, and hands the solution over to the calling function. The calling function gets nothing back from pass1 until pass1 has processed the whole input. And that accumulated solution in "left" could grow quite big. A much better approach would be: pass1 :: String -> String pass1 [] = [] pass1 ('<':right) = pass1 (stripTagOrComment right) pass1 (' ':right) = pass1 right pass1 (c:right) | Set.member c punct = ' ':c:' ': pass1 right | otherwise = c : pass1 right This way, pass1 will be producing output as early as possible, which can be consumed earlier by the calling function. Lazy evaluation gives you a kind of co-routining between producers and consumers, but you have to write "good" producers and "good" consumers. You should also write the pass2 in this style as well. Your memory consumption should drop to something very small. Cheers, Bernie.

duncan.coutts:
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive
Isn't that exactly the problem - that it's tail recursive? You do not want it to be tail recursive since then it must consume the whole input before producing any output. You want it to be as lazy as possible so that it can start producing tokens as soon as possible without having to consume everything.
If performance is really important to you then you may also want to investigate lexing from a lazy ByteString. Alex can now do that (darcs version) or you can do it by hand as you're trying now.
I'd reenforce this point: the only chance for C like performance for this kind of problem is to use lazy bytestrings, or a combined strict head, lazy tail approach. Then you can reasonably expect to compete with C. Examples on the shootout. -- Don
participants (10)
-
apfelmus@quantentunnel.de
-
Bernie Pope
-
Brandon S. Allbery KF8NH
-
Creighton Hogg
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
Jefferson Heard
-
Kirsten Chevalier
-
Seth Gordon
-
Sven Panne