Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

[redirecting to ghc-users since this is a GHC question]
On 2/13/07, Jefferson Heard
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive, but after compiling this with GHC 2.4.6, and running it, I eat up 2GB of RAM in less than a second. So far, I have tried token and character oriented Parsec parsers and alex and alex is winning by a factor of 2. I would like to be able to tokenize the entirety of a 1TB collection in less than 36 hours on my current machine, which is where alex has gotten me so far. Thanks in advance!
-- Jeff
---
module Main where
import qualified FileReader import qualified Data.Set as Set
punct = foldl (flip Set.insert) Set.empty "<,>.?/:;\"'{[}]|\\_-+=) (*&^%$##@!~`"
stripTagOrComment [] = [] stripTagOrComment ('>':rest) = rest stripTagOrCOmment (c:rest) = stripTagOrComment rest
pass1 :: String -> String -> String pass1 left [] = left pass1 left ('<':right) = pass1 left (stripTagOrComment right) pass1 left (' ':right) = pass1 left right pass1 left (c:right) | Set.member c punct = pass1 (' ':c:' ':left) right | otherwise = pass1 (c:left) right
pass2 :: [String] -> String -> Char -> String -> [String] pass2 left word ' ' [] = word:left pass2 left word c [] = (c:word):left pass2 left word ' ' (' ':right) = pass2 left word ' ' right pass2 left word ' ' (c:right) = pass2 (word:left) "" c right pass2 left word l (c:right) = pass2 left (l:word) c right
tokenize = (pass2 [] "" ' ') . (pass1 [])
main = do file <- do FileReader.trecReadFile "trecfile" print (tokenize (head (tail file)))
-- print (length (map (runParser tokenizeDoc [] "") file))
Have you tried profiling? (see section 5 of the GHC manual.) What's your GHC command line? Tail-recursion in Haskell doesn't always work the way you'd expect, but without profiling it's pretty hard to tell what the problem is. Cheers, Kirsten -- Kirsten Chevalier* chevalier@alum.wellesley.edu *Often in error, never in doubt "Relax. I'm weird, not violent."--Brad Boesen, _Disturbed_
participants (1)
-
Kirsten Chevalier