Greg Buchholz wrote:
The algorithm isn't correct (it counts spaces instead of words), but anyone have advice for improving its performance?
You probably want some strictness annotations in there. . . When I tried the same thing, I came up with something like:
import Char;
cclass c | isSpace c = (c == '\n', False) | otherwise = (False , True)
data Cdata = Cdata !Bool !Int !Int !Int deriving Show
combine (Cdata last l w c) (nl, iw) = Cdata iw l' w' (c + 1) where l' = if nl then l + 1 else l w' = if not last && iw then w + 1 else w
wc = foldl combine (Cdata False 0 0 0) . map cclass
main = getContents >>= putStrLn . show . wc
It seems to work in constant stack space, and gives the same answers (albeit not very beautifully) as my GNU copy of "wc".
Is the problem caused by the laziness of the Int's inside the tuple?
I'm pretty sure that's what's causing it. I had quite a search around when my version was running out of memory and everything seemed to suggest that, basically, the algorithm is building a massive list of "+1"s that only actually get executed when the you try and print the totals at the end. Any comments from more authoritative sources?
Here is the report from ghc with the '-ddump-simpl' option.
If anyone has any hints about how to read this output, I'd be grateful. It makes a bit of sense, but I don't really know what it "means". I.e. it's obviously the simplified parse tree and I can see how it relates back to the source (loosely), but attempting to figure out if something's going to be as leaky as a sieve or not is beyond me.