
Grzegorz Chrupala wrote:
Hi all, I just noticed that a tiny change to the program which I posted recently in the "More idiomatic use of strictness" thread causes a space leak.
The code is: {-# LANGUAGE BangPatterns, PatternGuards #-} import Data.List (foldl') import Data.Char split delim s | [] <- rest = [token] | otherwise = token : split delim (tail rest) where (token,rest) = span (/=delim) s
main = do putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents
stats ws docs = foldl' f ((map (const 0) ws),0) docs where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w `elem` d))) ws dfs in sum dfs' `seq` (dfs',n+1)
If I change this line: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents to this: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words .. map toLower) getContents
suddenly the programs starts using tons of memory instead of running in small constant space. What's going on?
Answer: split "<DOC>" . words . map toLower = (:[]) . words . map toLower Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document. Furthermore, due to `elem` d , your stats function takes space proportional to the length of each document it processes. Beauty & makeup tips: putStrLn =<< fmap f getContents = putStrLn . f =<< getContents ~= interact f Here's a version with glittering nail polish that should run in constant space: split y xs = zs : case xs' of [] -> [] _:xs' -> split y xs' where (zs,xs') = break (==y) xs main = interact $ show . stats ["the","a","and"] . split "<DOC>" . words zipWith' f xs ys = zipWith f xs ys `using` rnf stats ws = foldl' (zipWith' (+)) zero . map (foldl' (zipWith' max) zero . map bits) where zero = map (const 0) ws bits v = map (fromEnum . (== v)) ws Regards, apfelmus