
Hi all, I just noticed that a tiny change to the program which I posted recently in the "More idiomatic use of strictness" thread causes a space leak. The code is: {-# LANGUAGE BangPatterns, PatternGuards #-} import Data.List (foldl') import Data.Char split delim s | [] <- rest = [token] | otherwise = token : split delim (tail rest) where (token,rest) = span (/=delim) s main = do putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents stats ws docs = foldl' f ((map (const 0) ws),0) docs where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w `elem` d))) ws dfs in sum dfs' `seq` (dfs',n+1) If I change this line: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents to this: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words . map toLower) getContents suddenly the programs starts using tons of memory instead of running in small constant space. What's going on? Best, -- Grzegorz -- View this message in context: http://www.nabble.com/Strange-space-leak-tp18439685p18439685.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Grzegorz Chrupala wrote:
Hi all, I just noticed that a tiny change to the program which I posted recently in the "More idiomatic use of strictness" thread causes a space leak.
The code is: {-# LANGUAGE BangPatterns, PatternGuards #-} import Data.List (foldl') import Data.Char split delim s | [] <- rest = [token] | otherwise = token : split delim (tail rest) where (token,rest) = span (/=delim) s
main = do putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents
stats ws docs = foldl' f ((map (const 0) ws),0) docs where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w `elem` d))) ws dfs in sum dfs' `seq` (dfs',n+1)
If I change this line: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents to this: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words .. map toLower) getContents
suddenly the programs starts using tons of memory instead of running in small constant space. What's going on?
Answer: split "<DOC>" . words . map toLower = (:[]) . words . map toLower Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document. Furthermore, due to `elem` d , your stats function takes space proportional to the length of each document it processes. Beauty & makeup tips: putStrLn =<< fmap f getContents = putStrLn . f =<< getContents ~= interact f Here's a version with glittering nail polish that should run in constant space: split y xs = zs : case xs' of [] -> [] _:xs' -> split y xs' where (zs,xs') = break (==y) xs main = interact $ show . stats ["the","a","and"] . split "<DOC>" . words zipWith' f xs ys = zipWith f xs ys `using` rnf stats ws = foldl' (zipWith' (+)) zero . map (foldl' (zipWith' max) zero . map bits) where zero = map (const 0) ws bits v = map (fromEnum . (== v)) ws Regards, apfelmus

apfelmus wrote:
Answer:
split "<DOC>" . words . map toLower = (:[]) . words . map toLower
Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document.
Oops, that should have been obvious, sorry for the dumb question. Thanks, -- Grzegorz -- View this message in context: http://www.nabble.com/Strange-space-leak-tp18439685p18444159.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Grzegorz Chrupala wrote:
split "<DOC>" . words . map toLower = (:[]) . words . map toLower
Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document.
Oops, that should have been obvious, sorry for the dumb question. Thanks,
No problem, it was not obvious to me and I had fun trying to figure it out :) Regards, apfelmus

apfelmus wrote:
Grzegorz Chrupala wrote:
split "<DOC>" . words . map toLower = (:[]) . words . map toLower
Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document.
Oops, that should have been obvious, sorry for the dumb question. Thanks,
No problem, it was not obvious to me and I had fun trying to figure it out :)
Speaking of not obvious: Haskell's type system catches a lot of bugs -- but still gives no help with this particular 'problem'. But one can easily imagine an extension to a type system which could have detected that "<DOC>" can never occur in the result of words . map toLower, and then with a bit more work [type-level Nat], the type of the full expression could have encoded that the result is always going to be of length 1. That would surely have been a good hint that something non-trivial was going on. Whether a Haskell-friendly type system extension could be created/implemented which would cover this example, I don't know. However, I have had a lot of fun with the underlying idea: anytime someone encounters a bug in their code (and relates the debugging story on haskell-cafe), try to imagine how the type system could be extended to automate that. In most cases, I don't mean to have the type system reject the code, but rather to have an inferred type that would make it obvious that the code did not behave as expected. Jacques
participants (3)
-
apfelmus
-
Grzegorz Chrupala
-
Jacques Carette