
From: "Edward Z. Yang"
Hello Aleksandar,
It is possible that the iteratee library is leaking space; I recall some recent discussion to this effect. Your example seems simple enough that you might recompile it against a version of iteratee built with -auto-all enabled. Unfortunately, it's not really a safe bet to assume your libraries are leak free, and if you've pinned the leak down to a single line and there doesn't seem to be a way to squash it, I'd bet it's the library's fault.
Edward
I can't reproduce the space leak here. I tried Aleksandar's original code, my iteratee version, the Ngrams version posted by Johan Tibell, and a lazy bytestring version.

My iteratee version (only f' has changed from Aleksandar's code):

    f' :: Monad m => I.Iteratee S.ByteString m Wordcounts
    f' = I.joinI $ (enumLinesBS I.><> I.filter (not . S.null)) $
           I.foldl' (\t s -> T.insertWith (+) s 1 t) T.empty

My lazy bytestring version:

    import Data.Iteratee.Char
    import Data.List (foldl')
    import Data.Char (toLower)
    import Data.Ord (comparing)
    import Data.List (sortBy)
    import System.Environment (getArgs)
    import qualified Data.ByteString.Lazy.Char8 as L
    import qualified Data.HashMap.Strict as T

    f'2 = foldl' (\t s -> T.insertWith (+) s 1 t) T.empty . filter (not . L.null) . L.lines

    main2 :: IO ()
    main2 = getArgs >>= L.readFile . head >>= print . T.keys . f'2

None of these leak space for me (all compiled with ghc-7.0.3 -O2). Performance was pretty comparable for every version, although Aleksandar's original did seem to have a very small edge.

As someone already pointed out, keep in mind that this will use a lot of memory anyway, unless there's a lot of repetition of words.

I'd be happy to help track down a space leak in iteratee, but for now I'm not seeing one.

Best,
John Lato
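
A minimal sketch of the memory point above, separate from the versions compared in this message. It assumes Data.Map.Strict from the containers package as a stand-in for the Data.HashMap.Strict map used in the thread, and the name wordCounts and both sample inputs are hypothetical. The fold has the same shape as f'2; the only thing shown is that the finished map holds one entry per distinct word, so memory use follows the number of distinct words, not the input length.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Strict left fold building a word -> count map, same shape as f'2.
-- Assumption: Data.Map.Strict stands in for Data.HashMap.Strict.
wordCounts :: [String] -> M.Map String Int
wordCounts = foldl' (\t s -> M.insertWith (+) s 1 t) M.empty

main :: IO ()
main = do
  let repetitive = replicate 100000 "the"        -- one distinct word
      distinct   = map show [1 .. 100000 :: Int] -- every word distinct
  -- One map entry per distinct word, whatever the input length.
  print (M.size (wordCounts repetitive))  -- prints 1
  print (M.size (wordCounts distinct))    -- prints 100000
```

Both inputs contain 100000 words, but only the second forces the map to keep 100000 keys alive, which is the large-memory case even without any leak.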