Hello Aleksandar,
It is possible that the iteratee library is leaking space; I recall some
recent discussion to this effect. Your example seems simple enough that
you might recompile against a version of iteratee built with -auto-all.
Unfortunately, it's not really a safe bet to assume your libraries are
leak free, and if you've pinned it down to a single line and there
doesn't seem to be a way to squash the leak, I'd bet it's the library's fault.
Edward
I can't reproduce the space leak here. I tried Aleksandar's original code, my iteratee version, the Ngrams version posted by Johan Tibell, and a lazy bytestring version.
My iteratee version (only f' has changed from Aleksandar's code):
f' :: Monad m => I.Iteratee S.ByteString m Wordcounts
f' = I.joinI $ (enumLinesBS I.><> I.filter (not . S.null)) $ I.foldl' (\t s -> T.insertWith (+) s 1 t) T.empty
My lazy bytestring version:
> import Data.Iteratee.Char
> import Data.List (foldl')
> import Data.Char (toLower)
>
> import Data.Ord (comparing)
> import Data.List (sortBy)
> import System.Environment (getArgs)
> import qualified Data.ByteString.Lazy.Char8 as L
> import qualified Data.HashMap.Strict as T
>
> f'2 = foldl' (\t s -> T.insertWith (+) s 1 t) T.empty . filter (not . L.null) . L.lines
>
> main2 :: IO ()
> main2 = getArgs >>= L.readFile . head >>= print . T.keys . f'2
None of these leak space for me (all compiled with ghc-7.0.3 -O2). Performance was pretty comparable for every version, although Aleksandar's original did seem to have a very small edge.
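
For anyone who wants to try the same strict-fold pattern without installing iteratee or unordered-containers, here is a minimal sketch using only base and containers. Note the assumptions: Data.Map.Strict stands in for the HashMap used above, String stands in for ByteString, and countLines is a hypothetical name, not something from the code in this thread. It is not one of the versions I timed.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical helper: count how often each non-empty line occurs.
-- foldl' forces the accumulated map at every step, and the strict
-- Map API forces the counts, so no chain of (+) thunks builds up.
countLines :: [String] -> M.Map String Int
countLines = foldl' step M.empty . filter (not . null)
  where
    step t s = M.insertWith (+) s 1 t
```

Feeding it `lines` of the file contents and printing `M.keys` of the result mirrors what main2 does above.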