
jeremy.shaw:
At Tue, 5 Sep 2006 03:03:51 +0000 (UTC), John Goerzen wrote:
I have the below program, and I'm trying to run it on an input of about 90MB. It eats RAM like crazy, and I can't figure out why.
I have not looked in detail at your code -- but it could simply be the fact that String requires gobs of memory to store a string. If you forced all 90MB into memory at once, I would expect it to take almost of gig of RAM. (Around a 10-11 fold increase in size).
I suspect this line could be forcing the whole thing into memory:
wordfreq = map (\x -> (head x, length x)) . group . sort
because sort can not return the first element until it has looked at all the elements in the list to determine which one should be first.
If you fold a Data.Map or associative list over the word-list, then you could probably get the lazy behaviour you expect.
A quick hack up to use Data.ByteString uses a lot less ram, though profiling still shows 95% of time spent in the building the Map. import System.Environment import Data.Char import Data.List import qualified Data.Map as Map import qualified Data.ByteString.Char8 as B import Data.ByteString (ByteString) wordfreq inp = Map.toList $ foldl' k m inp where m = Map.empty :: Map.Map ByteString Int k n w = Map.insertWith f w 1 n f _ x = let y = x + 1 in y `seq` y freqsort (w1, c1) (w2, c2) | c1 == c2 = compare w1 w2 | otherwise = compare c2 c1 showit (w, c) = B.join (B.singleton ' ') [B.pack(show c), w] main :: IO () main = do args <- getArgs B.interact $ B.unlines . map showit . take (read . head $ args) . sortBy freqsort . wordfreq . B.words -- Don