
Eugene Crosser wrote:
This is my program: ======== module Main where import Data.Map main = printMax . (foldr processLine empty) . lines =<< getContents processLine line map = insertWith (\new old -> new + old) line 1 map printMax map = putStrLn $ show $ foldWithKey (\key val accum -> if val > (snd accum) then (key,val) else accum) ("",0) map ======== The thing kinda works on small data sets, but if you feed it with 250,000 lines (1000 distinct), the process size grows to 200 Mb, and on 500,000 lines I get "*** Exception: stack overflow"
Your program isn't strict enough. While you expect it to keep a "running total" in the map which is updated with each new line, it really only creates lots of thunks that are only evaluated when the result is demanded. These thunks are as large as the input plus overhead. You have to force the evaluation of intermediate results. To do so, you have to replace foldr by foldl (foldr is just recursion, foldl is accumulator recursion), then use the strict variant of that, and then evaluate all values before putting them into the map. In summary, this should work (untested code, note the use of foldl'): main = printMax . (foldl' processLine empty) . lines =<< getContents processLine map line = let total = findWithDefault 0 line map + 1 in total `seq` insert line total map Yes, this is all terribly non-obvious. It takes time until you see where lazyness is going to hurt you, and you'll easily overlook some such situations. I also think, it's an unfortunate oversight that insertWith is lazy and that there's no way to make it strict as a mere user of Data.Map. Udo. -- "Guy Steele leads a small team of researchers in Burlington, Massachusetts, who are taking on an _enormous_challenge_ -- create a programming language better than Java." -- Sun.Com (emphasis by Paul Graham)