On 11.10.2010 17:54, Daniel Fischer wrote:
> On Monday 11 October 2010 16:14:11, Thorsten Hater wrote:
>
> Not only is the actual memory consumption quite high, but the
> GC/computation ratio is almost 1:1. You construct a largish map (about
> 350000 entries), doing a lot (more than a million) of insertWith's. That
> means frequent rebalancing (for new circumferences) and many updates of
> already present values (which in all likelihood are heap-allocated Ints);
> both give the garbage collector a lot of work. Further, you construct a
> lot of lists, which need to be GCed too. Also, the memory overhead of Maps
> is nontrivial, I think five or six words per item, so a Map Int Int needs
> about size*8 (or 7)*sizeof(Int) bytes.
>
> You can reduce the GC time by specifying a larger allocation area (e.g.
> +RTS -A16M -RTS; the default size is 512K), thus making GC less frequent.
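For concreteness, the kind of Map-based counting being described is roughly
the following. This is only a sketch; the names and the list of
circumferences are invented, not the code from the thread:

import qualified Data.Map as M
import Data.List (foldl')

-- Count how often each circumference occurs. Every insertWith either
-- inserts a fresh key (possibly rebalancing the tree) or combines with the
-- existing boxed Int value, and with the lazy insertWith the combined value
-- starts out as an unevaluated (+) thunk -- plenty of work for the GC.
-- Running the program with +RTS -A16M -RTS enlarges the allocation area
-- and makes those collections less frequent.
countMap :: [Int] -> M.Map Int Int
countMap = foldl' step M.empty
  where
    step m c = M.insertWith (+) c 1 m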
>> I assume that there is something to be done regarding strictness,
>
> Nothing obvious. But for the computation pattern you use, since the ratio
> of possible values to occurring values is not too high, using accumArray
> (on an Unboxed array) is far better.
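The accumArray version ends up looking something like this (again just a
sketch with invented names; it assumes every circumference lies within a
known upper bound, since out-of-range indices make accumArray fail):

import Data.Array.Unboxed (UArray, accumArray)

-- Same counting, but into an unboxed array indexed by circumference.
-- Each (c, 1) pair is folded into slot c with (+); the counts live
-- unboxed in one flat block of memory, so there is almost nothing for
-- the garbage collector to trace.
countArr :: Int -> [Int] -> UArray Int Int
countArr limit cs = accumArray (+) 0 (1, limit) [ (c, 1) | c <- cs ]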
>> but a series of experiments with BangPatterns yielded no improvement at
>> all (using foldr instead of foldl' produces a stack overflow)
>
> Yes, using foldr with insert(With)s into Maps is a Bad Idea™.
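To illustrate why (a sketch, reusing the made-up names from above): with
foldr, no insertWith can run until the recursion has reached the end of the
list, whereas foldl' builds each intermediate Map as it goes.

import qualified Data.Map as M

-- insertWith is strict in the Map it is given, so the outermost call has
-- to wait for the foldr over the entire tail: roughly a million nested,
-- unfinished calls, hence the stack overflow.
countMapBad :: [Int] -> M.Map Int Int
countMapBad = foldr (\c m -> M.insertWith (+) c 1 m) M.empty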
Thank you for pointing me to unboxed arrays and accumArray, it works like a
charm. The new statistics:

  35,180,328 bytes allocated in the heap
      13,608 bytes copied during GC
  12,027,912 bytes maximum residency (2 sample(s))
     594,520 bytes maximum slop
          13 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    43 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.14s  (  0.17s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.14s  (  0.17s elapsed)

  %GC time       0.0%  (0.6% elapsed)

  Alloc rate    251,288,057 bytes per MUT second

  Productivity 100.0% of total user, 82.9% of total elapsed

I don't think there is much left to do in terms of optimization. So I'm left
with the question of when to use Data.Map, if accumArray is so much better
even for heavy number crunching like this, especially since Data.Map
advertises itself as efficient. (Even without Unboxed, normal Arrays beat
Maps by a factor of 2.)

Best regards.