
Hello everyone,

I'm trying to reduce the memory footprint (and possibly the execution time, too) of a small program I wrote. It's for a Project Euler problem and produces the correct result, but a profiling run shows it is unsatisfying:

  2,633,675,888 bytes allocated in the heap
  1,221,377,976 bytes copied during GC
     27,240,696 bytes maximum residency (22 sample(s))
      1,576,200 bytes maximum slop
             80 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:  5002 collections, 0 parallel, 1.38s, 1.36s elapsed
  Generation 1:    22 collections, 0 parallel, 0.62s, 0.64s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    1.76s  (  1.80s elapsed)
  GC    time    1.99s  (  2.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.76s  (  3.81s elapsed)

  %GC time      53.1%  (52.6% elapsed)

  Alloc rate    1,494,074,809 bytes per MUT second

  Productivity  46.9% of total user, 46.3% of total elapsed

Not only is the actual memory consumption quite high, but the GC/computation ratio is almost 1:1. I assume that there is something to be done regarding strictness, but a series of experiments with BangPatterns yielded no improvement at all (and using foldr instead of foldl' produces a stack overflow).

Here is my program:

  -- project euler p75.hs
  import qualified Data.Map as Map
  import Data.List -- strict foldl'

  circum :: Int -> Int -> Int
  circum a b = 2*a*(a+b)

  count :: Int -> Map.Map Int Int -> Int -> Map.Map Int Int
  count lmt mp c = foldl' step mp $ takeWhile (<= lmt) [k*c | k <- [1..]]
    where step m x = Map.insertWith' (+) x 1 m

  solve :: Int -> Int
  solve l = Map.size $ Map.filter (== 1) primitive
    where
      lmt  = floor $ sqrt $ fromIntegral l / 2
      cand = [c | m <- [2..lmt], n <- [1..m], odd (m+n), 1 == gcd m n,
                  let c = circum m n, l >= c]
      primitive = foldl' (count l) Map.empty cand

  main = do
    print $ solve 1500000

Any advice and/or tips welcome, as I'm quite new to Haskell.

Best regards.

On Monday 11 October 2010 16:14:11, Thorsten Hater wrote:
Not only is the actual memory consumption quite high, but the GC/computation ratio is almost 1:1.
You construct a largish map (about 350,000 entries), doing a lot (more than a million) of insertWith' calls. That means frequent rebalancing (for new circumferences) and many updates of already present values (which in all likelihood are heap-allocated Ints); both give the garbage collector a lot of work. Further, you construct a lot of lists, which need to be GCed too. Also, the memory overhead of Maps is nontrivial, I think five or six words per item, so a Map Int Int needs about size * 8 (or 7) * sizeof(Int) bytes.

You can reduce the GC time by specifying a larger allocation area (e.g. +RTS -A16M -RTS; the default size is 512K), thus making GC less frequent.
I assume that there is something to be done regarding strictness,
Nothing obvious. But for the computation pattern you use, since the ratio of possible values to occurring values is not too high, using accumArray (on an Unboxed array) is far better.
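A minimal sketch of what that could look like, adapted from your program (untested, and the names are just placeholders):

  -- count, in an unboxed array, how often each perimeter <= l is hit
  import Data.Array.Unboxed (UArray, accumArray, elems)

  solve :: Int -> Int
  solve l = length . filter (== 1) . elems $ counts
    where
      lmt  = floor (sqrt (fromIntegral l / 2)) :: Int
      -- perimeters of primitive triples, generated as in your program
      prim = [ c | m <- [2..lmt], n <- [1..m-1]
                 , odd (m+n), gcd m n == 1
                 , let c = 2*m*(m+n), c <= l ]
      counts :: UArray Int Int
      counts = accumArray (+) 0 (1, l)
                 [ (p, 1) | c <- prim, p <- takeWhile (<= l) [c, 2*c ..] ]

accumArray builds the whole array in a single pass, so there is no rebalancing and hardly anything for the GC to chase.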
but a series of experiments with BangPatterns yielded no improvement at all (and using foldr instead of foldl' produces a stack overflow)
Yes, using foldr with insert(With)s into Maps is a Bad Idea™
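The difference is easy to see in the fold pattern itself; a small sketch (using Data.Map.Strict from a current containers as the stand-in for insertWith', and made-up function names):

  import qualified Data.Map.Strict as Map
  import Data.List (foldl')

  -- left fold: the map is threaded through as the accumulator and can be
  -- forced after every insert, so space stays proportional to the map itself
  countOcc :: [Int] -> Map.Map Int Int
  countOcc = foldl' (\m x -> Map.insertWith (+) x 1 m) Map.empty

  -- right fold: the outermost insert needs the map built from the *rest* of
  -- the list first, so the whole list is traversed before any insert can run,
  -- using stack proportional to its length
  countOccBad :: [Int] -> Map.Map Int Int
  countOccBad = foldr (\x m -> Map.insertWith (+) x 1 m) Map.empty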

On 11.10.2010 17:54, Daniel Fischer wrote:
But for the computation pattern you use, since the ratio of possible values to occurring values is not too high, using accumArray (on an Unboxed array) is far better.
Thank you for pointing me to Unboxed Arrays and accumArray, it works like a charm. The new statistics:

     35,180,328 bytes allocated in the heap
         13,608 bytes copied during GC
     12,027,912 bytes maximum residency (2 sample(s))
        594,520 bytes maximum slop
             13 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    43 collections, 0 parallel, 0.00s, 0.00s elapsed
  Generation 1:     2 collections, 0 parallel, 0.00s, 0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.14s  (  0.17s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.14s  (  0.17s elapsed)

  %GC time       0.0%  (0.6% elapsed)

  Alloc rate    251,288,057 bytes per MUT second

  Productivity 100.0% of total user, 82.9% of total elapsed

I don't think there is much left to do in terms of optimization. So I'm left with the question of when to use Data.Map, if accumArray is so much better even for heavy number crunching like this. Especially when Data.Map is advertised as efficient. (Even without Unboxed, normal Arrays beat Maps by a factor of 2.)

Best regards.

On Monday 11 October 2010 23:29:30, Thorsten Hater wrote:
Thank you for pointing me to Unboxed Arrays and accumArray, it works like a charm. The new statistics:
     35,180,328 bytes allocated in the heap
         13,608 bytes copied during GC
     12,027,912 bytes maximum residency (2 sample(s))
        594,520 bytes maximum slop
             13 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    43 collections, 0 parallel, 0.00s, 0.00s elapsed
  Generation 1:     2 collections, 0 parallel, 0.00s, 0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.14s  (  0.17s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.14s  (  0.17s elapsed)
%GC time 0.0% (0.6% elapsed)
Alloc rate 251,288,057 bytes per MUT second
Productivity 100.0% of total user, 82.9% of total elapsed
Yes, that's more like it :)
I don't think there is much left to do in terms of optimization.
Generate the primitive triangles by a more efficient method; all those gcds take a lot of time.
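One possibility is the classic ternary tree of primitive Pythagorean triples rooted at (3,4,5); it reaches every primitive triple exactly once and needs no gcd at all. A rough, untested sketch (primPerimeters is just a made-up name):

  -- perimeters of all primitive triples up to lmt, via the ternary tree
  primPerimeters :: Int -> [Int]
  primPerimeters lmt = go (3, 4, 5)
    where
      go (a, b, c)
        | p > lmt   = []
        | otherwise = p : concatMap go children
        where
          p = a + b + c
          children =
            [ ( a - 2*b + 2*c,  2*a - b + 2*c,  2*a - 2*b + 3*c)
            , ( a + 2*b + 2*c,  2*a + b + 2*c,  2*a + 2*b + 3*c)
            , (-a + 2*b + 2*c, -2*a + b + 2*c, -2*a + 2*b + 3*c) ]

Since every child has a strictly larger perimeter than its parent, pruning at the limit is safe, and the resulting perimeters can be fed into the same accumArray as before.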
So I'm left with the question of when to use Data.Map, if accumArray is so much better even for heavy number crunching like this.
That's not really heavy number crunching, it's a picnic. There are several considerations.

- Sparse (irregular) data: if only one in 100 possible data points actually occurs, an array is a huge waste of space. You quickly lose cache locality then, and Maps are often faster in those cases.

- Unknown bounds: that rules out arrays.

- Pattern of computation: for this problem, accumArray is perfectly suited, but that is not always the case. Consider problems where you update your Map using previously stored data; then accumArray isn't applicable. In such cases Maps give you nice code and usually decent performance, although in most cases, if your data is not sparse, using an unboxed mutable array (STUArray) where possible gives you much better performance, but at the cost of uglier code (see the sketch below). A boxed mutable array (STArray) will in many cases also perform better than a Map, but usually not as well as an unboxed one. Of course, arrays can't be used (or only with great convolutions) for things like Map String a, where the key can't be mapped easily to an array index.
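To make the STUArray point concrete, here is a hypothetical sketch of the same kind of counting done imperatively; unlike accumArray, this pattern also works when the new value depends on what is already stored:

  import Control.Monad (forM_)
  import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
  import Data.Array.Unboxed (UArray)

  -- count how often each value in [1..lmt] occurs among the multiples of the
  -- given perimeters, mutating an unboxed array inside ST
  countHits :: Int -> [Int] -> UArray Int Int
  countHits lmt perims = runSTUArray $ do
      arr <- newArray (1, lmt) 0
      forM_ [ p | c <- perims, p <- takeWhile (<= lmt) [c, 2*c ..] ] $ \p -> do
          n <- readArray arr p
          writeArray arr p (n + 1)
      return arr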
Especially when Data.Map is advertised as efficient.
It is efficient for what it does (of course, there's always room for improvement), but it's not the appropriate data structure for everything.

By default, Haskell's data structures are immutable, thus if you modify a Map, you get a new one, which typically requires copying O(log size) nodes (the bulk of the Map is shared, otherwise performance would be abysmal). But still, that is a nontrivial operation, much more costly than modifying a mutable map in an impure language (where such a modification is often just relinking a few pointers). So if you have an algorithm that requires frequent updates, the cost of these operations makes itself felt.

But what happens if you use an array instead of a Map? If the keys map easily to array indices, that is very easy to do, but if you use immutable arrays, each update requires copying the entire array (in case of boxed arrays only the pointers to the values are copied, but still, allocating a new memory block and copying 10 million pointers likely takes much longer than copying 20-25 nodes of a Map). If you use a mutable array, such a modification is cheap, only one array slot needs to be changed, but that ties you to a mutability monad (ST s or IO, mostly), which has its downsides too. And if data is sparse, you again waste a lot of space.

And if the keys don't map easily to indices, well, you can implement a Map as an Array Int (Key, Value) or a pair (Array Int Key, Array Int Value), keeping the entries sorted by the keys. In fact, if you build the map once and use it later for a lot of lookups, that is often better than a Map (regarding performance). Both give you O(log size) lookup, but the array solution can have less space overhead. However, inserting a new element or deleting an element becomes an O(size) operation, so if you have to do that a lot, this is a terrible solution and you should use a Map.
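A small sketch of that last idea (made-up names, assuming the keys are distinct): build the array once from an association list, then look values up by binary search.

  import Data.Array (Array, listArray, bounds, (!))
  import Data.List (sortBy)
  import Data.Ord (comparing)

  -- a read-only "map": an immutable array of key/value pairs sorted by key
  toSortedArray :: Ord k => [(k, v)] -> Array Int (k, v)
  toSortedArray kvs = listArray (0, length kvs - 1) (sortBy (comparing fst) kvs)

  -- O(log size) lookup by binary search, like Map.lookup but with less space
  -- overhead; insertion/deletion would be O(size), hence read-only
  lookupArr :: Ord k => k -> Array Int (k, v) -> Maybe v
  lookupArr key arr = go lo hi
    where
      (lo, hi) = bounds arr
      go l h
        | l > h     = Nothing
        | otherwise = case compare key k of
                        LT -> go l (mid - 1)
                        GT -> go (mid + 1) h
                        EQ -> Just v
        where
          mid    = (l + h) `div` 2
          (k, v) = arr ! mid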
(Even without Unboxed, normal Arrays beat Maps by a factor of 2) Best regards.