RE: [Haskell-cafe] using Map rather than FiniteMap

On 26 January 2005 14:29, S. Alexander Jacobson wrote:
> Ah, ok. So I ran the code with 100000 items, 50000 items, and 25000 items and got total memory in use of 28Mb, 15Mb, and 8Mb respectively. That comes to 260-280 bytes per record, which is still an order of magnitude higher than the 20-30 bytes per record we would expect.
When using the ordinary 2-generation collector, memory in use will tend to be 2-3 times larger than the actual residency, because this is a copying collector.
> On the other hand, I found 10Mb, 5Mb, and 2.5Mb maximum residency, but that is still ~100 bytes per record.
Right.
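(As a sanity check on the arithmetic: ~10Mb of maximum residency times a copying-collection factor of 2-3 gives roughly 20-30Mb in use, which matches the 28Mb figure above.)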
> Lastly, I tried "example +RTS -p -K5M -hc" and then looked at the resulting graph (attachment #2). It shows a total of 1.6Mb heap allocated, and if that data is correct, it does correspond roughly to our 20-30 bytes per record estimate.
That profile doesn't include the stack, which sounds reasonable.
> Regarding the stack: I tried "example +RTS -p -K5M -xt -hc", ran hp2ps, and looked at the resulting graph (attachment #1). SYSTEM appears to use 4Mb of stack at the very beginning (presumably to create "zipped"), but I don't know why it would. This stack is then consumed by the rest of the program.
Stacks never get smaller in GHC, only bigger. If you need 4Mb of stack at one point in your program, you will forever have a 4Mb stack after that. (Fixing this wouldn't really buy you much: the memory doesn't get traversed or copied by the GC. One thing we could do, though, is madvise() to tell the OS it doesn't have to page the memory.) I haven't looked at your code in detail; hopefully someone else can shed some light on why you're building up such a large stack in the first place.

Cheers,
Simon
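For anyone reproducing the profiles discussed above, the usual sequence is roughly as follows (the program and file names are just placeholders matching the example in this thread; the binary must be built with profiling support for -hc to work):

  $ ghc -prof -auto-all -o example example.hs
  $ ./example +RTS -p -K5M -xt -hc
  $ hp2ps -c example.hp

The -xt RTS flag includes thread stacks in the heap profile, and hp2ps renders the resulting example.hp file as PostScript.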

On Wed, 26 Jan 2005, Simon Marlow wrote:
> When using the ordinary 2-generation collector, memory in use will tend to be 2-3 times larger than the actual residency, because this is a copying collector.
Ok. Perhaps you want a different name for this item in the reports?
>> On the other hand, I found 10Mb, 5Mb, and 2.5Mb maximum residency, but that is still ~100 bytes per record.
But 100 bytes per record in a (Map Int Int) seems awfully high. A 3-record Map would be:

  (Bin 3 2 2 (Bin 1 1 1 Tip Tip) (Bin 1 3 3 Tip Tip))

Here is how I account for memory:

                size  qty  subtotal
  Ints             4    9        36
  Pointers         4    7        28
  Constructors     4    7        28
  ---------------------------------
  Total                          92

  Num Pairs                       3
  Bytes/Pair                     31

Is this accounting correct? Where do the other 70 bytes/item go?
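For reference, here is a sketch of the two constructors being counted, assuming the usual size-balanced-tree representation used by Data.Map (the comment naming the fields is mine):

  -- size of subtree, key, value, left subtree, right subtree
  data Map k a = Tip
               | Bin !Int !k a !(Map k a) !(Map k a)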
>> Lastly, I tried "example +RTS -p -K5M -hc" and then looked at the resulting graph (attachment #2). It shows a total of 1.6Mb heap allocated, and if that data is correct, it does correspond roughly to our 20-30 bytes per record estimate.
>
> That profile doesn't include the stack, which sounds reasonable.
So I guess all the extra memory is actually being allocated on the stack rather than the heap, which is a sign of a space leak. If I solve the space leak, I get the 70 bytes per item back.
> I haven't looked at your code in detail; hopefully someone else can shed some light on why you're building up such a large stack in the first place.
The code is just this 6-line program:

  import qualified Map
  zipped = zip [1..] [1..100000] :: [(Int,Int)]
  untup f (x,y) = f x y
  produce = foldr (untup Map.insert) Map.empty zipped
  fm = length $ Map.keys produce
  main = print $ fm

Can someone tell me what I can do to make this more efficient?

-Alex-

______________________________________________________________
S. Alexander Jacobson                  tel:917-770-6565
                                       http://alexjacobson.com
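A likely culprit, assuming the stack profile is dominated by pending insertions: foldr cannot apply Map.insert until it has recursed to the end of the 100000-element list, so a chain of suspended applications is built up on the stack first. A minimal sketch of the usual fix, a strict left fold that performs each insert as it goes (this assumes the Map module in the program has the Data.Map API):

  import qualified Data.Map as Map
  import Data.List (foldl')

  zipped :: [(Int, Int)]
  zipped = zip [1..] [1..100000]

  -- foldl' forces the accumulating Map at each step, so no chain of
  -- suspended Map.insert applications accumulates on the stack
  produce :: Map.Map Int Int
  produce = foldl' (\m (k, v) -> Map.insert k v m) Map.empty zipped

  main :: IO ()
  main = print (length (Map.keys produce))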