
If you have a giant unboxed array that will never become garbage, it would be nice to put it somewhere the GC won't bother with it. Since Data.Array.Storable arrays are allocated in the C heap, I thought it would be a good choice. However, I am getting very poor performance due to the GC copying 6G in each run. The only explanation that I can think of is that it is copying my giant array. Am I wrong in my understanding that Data.Array.Storable arrays should not be copied by the GC? BTW, I am using GHC 6.10.1 on Linux. Thanks for reading.

Hello David, Tuesday, November 25, 2008, 4:45:28 PM, you wrote:
However, I am getting very poor performance due to the GC copying 6G in each run. The only explanation that I can think of is that it is copying my giant array.
Each GC run, or each program run? Try increasing the size of your array and check how much data the GC now copies (everything else, of course, should stay unchanged). It's possible that the code that fills the array creates a lot of intermediate data. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, 2008-11-25 at 16:51 +0300, Bulat Ziganshin wrote:
Hello David,
Hello Bulat,
Tuesday, November 25, 2008, 4:45:28 PM, you wrote:
However, I am getting very poor performance due to the GC copying 6G in each run. The only explanation that I can think of is that it is copying my giant array.
each GC run? each program run?
each program run
Try increasing the size of your array and check how much data the GC now copies (everything else, of course, should stay unchanged).
Excellent idea. I made the array twice as big; everything else is the same. The GC copies the same amount of data. The only difference I noticed is that with the array twice as big, there were about half as many collections in generation 1. The time and efficiency are about the same. I guess it is not the fault of Data.Array.Storable. When I was researching how to do this, I was really hoping for something like "static areas" from the Lisp Machine operating system. You could allocate any normal object in an area of the heap where the GC would not bother with it. I miss that. I wonder why GHC doesn't have such a concept?
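For readers who want to repeat the array-doubling experiment described above, here is a minimal sketch. It is not David's program: the file name `Probe.hs` and the helpers `mkArray`, `fillArray`, and `sumArray` are hypothetical, and the element type and default size are arbitrary assumptions. The idea is only to vary the array size from the command line and compare the RTS statistics between runs.

```haskell
-- Probe.hs (hypothetical name): vary the size of a StorableArray and
-- compare the GC statistics reported by the runtime between runs.
import Control.Monad (foldM, forM_)
import Data.Array.Storable (StorableArray, getBounds, newArray, readArray, writeArray)
import System.Environment (getArgs)

-- Allocate an n-element array of Doubles, initialised to zero.
mkArray :: Int -> IO (StorableArray Int Double)
mkArray n = newArray (0, n - 1) 0

-- Write the index value into every slot.
fillArray :: StorableArray Int Double -> IO ()
fillArray arr = do
  (lo, hi) <- getBounds arr
  forM_ [lo .. hi] $ \i -> writeArray arr i (fromIntegral i)

-- Strictly sum all elements so that no chain of thunks builds up.
sumArray :: StorableArray Int Double -> IO Double
sumArray arr = do
  (lo, hi) <- getBounds arr
  foldM (\acc i -> do x <- readArray arr i; return $! acc + x) 0 [lo .. hi]

main :: IO ()
main = do
  args <- getArgs
  -- The array size comes from the first argument; the default is arbitrary.
  let n = case args of
            (s : _) -> read s
            []      -> 1000000
  arr <- mkArray n
  fillArray arr
  total <- sumArray arr
  print total
```

Running the binary as `./Probe 1000000 +RTS -s` and then `./Probe 2000000 +RTS -s` prints the runtime's statistics, including the bytes copied during GC. If that figure barely changes when the array doubles, the array itself is unlikely to be what is being copied. Recent GHC releases need the program linked with `-rtsopts` for `-s` to be accepted; the 6.10.1 used in this thread predates that flag.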

Hello David, Tuesday, November 25, 2008, 5:27:51 PM, you wrote:
When I was researching how to do this, I was really hoping for something like "static areas" from the Lisp Machine operating system. You could allocate any normal object in an area of the heap where the GC would not bother with it. I miss that. I wonder why GHC doesn't have such a concept?
It has. Bytestrings use this area; it's called "pinned arrays". -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, 2008-11-25 at 17:32 +0300, Bulat Ziganshin wrote:
Hello David,
Tuesday, November 25, 2008, 5:27:51 PM, you wrote:
When I was researching how to do this, I was really hoping for something like "static areas" from the Lisp Machine operating system. You could allocate any normal object in an area of the heap where the GC would not bother with it. I miss that. I wonder why GHC doesn't have such a concept?
it has. bytestrings use this area, it's called "pinned arrays"
Thanks. I'll look into it. BTW, running my program with the compacting GC improved efficiency dramatically. Your initial idea that it was my other data was absolutely correct.

GHC has 'pinned arrays' that have this behavior. However, you probably don't want to use them, as they simply give the garbage collector fewer choices about what to do, possibly decreasing its efficiency. The garbage collector is already free to not copy arrays if it feels it isn't worth it; by pinning them you simply take away its ability to copy them when that is needed. John -- John Meacham - ⑆repetae.net⑆john⑈

John Meacham wrote:
GHC has 'pinned arrays' that have this behavior. However, you probably don't want to use them, as they simply give the garbage collector fewer choices about what to do, possibly decreasing its efficiency. The garbage collector is already free to not copy arrays if it feels it isn't worth it; by pinning them you simply take away its ability to copy them when that is needed.
To be a little more concrete, all arrays larger than ~3k are effectively pinned in GHC right now, in that they are never copied. If the array is unboxed, then it is never traversed by the GC either, so large unboxed arrays have basically zero GC cost. There's no need for any help from the programmer; it's done automatically by the GC. For smaller arrays, as John says, there's a tradeoff in whether to pin them or not. Pinning avoids copying in the GC, but might lead to fragmentation. Pinning is necessary if you want to pass the address of the memory to an FFI call at any point, which is why bytestring pins its arrays. Cheers, Simon
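The two cases Simon distinguishes can be illustrated with a small sketch. The helper names `bigUnboxed` and `filledBuffer` are hypothetical, and the choice of C's `memset` as the foreign call is only an assumption made for illustration. The first helper allocates a large unboxed array in the ordinary way, with no pinning requested. The second uses `mallocForeignPtrBytes`, which GHC implements with pinned memory, so the address can safely be handed to C.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Array.IO (IOUArray, newArray, readArray)
import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- C's memset, used here only as an example of a call that needs a stable address.
foreign import ccall unsafe "string.h memset"
  c_memset :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

-- A large unboxed array: ordinary allocation, nothing special asked of the GC.
bigUnboxed :: Int -> IO (IOUArray Int Double)
bigUnboxed n = newArray (0, n - 1) 0

-- A pinned buffer whose address is passed to C and filled there.
filledBuffer :: Int -> Word8 -> IO (ForeignPtr Word8)
filledBuffer len byte = do
  fp <- mallocForeignPtrBytes len
  _ <- withForeignPtr fp $ \p ->
         c_memset p (fromIntegral byte) (fromIntegral len)
  return fp

main :: IO ()
main = do
  arr <- bigUnboxed 1000000
  x <- readArray arr 0
  print x                      -- first element of the zero-initialised array
  fp <- filledBuffer 16 7
  b <- withForeignPtr fp $ \p -> peekByteOff p 15 :: IO Word8
  print b                      -- last byte of the buffer, as written by memset
```

The pointer is only used inside `withForeignPtr`, which keeps the buffer alive for the duration of the foreign call.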
participants (4): Bulat Ziganshin, David F. Place, John Meacham, Simon Marlow