
On Wednesday, 7 January 2009 16:56, Bueno, Denis wrote:
> Hi all,
> I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
> My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
>     50000 elements * 8 bytes per element = 400,000 bytes
> and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler-reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I think you've run into a profiling/optimising incompatibility. Compiling the code with just -O2 --make and running with -sstderr, both report

    899,944 bytes allocated in the heap
    1,272 bytes copied during GC (scavenged)
    0 bytes copied during GC (not scavenged)
    16,384 bytes maximum residency (1 sample(s))

    2 collections in generation 0 ( 0.00s)
    1 collections in generation 1 ( 0.00s)

    2 Mb total memory in use

which looks reasonable, but it is dog slow on my box, 26.7/26.9 seconds :(

Compiled with -prof -auto-all -O2 --make, both allocate madly (~40G) and take nearly ten times as long. It is known that profiling and optimising don't mix too well, but this is remarkable; maybe it's worth an investigation.

Cheers,
Daniel
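
For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of test case described above. The original code is not included in this message, so this `copyInto' (a plain element-by-element loop), the fill values and the final print are all assumptions; only the array type, the sizes and the copy count are taken from the description.

```haskell
import Control.Monad (forM_, replicateM_)
import Data.Array.IO (IOUArray, getBounds, newArray, newListArray,
                      readArray, writeArray)
import Data.Int (Int64)

-- Hypothetical reconstruction: copy every element of src into dst.
-- Assumes both arrays have the same bounds, as in the description.
copyInto :: IOUArray Int Int64 -> IOUArray Int Int64 -> IO ()
copyInto dst src = do
  (lo, hi) <- getBounds src
  forM_ [lo .. hi] $ \i -> readArray src i >>= writeArray dst i

main :: IO ()
main = do
  let n = 50000 :: Int
  -- Two unboxed arrays of 50000 Int64s, i.e. 400,000 bytes of payload each.
  src <- newListArray (0, n - 1) [1 .. fromIntegral n] :: IO (IOUArray Int Int64)
  dst <- newArray (0, n - 1) 0 :: IO (IOUArray Int Int64)
  -- 10000 copies from one array into the other.
  replicateM_ 10000 (copyInto dst src)
  -- Read one element back so the result of the copy is observed.
  readArray dst (n - 1) >>= print
```

Build it once with -O2 --make and once with -prof -auto-all -O2 --make, then run each binary with +RTS -sstderr to compare the allocation figures. For scale: 10000 copies of 50000 elements are 500 million element copies, so ~40G of allocation works out to roughly 80 bytes per element copied, where an unboxed copy should allocate essentially nothing per element.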