Blitting one IOUArray into another

Hi all, I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory. My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be 50000 elements * 8 bytes per element = 400,000 bytes and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays). I've attached two versions of the code, just for comparison. The only difference is the way the copying is done. One calls `getBounds' to figure out the bounds, and one is given the bounds for copying. They're both about the same speed and allocation (originally I was using IOUArray Int64 Int64, and there was a much greater allocation difference between the two versions; but that went away. Oh well). So, does anyone know why copying takes so much allocation? I expect there is _some_ way to just move memory from one array to another, like a memcpy -- how can I do that? Denis

Hello Denis, Wednesday, January 7, 2009, 6:56:36 PM, you wrote: memory allocated for i :))) each new copy of i needs one word. the situation was much worse with Int64, of course :)
Hi all,
I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
50000 elements * 8 bytes per element = 400,000 bytes
and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I've attached two versions of the code, just for comparison. The only difference is the way the copying is done. One calls `getBounds' to figure out the bounds, and one is given the bounds for copying. They're both about the same speed and allocation (originally I was using IOUArray Int64 Int64, and there was a much greater allocation difference between the two versions; but that went away. Oh well).
So, does anyone know why copying takes so much allocation? I expect there is _some_ way to just move memory from one array to another, like a memcpy -- how can I do that? Denis
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

You can of course memcpy unboxed arrays fairly easily. bulat.ziganshin:
Hello Denis,
Wednesday, January 7, 2009, 6:56:36 PM, you wrote:
memory allocated for i :)))
each new copy of i needs one word. the situation was much worse with Int64, of course :)
Hi all,
I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
50000 elements * 8 bytes per element = 400,000 bytes
and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I've attached two versions of the code, just for comparison. The only difference is the way the copying is done. One calls `getBounds' to figure out the bounds, and one is given the bounds for copying. They're both about the same speed and allocation (originally I was using IOUArray Int64 Int64, and there was a much greater allocation difference between the two versions; but that went away. Oh well).
So, does anyone know why copying takes so much allocation? I expect there is _some_ way to just move memory from one array to another, like a memcpy -- how can I do that? Denis
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ooh, great -- could you point me in the right direction for that? ________________________________________ From: Don Stewart [dons@galois.com] Sent: Wednesday, January 07, 2009 9:23 AM To: Bulat Ziganshin Cc: Bueno, Denis; haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Blitting one IOUArray into another You can of course memcpy unboxed arrays fairly easily. bulat.ziganshin:
Hello Denis,
Wednesday, January 7, 2009, 6:56:36 PM, you wrote:
memory allocated for i :)))
each new copy of i needs one word. the situation was much worse with Int64, of course :)
Hi all,
I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
50000 elements * 8 bytes per element = 400,000 bytes
and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I've attached two versions of the code, just for comparison. The only difference is the way the copying is done. One calls `getBounds' to figure out the bounds, and one is given the bounds for copying. They're both about the same speed and allocation (originally I was using IOUArray Int64 Int64, and there was a much greater allocation difference between the two versions; but that went away. Oh well).
So, does anyone know why copying takes so much allocation? I expect there is _some_ way to just move memory from one array to another, like a memcpy -- how can I do that? Denis
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Oh, do you mean by actually calling memcpy via ffi? ________________________________________ From: Don Stewart [dons@galois.com] Sent: Wednesday, January 07, 2009 9:23 AM To: Bulat Ziganshin Cc: Bueno, Denis; haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Blitting one IOUArray into another You can of course memcpy unboxed arrays fairly easily. bulat.ziganshin:
Hello Denis,
Wednesday, January 7, 2009, 6:56:36 PM, you wrote:
memory allocated for i :)))
each new copy of i needs one word. the situation was much worse with Int64, of course :)
Hi all,
I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
50000 elements * 8 bytes per element = 400,000 bytes
and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I've attached two versions of the code, just for comparison. The only difference is the way the copying is done. One calls `getBounds' to figure out the bounds, and one is given the bounds for copying. They're both about the same speed and allocation (originally I was using IOUArray Int64 Int64, and there was a much greater allocation difference between the two versions; but that went away. Oh well).
So, does anyone know why copying takes so much allocation? I expect there is _some_ way to just move memory from one array to another, like a memcpy -- how can I do that? Denis
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Bueno, Denis wrote:
Oh, do you mean by actually calling memcpy via ffi?
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Marshal-U...

On 01/07/2009 14:36 , "Neal Alexander"
Bueno, Denis wrote:
Oh, do you mean by actually calling memcpy via ffi?
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Marshal-U... ls.html
Ah, thanks. Is there a way to simply "cast" an IOUArray Int Int64 into something like a Ptr Int64, or will I need to change my code to allocate the arrays differently (using something in Foreign.*)? I hoogle'd functions "IOUArray a b -> Ptr b", but couldn't find anything. Denis

You can't safely convert an IOUArray into a Ptr; Ptr is a raw value which isn't noticed by the garbage collector, so if the data is relocated or GC'd while you have a pointer to it, further access will corrupt memory. Rather, the data inside of an IOUArray is held in a MutableByteArray#. In Data.Array.IO.Internals you can get at the newtype for IOUArray. I have some code that looks like this:
import Foreign.Ptr import Data.Array.Base import Data.Array.IO.Internals import GHC.Exts import Data.Word
foreign import ccall unsafe clear_bitmap :: MutableByteArray# RealWorld -> Word32 -> Word32 -> IO ()
{-# INLINE unsafeGetMutableArray# #-} unsafeGetMutableArray# :: IOUArray Int Word32 -> MutableByteArray# RealWorld unsafeGetMutableArray# (IOUArray (STUArray _ _ array#)) = array#
clearBitmap :: IOUArray Int Word32 -> Word32 -> Word32 -> IO () clearBitmap a1 color size = clear_bitmap (unsafeGetMutableArray# a1) color size
Then the I have a small amount of C code implementing "clear_bitmap": void clear_bitmap(HsWord32* img, HsWord32 color, HsWord32 size) { for(; size; --size, ++img) { *img = color; } } This is OK to do because the "unsafe" ccall guarantees that no GC can happen during the outcall to clear_bitmap, so we can manipulate the pointer directly. If you want to stay entirely in Haskell, there are a bunch of operations on MutableByteArray# in GHC.Exts; see http://www.haskell.org/ghc/docs/6.10-latest/html/libraries/ghc-prim/GHC-Prim... You probably need {-# LANGUAGE MagicHash #-} in order to get these to work; it makes # be a legal symbol in identifiers. It also helps to know the newtype for IO, if you want to write actually usable functions on top of these internal bits.
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
Of course all of this is GHC-specific, and internal to base and
subject to change. But I found it useful.
-- ryan
On Thu, Jan 8, 2009 at 6:51 AM, Bueno, Denis
On 01/07/2009 14:36 , "Neal Alexander"
wrote: Bueno, Denis wrote:
Oh, do you mean by actually calling memcpy via ffi?
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Marshal-U... ls.html
Ah, thanks. Is there a way to simply "cast" an IOUArray Int Int64 into something like a Ptr Int64, or will I need to change my code to allocate the arrays differently (using something in Foreign.*)?
I hoogle'd functions "IOUArray a b -> Ptr b", but couldn't find anything. Denis
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am Mittwoch, 7. Januar 2009 16:56 schrieb Bueno, Denis:
Hi all,
I'm seeing a lot of unexpected memory allocation with some simple code that copies the contents of one vector (IOUArray Int Int64) into another of the same type and dimensions. In particular, profiling reveals that `copyInto' is allocating oodles and oodles of memory.
My small test case creates two 50000-element arrays and performs 10000 copies from one into the other. Since the elements are Int64s and the arrays are unboxed, each array should be
50000 elements * 8 bytes per element = 400,000 bytes
and so the arrays should only take 800,000 bytes total. I understand there's some overhead for thunks and whatnot, but the profiler reported allocation is around 40 billion bytes. And almost all of that allocation is in my copying function, not in main (main allocates the arrays).
I think you've run into a profiling/optimising incopatibility. Compiling the code just with -O2 --make and running with -sstderr, both report 899,944 bytes allocated in the heap 1,272 bytes copied during GC (scavenged) 0 bytes copied during GC (not scavenged) 16,384 bytes maximum residency (1 sample(s)) 2 collections in generation 0 ( 0.00s) 1 collections in generation 1 ( 0.00s) 2 Mb total memory in use which looks reasonable, but it is dog slow on my box, 26.7/26.9 seconds :( Compiled with -prof -auto-all -O2 --make, both allocate madly (~40G) and take nearly ten times as long. It is known that profiling and optimising don't mix too well, but this is remarkable, maybe it's worth an investigation. Cheers, Daniel
participants (6)
-
Bueno, Denis
-
Bulat Ziganshin
-
Daniel Fischer
-
Don Stewart
-
Neal Alexander
-
Ryan Ingram