
On 15 July 2004 22:08, John Meacham wrote:
> On Thu, Jul 15, 2004 at 04:20:38PM +0200, Jérémy Bobbio wrote:
> > memcpy is available in Foreign.Marshal.Utils:
> >   copyBytes :: Ptr a -> Ptr a -> Int -> IO ()
> > Copies the given number of bytes from the second area (source) into the first (destination); the copied areas may not overlap.
> > Here is the result of a quick try to implement fast copy using it and Data.Array.Storable:
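
The code attached to the original message is not preserved in this archive. As a rough illustration only (not Jérémy's code), copying a run of elements between two StorableArrays with copyBytes might look like the sketch below. The names copyInts and demoCopy are hypothetical, the arrays are assumed to be indexed from 0, and no bounds checking is done.

```haskell
import Data.Array.Storable (StorableArray, getElems, newListArray, withStorableArray)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (sizeOf)

-- Hypothetical helper: copy n Ints from src (element offset sOff)
-- into dst (element offset dOff). Offsets count from the start of the
-- underlying storage, so this assumes a lower bound of 0.
-- No bounds checks are performed, and the two regions must not overlap.
copyInts :: StorableArray Int Int -> Int -> StorableArray Int Int -> Int -> Int -> IO ()
copyInts dst dOff src sOff n =
  withStorableArray dst $ \pd ->
    withStorableArray src $ \ps ->
      copyBytes (pd `plusPtr` (dOff * w)) (ps `plusPtr` (sOff * w)) (n * w)
  where
    w = sizeOf (undefined :: Int) -- bytes per element

-- Small usage example: copy src[2..4] into dst[1..3].
demoCopy :: IO [Int]
demoCopy = do
  src <- newListArray (0, 4) [10, 20, 30, 40, 50]
  dst <- newListArray (0, 4) [0, 0, 0, 0, 0]
  copyInts dst 1 src 2 3
  getElems dst
```

Running demoCopy in GHCi should leave the first and last elements of dst untouched and overwrite the middle three.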
> Yeah, I know I can copy areas of memory allocated via the foreign library or C code around. What I am looking for is an efficient way to work with the standard Arrays as provided by Data.Array.
The idea is that if you copy areas of an array using intermediate lists, GHC should do the appropriate deforestation to remove the lists. Whether this actually happens in practice or not is another matter - but it would help greatly if we had examples that we can investigate where the deforestation isn't happening properly. However, this still isn't going to be as fast as using memcpy.

A general array-copying operation using memcpy() is entirely possible - you're allowed to pass a ByteArray# to a foreign function as long as it is marked 'unsafe'. Only a *pinned* ByteArray# can be passed to 'safe' FFI calls.

I've wondered in the past whether we should pin all IOUArrays, which would make StorableArray almost obsolete (except for making arrays from Ptrs returned from FFI calls). However, having lots of small pinned IOUArrays could lead to bad memory performance, because each pinned object holds onto the page it resides in, wasting up to 4k of memory.
> Also, in my tests, arrays implemented via ByteArray# or Ptr a seem to be significantly faster than those implemented via ForeignPtr. Is this expected?

Yes, StorableArray suffers from this problem. Specialising the array operations on StorableArray might help.

Cheers,
Simon
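
The remark about passing a ByteArray# to an 'unsafe' foreign call can be sketched roughly as follows. This is a minimal illustration, not GHC library code: the wrapper type MBA and the helpers newMBA, writeInt, readInt, copyPrefix and demoMemcpy are hypothetical names, and the sketch only copies a prefix starting at offset 0, since offsets into an unpinned array cannot be expressed as plain pointer arithmetic on the Haskell side.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples, UnliftedFFITypes #-}

import Foreign.C.Types (CSize)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)
import GHC.Exts (Int (I#), MutableByteArray#, RealWorld,
                 newByteArray#, readIntArray#, writeIntArray#)
import GHC.IO (IO (..))

-- The call must be 'unsafe': the arrays are not pinned, so the garbage
-- collector must not run (and move them) while memcpy is executing.
foreign import ccall unsafe "string.h memcpy"
  c_memcpy :: MutableByteArray# RealWorld -> MutableByteArray# RealWorld
           -> CSize -> IO (Ptr ())

-- Hypothetical boxed wrapper around an unpinned mutable byte array.
data MBA = MBA (MutableByteArray# RealWorld)

newMBA :: Int -> IO MBA
newMBA (I# n) = IO $ \s ->
  case newByteArray# n s of (# s', a #) -> (# s', MBA a #)

writeInt :: MBA -> Int -> Int -> IO ()
writeInt (MBA a) (I# i) (I# x) = IO $ \s ->
  case writeIntArray# a i x s of s' -> (# s', () #)

readInt :: MBA -> Int -> IO Int
readInt (MBA a) (I# i) = IO $ \s ->
  case readIntArray# a i s of (# s', x #) -> (# s', I# x #)

-- Copy the first nBytes bytes of src into dst (no bounds checks).
copyPrefix :: MBA -> MBA -> Int -> IO ()
copyPrefix (MBA dst) (MBA src) nBytes = do
  _ <- c_memcpy dst src (fromIntegral nBytes)
  return ()

-- Small usage example: fill src with three Ints, copy, read back from dst.
demoMemcpy :: IO [Int]
demoMemcpy = do
  let w = sizeOf (0 :: Int)
  src <- newMBA (3 * w)
  dst <- newMBA (3 * w)
  mapM_ (\(i, x) -> writeInt src i x) (zip [0 ..] [7, 8, 9])
  copyPrefix dst src (3 * w)
  mapM (readInt dst) [0, 1, 2]
```

Changing the import to 'safe' would only be valid if the arrays were allocated pinned, which is the distinction drawn above.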

On Fri, Jul 16, 2004 at 10:43:58AM +0100, Simon Marlow wrote:
> > Also, in my tests, arrays implemented via ByteArray# or Ptr a seem to be significantly faster than those implemented via ForeignPtr. Is this expected?
> Yes, StorableArray suffers from this problem. Specialising the array operations on StorableArray might help.
So, I decided to do a little testing and implemented FastMutInt, which provides a fast mutable unboxed integer, in 4 ways:

  MutByteArray#   basically equivalent to the one used in GHC
  Ptr Int         uses peek and poke on a malloced piece of memory
  ForeignPtr Int  uses peek and poke on a ForeignPtr
  IORef Int       just an IORef, using 'seq' to be strict in its value

Here are the (very simple) timings:

                   run 1   run 2
  ByteArray#         191     173
  Ptr Int            196     174
  ForeignPtr Int     410     395
  IORef Int          340     304

So, ByteArray# seems to be equivalent to a raw pointer in speed, with the advantage that it is garbage collected. However, ForeignPtrs are twice as slow, and even slower than an IORef! I compiled with -O2 and inlined everything to try to get rid of any overhead.

As a tangent: I have been using the

  counter :: Ptr Int
  counter = unsafePerformIO (new 0)

trick to create fast global counters in performance-critical stuff, and it seems to work quite well. It would be nice if there were a way to allocate the memory statically though, because then counter could be a constant and should be much faster. Perhaps something like the "foo"# :: Addr# trick? Like

  foreign data counter 4 :: Ptr Int

to reserve 4 bytes in the bss... hmm..

John

--
John Meacham - ⑆repetae.net⑆john⑈
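
For readers who want to try something similar, here is a minimal sketch of three of the four variants (Ptr, ForeignPtr, IORef) plus the global counter trick. It is not the benchmark code from this message, which was not posted; all function names are hypothetical. The NOINLINE pragma is an addition of this sketch: without it GHC may inline the unsafePerformIO call and allocate more than one counter.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtr, withForeignPtr)
import Foreign.Marshal.Utils (new)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)
import System.IO.Unsafe (unsafePerformIO)

-- Ptr Int: peek and poke on malloced memory (never freed in this sketch).
incrPtr :: Ptr Int -> IO ()
incrPtr p = peek p >>= \v -> poke p (v + 1)

-- ForeignPtr Int: the same operations, going through withForeignPtr.
incrFPtr :: ForeignPtr Int -> IO ()
incrFPtr fp = withForeignPtr fp incrPtr

-- IORef Int: use seq so the stored value is evaluated, not a thunk chain.
incrRef :: IORef Int -> IO ()
incrRef r = readIORef r >>= \v -> let v' = v + 1 in v' `seq` writeIORef r v'

-- Global counter trick; NOINLINE keeps it a single shared cell.
{-# NOINLINE counter #-}
counter :: Ptr Int
counter = unsafePerformIO (new 0)

-- Run an action n times.
times :: Int -> IO () -> IO ()
times n act = mapM_ (const act) [1 .. n]

-- Increment each kind of counter 1000 times and read the results back.
demoCounters :: IO (Int, Int, Int, Int)
demoCounters = do
  p <- new 0
  fp <- mallocForeignPtr
  withForeignPtr fp (\q -> poke q 0)
  r <- newIORef 0
  times 1000 (incrPtr p)
  times 1000 (incrFPtr fp)
  times 1000 (incrRef r)
  times 1000 (incrPtr counter)
  a <- peek p
  b <- withForeignPtr fp peek
  c <- readIORef r
  d <- peek counter
  return (a, b, c, d)
```

To compare speeds one would wrap each loop in a timer and compile with -O2, as described above; this sketch makes no claim about the resulting numbers.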

So, I was looking at the implementation of ForeignPtrs in an attempt to determine why they were slow, and have an idea to speed them up. Right now we have:

  data ForeignPtr a
    = ForeignPtr ForeignObj# !(IORef [IO ()])
    | MallocPtr (MutableByteArray# RealWorld) !(IORef [IO ()])

and I think the indirection caused by the disjunction is what is messing things up, not allowing ForeignPtrs to be inlined even in strict contexts, as the discriminator must still be examined. So how about something like

  data ForeignPtr a = ForeignPtr Addr# !FP  -- note FP should be strict but BOXED [2]

  data FP
    = ForeignPtrObj ForeignObj# {-# UNPACK #-} !(IORef [IO ()])
    | MallocPtr (MutableByteArray# RealWorld) {-# UNPACK #-} !(IORef [IO ()])

By caching the frequently used Addr# in a place where it may be unboxed, and hiding the details only needed when finalizing or garbage collecting, I think we can bring ForeignPtrs up to the speed (if not space) performance of plain old Ptrs.

John

[2] FP should be strict yet boxed so ForeignPtrs may be unboxed in various places without duplicating the rarely used bookkeeping info. Hopefully

  touchForeignPtr (ForeignPtr _ fp) = IO $ \s -> case touch# fp s of s -> (# s, () #)

is good enough to keep things alive. This is also why it is safe to UNPACK the IORefs in FP.

--
John Meacham - ⑆repetae.net⑆john⑈
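
The proposed layout can be tried out with a small toy model. The sketch below is not the GHC implementation and omits finalizers entirely; FPtr, Keeper, mallocFPtr and the read/write helpers are hypothetical names, and ForeignObj# is replaced by a payload-free constructor because it is not assumed to be available. The point it illustrates is that the hot path reads the cached Addr# directly and only touches the boxed bookkeeping value to keep the memory alive.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Foreign.Storable (sizeOf)
import GHC.Exts (Addr#, Int (I#), MutableByteArray#, RealWorld,
                 byteArrayContents#, newPinnedByteArray#, readIntOffAddr#,
                 touch#, unsafeFreezeByteArray#, writeIntOffAddr#)
import GHC.IO (IO (..))

-- Rarely needed bookkeeping, kept behind a box. The disjunction lives
-- here, so reads and writes never have to inspect it.
data Keeper
  = ExternalKeeper                               -- stand-in for foreign memory
  | MallocKeeper (MutableByteArray# RealWorld)   -- GC-managed pinned memory

-- Toy pointer: address cached up front, bookkeeping strict but boxed.
data FPtr = FPtr Addr# !Keeper

-- Allocate n pinned bytes and cache their address.
mallocFPtr :: Int -> IO FPtr
mallocFPtr (I# n) = IO $ \s0 ->
  case newPinnedByteArray# n s0 of
    (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
      (# s2, ba #) -> (# s2, FPtr (byteArrayContents# ba) (MallocKeeper mba) #)

-- Keep the backing memory alive up to this point.
touchFPtr :: FPtr -> IO ()
touchFPtr (FPtr _ k) = IO $ \s -> case touch# k s of s' -> (# s', () #)

readIntFPtr :: FPtr -> IO Int
readIntFPtr fp@(FPtr a _) = do
  v <- IO $ \s -> case readIntOffAddr# a 0# s of (# s', x #) -> (# s', I# x #)
  touchFPtr fp
  return v

writeIntFPtr :: FPtr -> Int -> IO ()
writeIntFPtr fp@(FPtr a _) (I# x) = do
  IO $ \s -> case writeIntOffAddr# a 0# x s of s' -> (# s', () #)
  touchFPtr fp

-- Small usage example: write, read, update, read again.
demoFPtr :: IO Int
demoFPtr = do
  fp <- mallocFPtr (sizeOf (0 :: Int))
  writeIntFPtr fp 41
  v <- readIntFPtr fp
  writeIntFPtr fp (v + 1)
  readIntFPtr fp
```

Whether touching the boxed Keeper is always enough to keep the array alive is exactly the open question raised in footnote [2]; the sketch assumes it is.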
Participants (2): John Meacham, Simon Marlow