
On Tuesday 17 June 2008, Daniel Fischer wrote:
> I've experimented a bit and found that Ptr is faster for small arrays
> (only very slightly so if compiled with -fvia-C -optc-O3), but ByteArr
> performs much better for larger arrays ... The GC time for the Addr#
> version is frightening
I had an entire e-mail written about what a bizarre and interesting result you'd just found, but unfortunately, I then remembered exactly how the array gets filled in the Ptr version. Namely:

    (Ptr a) <- newArray [0..n-1]

which, I assume, does something terrible: newArray calls length on the list to size the allocation, and then also needs the values after the array is created to poke into it, so the whole list is retained. For small arrays like the ones I'd been testing with, this doesn't matter, because the work done on the list is negligible. However, with very large lists (100,000 elements and above, apparently) this starts causing massive space leaks, which explains the terrible GC behaviour we were seeing.

If you change the benchmark like so:

    bench :: Int -> Int -> IO ()
    bench k n = do
        (Ptr a) <- mallocArray n :: IO (Ptr Int)
        fill a n
        replicateM_ k (reverse a 0 (n-1))

    fill :: Addr# -> Int -> IO ()
    fill a (I# n) = IO (go 0#)
      where
        go i s
          | i <# n    = case writeIntOffAddr# a i i s of
                          s' -> go (i +# 1#) s'
          | otherwise = (# s, () #)

the space leak goes away, and the runtimes stay consistent. Up to around 10,000 elements, Ptr hovers around 6s, and ByteArray (-fasm) stays around 11s. At 100,000 elements, Ptr jumps to 12-13s, and ByteArray goes to 14-16s and stays there (again, I imagine due to running into bad cache effects at that size). This is all for size * iterations = 2.5 billion on my machine.

A good catch anyhow, though. That could explain why Simon Marlow was seeing the Addr# version as slower, since he was using a large array, and thus work done on the list could have contributed significantly (although MUT time was higher with Ptr, so the list would have had to contribute work there, not just GC thrashing).

Thanks for the input,
-- Dan
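[Editorial note: the benchmark above calls a `reverse` that isn't shown in the message: an in-place reverse over the raw Addr#. A minimal sketch in the same primop style as `fill` might look like the following. The name `reverseAddr` and the `main` harness are illustrative guesses, not the original code; `isTrue#` is needed on modern GHC, where the comparison primops return Int# rather than Bool as they did in 2008.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts              -- Addr#, Int(I#), primops, isTrue#, Ptr(..)
import GHC.IO (IO(..))       -- the IO constructor, to run raw state-token code
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Control.Monad (unless)

-- Hypothetical in-place reverse over a raw Addr#: swaps the elements at
-- indices lo..hi pairwise, walking inward from both ends.
reverseAddr :: Addr# -> Int -> Int -> IO ()
reverseAddr a (I# lo0) (I# hi0) = IO (go lo0 hi0)
  where
    go lo hi s
      | isTrue# (lo <# hi) =
          case readIntOffAddr# a lo s of
            (# s1, x #) -> case readIntOffAddr# a hi s1 of
              (# s2, y #) -> case writeIntOffAddr# a lo y s2 of
                s3 -> case writeIntOffAddr# a hi x s3 of
                  s4 -> go (lo +# 1#) (hi -# 1#) s4
      | otherwise = (# s, () #)

main :: IO ()
main = allocaArray 10 $ \p@(Ptr a) -> do
  pokeArray p [0 .. 9 :: Int]     -- fill with 0..9
  reverseAddr a 0 9               -- reverse in place
  xs <- peekArray 10 p
  unless (xs == [9,8..0]) (error ("bad reverse: " ++ show xs))
  putStrLn "ok"
```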