
Dan Doel wrote:
Issue 2: Reading from/writing to a MutableByteArray# is slower than an Addr#
This is, I think, the crux of the issue. The main content of the benchmark is reversing/shifting items in an array. To get a somewhat easier look at the core, I boiled things down to a benchmark that just reverses a small array many times. In the interest of further reducing things, I wrote a version of the benchmark that uses raw Addr#s, and a version that uses raw MutableByteArray#s. I've attached both versions.
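For readers without the attachments, here is a minimal sketch of what a "reverse a small array many times" loop over a raw MutableByteArray# might look like. It is not the attached benchmark: the MBA wrapper and the names newInts, readInt, writeInt, reverseInPlace and reverseTimes are hypothetical, and it uses only primops exported from GHC.Exts.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Control.Monad (replicateM_)
import Data.Bits (finiteBitSize)
import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical boxed wrapper around the raw primitive array
-- (an assumption for illustration, not from the attached code).
data MBA s = MBA (MutableByteArray# s)

-- Size of an Int in bytes on the current platform.
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Int) `div` 8

-- Allocate room for n Ints; contents are uninitialised.
newInts :: Int -> ST s (MBA s)
newInts n = case n * wordBytes of
  I# bytes -> ST $ \s -> case newByteArray# bytes s of
    (# s', arr #) -> (# s', MBA arr #)

-- Read the Int at index i (indexed in Int-sized units, unchecked).
readInt :: MBA s -> Int -> ST s Int
readInt (MBA arr) (I# i) = ST $ \s -> case readIntArray# arr i s of
  (# s', x #) -> (# s', I# x #)

-- Write the Int x at index i (unchecked).
writeInt :: MBA s -> Int -> Int -> ST s ()
writeInt (MBA arr) (I# i) (I# x) = ST $ \s ->
  case writeIntArray# arr i x s of
    s' -> (# s', () #)

-- Reverse the first n elements in place by swapping from both ends.
reverseInPlace :: MBA s -> Int -> ST s ()
reverseInPlace arr n = go 0 (n - 1)
  where
    go i j
      | i >= j = return ()
      | otherwise = do
          x <- readInt arr i
          y <- readInt arr j
          writeInt arr i y
          writeInt arr j x
          go (i + 1) (j - 1)

-- Fill an n-element array with 0..n-1, reverse it k times, read it back.
reverseTimes :: Int -> Int -> [Int]
reverseTimes n k = runST $ do
  arr <- newInts n
  mapM_ (\i -> writeInt arr i i) [0 .. n - 1]
  replicateM_ k (reverseInPlace arr n)
  mapM (readInt arr) [0 .. n - 1]

main :: IO ()
main = print (reverseTimes 5 1)  -- prints [4,3,2,1,0]
```

An Addr# variant would presumably have the same shape, with readIntOffAddr# and writeIntOffAddr# in place of the array primops and the memory obtained from a malloc-style allocation.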
So I tried your examples and the Addr# version looks slower than the MBA# version:

$ ./Ptr 100 1000000 +RTS -sstderr
Done.
     48,196,560 bytes allocated in the heap
     27,381,764 bytes copied during GC (scavenged)
     18,260,784 bytes copied during GC (not scavenged)
     14,389,248 bytes maximum residency (5 sample(s))

             92 collections in generation 0 (  0.09s)
              5 collections in generation 1 (  0.13s)

             28 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.68s  (  0.69s elapsed)
  GC    time    0.22s  (  0.28s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.90s  (  0.97s elapsed)

$ ./ByteArr 100 1000000 +RTS -sstderr
Done.
      4,042,700 bytes allocated in the heap
          1,272 bytes copied during GC (scavenged)
              0 bytes copied during GC (not scavenged)
         16,384 bytes maximum residency (1 sample(s))

              2 collections in generation 0 (  0.00s)
              1 collections in generation 1 (  0.00s)

              5 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.53s  (  0.54s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.53s  (  0.54s elapsed)

I tried with 6.8.2 and 6.8.3, using -O2 in both cases. I tried the Ptr version with and without -fvia-C -optc-O2, no difference.

Are these exactly the same programs you measured? What parameters did you use?

Cheers,
Simon