
On Sat, 2012-07-07 at 21:13 +0200, Nicolas Trangez wrote:
As you can see, the zipWith Data.Vector.SIMD implementation is slightly slower than the Data.Vector.Storable based one. I haven't done much profiling yet, but I suspect allocation and ForeignPtr creation are to blame: both seem to be highly optimized in GHC.ForeignPtr.mallocPlainForeignPtrBytes, which Data.Vector.Storable uses.
I got the MV benchmark on par with SV by reworking the allocation mechanism: no FFI is involved any more; instead it is based on GHC.Exts.newAlignedPinnedByteArray# and some other trickery, see [1]. This could still be improved a little by using PlainPtr, but that is not exported by GHC.ForeignPtr.

The change also had a pretty big performance impact on the SIMD-based benchmark: compare [2] to the old one [3]. I have no clue why the 4096 case now takes only twice as long as the 1024 one, rather than the expected 4x (which is roughly what it was before).

Nicolas

[1] https://github.com/NicolasT/vector-simd/commit/5ec539167254435ef4e7d308706dc...
[2] http://linode2.nicolast.be/files/vector-simd-xor2.html
[3] http://linode2.nicolast.be/files/vector-simd-xor1.html
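
For readers who want to try FFI-free aligned allocation themselves, here is a minimal sketch. It is not the code from [1]. It assumes a version of base in which GHC.ForeignPtr exports mallocPlainForeignPtrAlignedBytes, which allocates pinned, finalizer-free memory via newAlignedPinnedByteArray#. The names allocAligned and isAligned are hypothetical helpers made up for this illustration.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (ptrToIntPtr)
import GHC.ForeignPtr (mallocPlainForeignPtrAlignedBytes)

-- | Allocate 'n' bytes of pinned, GC-managed memory whose start address
-- is a multiple of 'align' (assumed to be a power of two, e.g. 16 or 32
-- for SIMD loads). No FFI call and no finalizer is involved.
-- Hypothetical helper, not part of vector-simd.
allocAligned :: Int -> Int -> IO (ForeignPtr Word8)
allocAligned n align = mallocPlainForeignPtrAlignedBytes n align

-- | Check that the buffer really starts on an 'align'-byte boundary.
-- Hypothetical helper, only here to verify the allocation.
isAligned :: Int -> ForeignPtr Word8 -> IO Bool
isAligned align fp = withForeignPtr fp $ \p ->
  return ((fromIntegral (ptrToIntPtr p) :: Int) `rem` align == 0)
```

In GHCi, `allocAligned 4096 32 >>= isAligned 32` should yield True. Whether this matches the performance of the approach in [1] is something I have not measured.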