
12 Mar 2013, 11:01 a.m.
On Tue, Mar 12, 2013 at 7:09 AM, Geoffrey Mainland wrote:
Unboxed vectors are allocated by GHC, and it does not align memory on 16-byte boundaries, so our first cut at SSE intrinsics simply used unaligned accesses. Obviously with ForeignPtrs we can control alignment and potentially use the aligned variants of SSE instructions, but this will almost double the number of primops. One could imagine extending our fusion framework to transition to aligned move instructions.
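To make the ForeignPtr point concrete, here is a minimal sketch of allocating a 16-byte-aligned buffer and checking its alignment. It assumes the `mallocForeignPtrAlignedBytes` allocator exported by `GHC.ForeignPtr`; the names `isAligned16` and `newAlignedFloats` are hypothetical helpers invented for illustration and are not part of the SSE primop work discussed here.

```haskell
import Foreign.ForeignPtr (ForeignPtr)
import Foreign.Ptr (Ptr, ptrToWordPtr)
import Foreign.Storable (sizeOf)
import GHC.ForeignPtr (mallocForeignPtrAlignedBytes)

-- Hypothetical helper: True when the address is a multiple of 16 bytes,
-- which is what the aligned SSE move instructions require.
isAligned16 :: Ptr a -> Bool
isAligned16 p = ptrToWordPtr p `mod` 16 == 0

-- Hypothetical helper: allocate room for n Floats in pinned memory,
-- requesting 16-byte alignment (arguments are size in bytes, then alignment).
newAlignedFloats :: Int -> IO (ForeignPtr Float)
newAlignedFloats n =
  mallocForeignPtrAlignedBytes (n * sizeOf (undefined :: Float)) 16
```

A caller would pass the buffer to `withForeignPtr` and could test `isAligned16` on the resulting pointer before choosing an aligned access path.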
When I implemented the memcpy primops I added an optional alignment parameter (rather than a new primop) to each primop. LLVM uses the same setup. Perhaps it could work for you?

-- Johan
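As a rough illustration of the "optional alignment parameter" idea, the sketch below models it in plain Haskell: one operation carries an optional known alignment, and the choice between the unaligned and aligned SSE move is made from that parameter instead of from two separate primops. The types and the function `selectMove` are hypothetical and do not reflect GHC's actual primop or code generator interfaces.

```haskell
-- Hypothetical model of the two SSE move flavours.
data MoveInstr
  = MOVUPS  -- unaligned packed single-precision move
  | MOVAPS  -- aligned packed single-precision move (needs 16-byte alignment)
  deriving (Eq, Show)

-- Hypothetical selector: the alignment is optional. Nothing means
-- "unknown", so the safe unaligned instruction is chosen. A known
-- alignment that is a positive multiple of 16 permits the aligned form.
selectMove :: Maybe Int -> MoveInstr
selectMove (Just a)
  | a > 0 && a `mod` 16 == 0 = MOVAPS
selectMove _ = MOVUPS
```

With this shape, existing callers that know nothing about alignment keep passing `Nothing`, and only the sites that can prove alignment opt in to the aligned instruction.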