
On 03/12/2013 03:01 PM, Johan Tibell wrote:
On Tue, Mar 12, 2013 at 7:09 AM, Geoffrey Mainland wrote:
Unboxed vectors are allocated by GHC, which does not align memory on 16-byte boundaries, so our first cut at SSE intrinsics simply used unaligned accesses. Obviously with ForeignPtrs we can control alignment and potentially use the aligned variants of the SSE instructions, but this will almost double the number of primops. One could imagine extending our fusion framework to transition to aligned move instructions.
When I implemented the memcpy primops I added an optional alignment parameter (rather than a new primop) to each primop. LLVM uses the same setup. Perhaps it could work for you?
-- Johan
LLVM needs to know statically whether or not an SSE move is aligned; it can't be computed at runtime. I don't think passing an extra Int# argument (or whatever) to a primop is going to work.
Geoff
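
To make the static-versus-runtime point concrete, here is a minimal sketch, not code from the SSE branch. MoveKind and chooseMove are hypothetical names invented for illustration, and mallocForeignPtrAlignedBytes comes from the GHC-specific GHC.ForeignPtr module. A runtime value can only pick between two code paths that were each compiled with a fixed choice of aligned or unaligned move; it cannot parameterise a single instruction.

```haskell
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr, alignPtr, plusPtr)
import GHC.ForeignPtr (mallocForeignPtrAlignedBytes)

-- Hypothetical tag standing in for the two instruction variants
-- (e.g. movaps versus movups). Each variant would be a separate primop.
data MoveKind = AlignedMove | UnalignedMove
  deriving (Eq, Show)

-- A pointer is 16-byte aligned if rounding it up to 16 leaves it unchanged.
isAligned16 :: Ptr a -> Bool
isAligned16 p = alignPtr p 16 == p

-- The runtime check selects a branch; the instruction inside each branch
-- is still fixed at compile time.
chooseMove :: Ptr a -> MoveKind
chooseMove p
  | isAligned16 p = AlignedMove
  | otherwise     = UnalignedMove

main :: IO ()
main = do
  -- 64 bytes with 16-byte alignment (size first, then alignment).
  fp <- mallocForeignPtrAlignedBytes 64 16 :: IO (ForeignPtr Float)
  withForeignPtr fp $ \p -> do
    print (chooseMove p)               -- start of the aligned buffer
    print (chooseMove (p `plusPtr` 4)) -- one Float in, no longer 16-byte aligned
```

Under these assumptions the branch is the only place a runtime alignment value can act, which is why an extra Int# argument on a single primop would not by itself let the backend emit the aligned instruction.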