Re: SIMD/SSE support & alignment

12 Mar 2013

Hey,

On Tue, 2013-03-12 at 14:09 +0000, Geoffrey Mainland wrote:
...
On 03/10/2013 09:52 PM, Nicolas Trangez wrote:
...
...
Hi Nicolas,
Have you read our paper about the SIMD work? It's available here:
https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-be...
I hadn't read that one before (I had read other stream-fusion-related
papers), but I have now. I had already picked up most of it while reading
the vector simd branch commits. The benchmark results look very nice!

I'm afraid I didn't 'get' how the framework would allow both AVX and
SSE instructions to work on streams, since it seems to assume Multis
are always a fixed number of bytes wide (in this case 16 for SSE).
...
The paper describes the issues involved in integrating SIMD
instructions with the vector fusion framework.
There are two primary issues with alignment: stack alignment and heap
alignment.
We cannot rely on the stack being properly aligned for AVX spills on any
platform, and LLVM's stack fixup code does not play well with GHC, so we
*rewrite* all AVX spill instructions to their unaligned counterparts. On
Win32 we must do the same for SSE.
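A minimal sketch of the spill rewrite described above, assuming it is done as a text pass over the assembly LLVM emits. The names alignedToUnaligned, rewriteLine and rewriteAsm are hypothetical, not GHC's actual code. For simplicity it rewrites every aligned vector move, not only spills; that is a safe superset, because the unaligned forms also accept aligned addresses.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Aligned x86 vector moves and their unaligned twins. The aligned forms
-- fault on addresses that are not 16-byte (SSE) or 32-byte (AVX ymm)
-- aligned; the unaligned forms accept any address.
alignedToUnaligned :: [(String, String)]
alignedToUnaligned =
  [ ("vmovaps", "vmovups")
  , ("vmovapd", "vmovupd")
  , ("vmovdqa", "vmovdqu")
  , ("movaps",  "movups")   -- SSE forms, needed on Win32
  , ("movapd",  "movupd")
  , ("movdqa",  "movdqu")
  ]

-- Rewrite the leading mnemonic of one assembly line, keeping indentation
-- and operands. A mnemonic only matches if it is followed by whitespace
-- or end of line, so e.g. "vmovdqa64" is left alone.
rewriteLine :: String -> String
rewriteLine line = indent ++ swap body
  where
    (indent, body) = span isSpace line
    swap s =
      case [ new ++ r | (old, new) <- alignedToUnaligned
                      , Just r <- [stripPrefix old s]
                      , endsMnemonic r ] of
        (s' : _) -> s'
        []       -> s
    endsMnemonic r = case r of
      []      -> True
      (c : _) -> isSpace c

-- Apply the rewrite to a whole assembly file.
rewriteAsm :: String -> String
rewriteAsm = unlines . map rewriteLine . lines
```

Rewriting at the mnemonic level avoids having to teach LLVM about GHC's stack layout, at the cost of possibly slower unaligned moves on older CPUs.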
Does this imply stack values are always 16-byte aligned?
I haven't worked with AVX yet (my CPU doesn't support it).
...
Unboxed vectors are allocated by GHC, and it does not align memory on
16-byte boundaries, so our first cut at SSE intrinsics simply used
unaligned accesses. Obviously with ForeignPtrs we can control alignment
and potentially use the aligned variants of SSE instructions, but this
will almost double the number of primops. One could imagine extending
our fusion framework to transition to aligned move instructions.
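When a buffer's alignment is only known at run time, one way to choose between aligned and unaligned moves is to test the address. A small sketch, assuming Foreign.Ptr.alignPtr as the test; isAlignedTo, MoveKind and chooseMove are hypothetical names, and there were no aligned-move primops to dispatch to at the time of this thread.

```haskell
import Foreign.Ptr

-- True when p's address is a multiple of n bytes. alignPtr p n returns the
-- next address at or above p that satisfies the constraint, so p is
-- already aligned exactly when alignPtr leaves it unchanged.
isAlignedTo :: Int -> Ptr a -> Bool
isAlignedTo n p = alignPtr p n == p

-- Which flavour of vector move a kernel could use on a buffer
-- (hypothetical: the aligned variant would need new primops).
data MoveKind = AlignedMove | UnalignedMove
  deriving (Eq, Show)

-- width is the vector size in bytes: 16 for SSE, 32 for AVX.
chooseMove :: Int -> Ptr a -> MoveKind
chooseMove width p
  | isAlignedTo width p = AlignedMove
  | otherwise           = UnalignedMove
```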
Right. I created the patch for #7067
(http://hackage.haskell.org/trac/ghc/ticket/7067) for vector-simd
purposes back then (adding mallocForeignPtrAlignedBytes and
mallocPlainForeignPtrAlignedBytes).
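For reference, a sketch of allocating an aligned buffer with one of those functions, assuming a version of base whose GHC.ForeignPtr exports mallocPlainForeignPtrAlignedBytes (size first, then alignment, both in bytes). The helpers allocAligned and isForeignAligned are illustrative, not part of any library.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (alignPtr)
import GHC.ForeignPtr (mallocPlainForeignPtrAlignedBytes)

-- Pinned buffer of `size` bytes whose first byte sits on a multiple of
-- `align` bytes (e.g. 16 for SSE, 32 for AVX). The "Plain" variant has no
-- finalizer field, so it is cheaper, but finalizers cannot be attached.
allocAligned :: Int -> Int -> IO (ForeignPtr Word8)
allocAligned size align = mallocPlainForeignPtrAlignedBytes size align

-- Runtime check that a ForeignPtr really starts on an n-byte boundary.
isForeignAligned :: Int -> ForeignPtr a -> IO Bool
isForeignAligned n fp = withForeignPtr fp (\p -> pure (alignPtr p n == p))
```

A storable vector could then be built over such a buffer (for instance with Data.Vector.Storable.unsafeFromForeignPtr0), which is the case where aligned moves become usable.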
...
Finally, LLVM 3.2 does not work with GHC. This means we cannot yet take
advantage of its new vectorization optimizations, which is a shame.
So, four projects for you or anyone else who is interested, in rough
dependency order:
1) Get LLVM 3.2 working with GHC's LLVM back end.
According to other mails in this thread, this should be fixed now. I'll
give it a go.
...
2) Fix the stack alignment issue with LLVM. This will likely require a
patch to LLVM.
I'm afraid that's a bit out of my league for now :-)
...
3) Add support for aligned move primops.
I looked into this before and might give it a stab.
...
4) Extend the current SIMD fusion framework to handle transitioning to
aligned move instructions. As an alternative, only use aligned move
instructions on memory that we know is aligned.
This is why I sent my previous mail in the first place: is there any plan
for how to approach the 'memory that we know is aligned' part? Would it
make sense to have a more general 'alignment restriction' framework for
arbitrary values, not only unboxed vectors (if there are any other use cases)?
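One way to read "transitioning to aligned move instructions" in point 4 is classic loop peeling: run scalar iterations until the pointer reaches a vector boundary, use aligned moves for whole vectors, then finish with scalars. A sketch of just the index arithmetic, with addresses as plain Int byte offsets for testability; Split and splitForAlignment are hypothetical names, and this is not the paper's fusion framework.

```haskell
-- How many elements each phase of a peeled loop handles:
--   prologue: scalar iterations until the address is width-aligned
--   body:     whole SIMD vectors (width bytes each), aligned moves
--   epilogue: leftover scalar iterations
data Split = Split { prologue :: Int, body :: Int, epilogue :: Int }
  deriving (Eq, Show)

-- width:    vector size in bytes (16 for SSE, 32 for AVX)
-- elemSize: element size in bytes
-- addr:     byte address of the first element
-- n:        number of elements
-- Returns Nothing if whole-element steps can never reach a vector boundary
-- (e.g. 4-byte elements starting at address 2).
splitForAlignment :: Int -> Int -> Int -> Int -> Maybe Split
splitForAlignment width elemSize addr n
  | width <= 0 || elemSize <= 0 || width `mod` elemSize /= 0 = Nothing
  | gap `mod` elemSize /= 0 = Nothing
  | otherwise = Just (Split pro vecs epi)
  where
    gap   = (width - addr `mod` width) `mod` width  -- bytes to next boundary
    pro   = min n (gap `div` elemSize)               -- scalar prologue length
    lanes = width `div` elemSize                     -- elements per vector
    rest  = n - pro
    vecs  = rest `div` lanes                         -- whole aligned vectors
    epi   = rest - vecs * lanes                      -- scalar tail
```

For 4-byte elements starting at address 8 with 16-byte vectors, two scalar steps reach address 16, after which every vector load can be aligned.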
...
These are all on my todo list, but my plate is quite full at the moment.
Heh, sounds familiar ;-)

Thanks,

Nicolas