
Hey,

On Tue, 2013-03-12 at 14:09 +0000, Geoffrey Mainland wrote:
On 03/10/2013 09:52 PM, Nicolas Trangez wrote:
...
Hi Nicolas,
Have you read our paper about the SIMD work? It's available here:
https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-be...
I hadn't read that one before (I had read other stream-fusion related papers), but have now. I got most of it already while reading the vector simd branch commits. The benchmark results look very nice! I'm afraid I didn't 'get' how the framework would allow both AVX and SSE instructions to work on streams, since it seems to assume Multis are always a fixed number of bytes wide (in this case 16 for SSE).
The paper describes the issues involved with integrating SIMD instructions with the vector fusion framework.
There are two primary issues with alignment: stack alignment and heap alignment.
We cannot rely on the stack being properly aligned for AVX spills on any platform, and LLVM's stack fixup code does not play well with GHC, so we *rewrite* all AVX spill instructions to their unaligned counterparts. On Win32 we must do the same for SSE.
Does this imply stack values are always 16-byte aligned? I haven't worked with AVX yet (my CPU doesn't support it).
Unboxed vectors are allocated by GHC, and it does not align memory on 16-byte boundaries, so our first cut at SSE intrinsics simply used unaligned accesses. Obviously with ForeignPtrs we can control alignment and potentially use the aligned variants of SSE instructions, but this will almost double the number of primops. One could imagine extending our fusion framework to transition to aligned move instructions.
Right. I created the patch for #7067 (http://hackage.haskell.org/trac/ghc/ticket/7067) for vector-simd purposes back then (adding mallocForeignPtrAlignedBytes and mallocPlainForeignPtrAlignedBytes).
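For reference, a minimal sketch of how such an aligned allocation might be used and checked. It assumes a base version that exports mallocForeignPtrAlignedBytes from GHC.ForeignPtr (size first, then alignment); the helper names isAlignedTo and allocAndCheck are made up for illustration.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr, ptrToIntPtr)
import GHC.ForeignPtr (mallocForeignPtrAlignedBytes)

-- | Hypothetical helper: True when the pointer's address is a multiple
-- of the given alignment (in bytes).
isAlignedTo :: Int -> Ptr a -> Bool
isAlignedTo align p = addr `mod` align == 0
  where
    addr = fromIntegral (ptrToIntPtr p) :: Int

-- | Hypothetical helper: allocate 'size' bytes with the requested
-- alignment and report whether the resulting address honours it.
allocAndCheck :: Int -> Int -> IO Bool
allocAndCheck size align = do
  fp <- mallocForeignPtrAlignedBytes size align :: IO (ForeignPtr Word8)
  withForeignPtr fp (pure . isAlignedTo align)

main :: IO ()
main = do
  sse <- allocAndCheck 64 16  -- 16-byte alignment, as needed for aligned SSE moves
  avx <- allocAndCheck 64 32  -- 32-byte alignment, as needed for aligned AVX moves
  print (sse, avx)
```

This only demonstrates the allocation side; whether aligned move primops can then be used on such memory is a separate question.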
Finally, LLVM 3.2 does not work with GHC. This means we cannot yet take advantage of its new vectorization optimizations, which is a shame.
So, four projects for you or anyone else who is interested, in rough dependency order:
1) Get LLVM 3.2 working with GHC's LLVM back end.
According to other mails in this thread this should be fixed. I'll give it a go.
2) Fix the stack alignment issue with LLVM. This will likely require a patch to LLVM.
I'm afraid that's a bit out of my league for now :-)
3) Add support for aligned move primops.
I looked into this before, might give it a stab.
4) Extend the current SIMD fusion framework to handle transitioning to aligned move instructions. As an alternative, only use aligned move instructions on memory that we know is aligned.
This is why I sent my previous mail initially: is there any plan for how to approach the 'memory that we know is aligned' bit? Would it make sense to have a more general 'alignment restriction' framework for arbitrary values, not only unboxed vectors (if there are any other use-cases)?
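To make the 'memory that we know is aligned' idea a bit more concrete, here is a rough sketch of one possible approach: a wrapper type whose only constructor path checks the address, so code holding the wrapper may pick the aligned move. All names here (Aligned16, mkAligned16, Move, chooseMove) are hypothetical and not part of GHC or vector; this is an illustration under those assumptions, not a proposal for the actual design.

```haskell
import Foreign.Ptr (Ptr, nullPtr, plusPtr, ptrToIntPtr)

-- | Hypothetical wrapper: a pointer known to be 16-byte aligned.
-- In a real module the constructor would not be exported.
newtype Aligned16 a = Aligned16 (Ptr a)

-- | Hypothetical smart constructor: succeeds only when the address
-- is a multiple of 16.
mkAligned16 :: Ptr a -> Maybe (Aligned16 a)
mkAligned16 p
  | addr `mod` 16 == 0 = Just (Aligned16 p)
  | otherwise          = Nothing
  where
    addr = fromIntegral (ptrToIntPtr p) :: Int

-- | Which flavour of move instruction a code generator could select.
data Move = AlignedMove | UnalignedMove
  deriving (Eq, Show)

-- | Fall back to the unaligned move whenever alignment is not established.
chooseMove :: Ptr a -> Move
chooseMove p = maybe UnalignedMove (const AlignedMove) (mkAligned16 p)

main :: IO ()
main = do
  print (chooseMove nullPtr)                -- address 0
  print (chooseMove (nullPtr `plusPtr` 4))  -- address 4
```

A runtime check like this is the simplest variant; tracking alignment statically in types would avoid the check but needs more machinery.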
These are all on my todo list, but my plate is quite full at the moment.
Heh, sounds familiar ;-)

Thanks,

Nicolas