Re: simd branch ready for review

5 Feb 2013

On 05/02/13 00:36, Geoffrey Mainland wrote:
> ...
> On 02/04/2013 11:56 PM, Johan Tibell wrote:
>> ...
>> On Mon, Feb 4, 2013 at 3:19 PM, Geoffrey Mainland wrote:
>>> What would a sensible fallback be for AVX instructions? What should we
>>> fall back on when the LLVM backend is not being used?
>>
>> Depends on the instruction. A 256-bit multiply could be replaced by N
>> multiplies etc. For popcount we have a little bit of C code in
>> ghc-prim that we use if SSE 4.2 isn't enabled. An alternative is to
>> emit some different assembly in e.g. the x86-64 backend if AVX isn't
>> enabled.
>>
>>> Maybe we could desugar AVX instructions to SSE instructions on platforms
>>> that support SSE but not AVX, but in practice people would then #ifdef
>>> anyway and just use SSE if AVX weren't available.
>>
>> I don't follow here. If you conditionally emitted different
>> instructions in the backends depending on which -m flags are passed to
>> GHC, why would people #ifdef?
>
> I think you are suggesting that the user should always use 256-bit
> short-vector instructions, and that on platforms where AVX is not
> available, this would fall back to an implementation that performed
> multiple SSE instructions for each 256-bit vector instruction---and used
> multiple XMM registers to hold each 256-bit vector value (or spilled).
>
> Anyone using low-level primops should only do so if they really want
> low-level control. The most efficient SSE implementation of a function
> is not going to be whatever implementation falls out of a desugaring of
> generic 256-bit short-vector primitives. Therefore, I suspect that
> anyone using low-level vector primops like this will #ifdef and provide
> two implementations---one for SSE, one for AVX. Anyone who doesn't care
> about this level of detail should use a higher-level interface---which
> we have already implemented---and which does not require any
> ifdefs. People will #ifdef because they can provide better SSE
> implementations than GHC when AVX instructions are not available.
>
> I am suggesting that we push the "ifdefs" into a library. The vast
> majority of programmers will never see the ifdefs, because they will use
> the library.
>
> I think you are suggesting that we push the "ifdefs" into GHC. That way
> nobody will have a choice---they get whatever desugaring GHC gives them.
> I understand your point of view---having primops that don't work
> everywhere is a real pain and aesthetically unpleasing---but I prefer
> exposing more low-level details in our primops even if it means a bit of
> unpleasantness once in a while. This does mean a tiny segment of
> programmers will have to deal with ifdefs, but I suspect that this tiny
> segment of programmers would prefer ifdefs to a lack of control.
>
> If a population count operation translates to a few extra instructions,
> I don't think anyone will care. If a body of code performing
> short-vector operations desugars to twice as many instructions that
> require twice as many registers, thereby resulting in a bunch of extra
> spills, it will matter. Put differently, there is a more-or-less
> canonical desugaring of population count. For a given function using
> short-vector instructions of one width, there is not a canonical
> desugaring into a function using short-vector instructions of a lesser
> width.
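The code below makes Johan's "a 256-bit multiply could be replaced by N multiplies" concrete. It is a minimal C sketch that assumes GCC or Clang vector extensions (`__attribute__((vector_size(N)))`). The names `f32x4`, `f32x8` and `mul_f32x8` are hypothetical and are not GHC primops.

It follows Geoff's "ifdefs in a library" approach. A lane-wise multiply of eight floats becomes one 256-bit multiply when `__AVX__` is defined. Otherwise it falls back to two 128-bit multiplies. For a single operation this split is mechanical. Geoff's point is that across a whole kernel it doubles the instruction count, and it also doubles register demand whenever both halves must stay live.

```c
#include <string.h>

/* GCC/Clang vector extensions: a 128-bit vector of four floats and a
   256-bit vector of eight floats (hypothetical names). */
typedef float f32x4 __attribute__((vector_size(16)));
typedef float f32x8 __attribute__((vector_size(32)));

/* Hypothetical library entry point: out[i] = a[i] * b[i] for i in 0..7.
   The #ifdef lives here, so callers never need one of their own. */
static void mul_f32x8(const float *a, const float *b, float *out)
{
#ifdef __AVX__
    /* AVX available: one 256-bit vector multiply. */
    f32x8 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va * vb;
    memcpy(out, &vr, sizeof vr);
#else
    /* Fallback: treat each 256-bit value as two 128-bit halves and
       issue one 128-bit multiply per half (two multiplies in total). */
    for (int half = 0; half < 2; half++) {
        f32x4 va, vb, vr;
        memcpy(&va, a + 4 * half, sizeof va);
        memcpy(&vb, b + 4 * half, sizeof vb);
        vr = va * vb;
        memcpy(out + 4 * half, &vr, sizeof vr);
    }
#endif
}
```

Compiling the same file with and without `-mavx` exercises both branches. Code that calls `mul_f32x8` is identical in either case.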
While I agree with Geoff, there's one thing we have to be careful about: 
inlining.  If the primop is exposed via an inline definition, then 
either we have to check and disable the inlining if the primop is not 
available in the current compilation, or else prevent the inlining from 
being visible in the first place.

I believe this is what Johan had in mind when he gave popcount a 
fallback.  Geoff, maybe you've thought about this already - what's the 
plan for the vector library?
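The inlining concern has a close analogue in C, illustrated with a popcount fallback of the kind Johan describes. This is a minimal sketch assuming GCC or Clang (for `__builtin_popcountll` and the predefined `__SSE4_2__`/`__POPCNT__` macros). `popcount64` and `popcount64_fallback` are hypothetical names, and this is not the actual ghc-prim code.

Because the wrapper is an inline definition meant to live in a header, its `#if` is resolved separately in every translation unit that inlines it. A caller built without `-msse4.2` or `-mpopcnt` therefore gets the portable version rather than an instruction its target may lack. Simon's concern is that an unfolding exported from a Haskell library would already contain whichever primop the library was compiled with. So the inlining has to be either checked against the current compilation or hidden.

```c
#include <stdint.h>

/* Portable fallback: SWAR ("SIMD within a register") bit counting. */
static inline unsigned popcount64_fallback(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);              /* 2-bit sums */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);                  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;              /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);    /* add bytes  */
}

/* Header-style inline wrapper. The #if below is evaluated in the
   translation unit doing the inlining, so the choice follows the
   caller's -m flags, not the flags the library was built with. */
static inline unsigned popcount64(uint64_t x)
{
#if (defined(__SSE4_2__) || defined(__POPCNT__)) && defined(__GNUC__)
    /* With these flags, typically a single POPCNT instruction. */
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount64_fallback(x);
#endif
}
```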

Cheers,
	Simon

Simon Marlow