Re: LLVM calling convention for AVX2 and AVX512 registers

10 Mar 2017

If we only turn on ymm and zmm for passing explicit 256-bit and 512-bit
vector types, then changing the ABI would have basically zero effect on any
code anybody is actually using today. Everything would remain ABI
compatible unless it involves the new types that nobody is using.

This also has the benefit that turning on AVX2 or AVX-512 wouldn't change
the calling convention of any code, making it much safer to link code
compiled with those flags on against code compiled with them off. That
seems like a big deal.

Moreover, if we start passing normal floats, etc. through them then our
lack of shuffles and ways to get data in/out of them becomes quite a pain
point.

As for passing int/word data, passing vectors of them through the ymm
and zmm registers should be sufficient for the same reasons.
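
The linking argument above can be sketched as a toy model. This is only an illustration under stated assumptions: the type names mirror GHC's primitive SIMD types, while `by_type`, `by_flags`, `assign`, and `link_compatible` are hypothetical helpers, and `by_flags` is a deliberately naive alternative included for contrast. None of this is GHC's or LLVM's actual register-assignment logic.

```python
# Toy model only: the type names mirror GHC's primitive SIMD types, but the
# rules below are illustrative assumptions, not GHC's or LLVM's real logic.
WIDTH_BITS = {
    "Float": 32, "Double": 64,
    "FloatX4": 128, "DoubleX2": 128,
    "FloatX8": 256, "DoubleX4": 256,
    "FloatX16": 512, "DoubleX8": 512,
}

def by_type(arg_type, enabled=frozenset()):
    """Proposed rule: the register class depends only on the argument type."""
    bits = WIDTH_BITS[arg_type]
    if bits <= 128:
        return "xmm"
    return "ymm" if bits == 256 else "zmm"

def by_flags(arg_type, enabled=frozenset()):
    """Hypothetical alternative: every argument uses the widest enabled class."""
    if "avx512" in enabled:
        return "zmm"
    if "avx2" in enabled:
        return "ymm"
    return "xmm"

def assign(rule, signature, enabled=frozenset()):
    """Register class chosen for each argument of a signature."""
    return [rule(t, enabled) for t in signature]

def link_compatible(rule, signature, flags_a, flags_b):
    """True if two compilation units agree on how the signature is passed."""
    return assign(rule, signature, flags_a) == assign(rule, signature, flags_b)

# A signature that uses none of the wide types, i.e. code that exists today.
existing = ["Double", "Float", "FloatX4"]
print(link_compatible(by_type, existing, frozenset(), frozenset({"avx2"})))   # True
print(link_compatible(by_flags, existing, frozenset(), frozenset({"avx2"})))  # False
```

Under the type-driven rule the flags never enter the decision, so a signature without 256-bit or 512-bit types is passed the same way in both compilation units.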

-Edward

On Thu, Mar 9, 2017 at 3:55 PM, Carter Schonwald ... wrote:
...
zooming out:
what *should* the new ABI be?
Ed was suggesting we make all 16 xmm/ymm/lower 16 zmm registers
(depending on how they're being used) caller save,
(what about all 32 zmm registers? would they be float only, or also for
ints/words? simd has lots of nice int support!)
a) if this doesn't cause any perf regressions i've no objections
b) currently we only support passing floats/doubles and simd vectors of
those, do we wanna support int/word data there too? (or are the GPR /
general purpose registers enough for those?)
c) other stuff i'm probably overlooking
d) let's do this!
On Thu, Mar 9, 2017 at 3:31 PM, Carter Schonwald <
carter.schonwald@gmail.com> wrote:
...
the patch is still on Trac,
https://ghc.haskell.org/trac/ghc/ticket/8033
we need to make changes to both the 32-bit and 64-bit ABIs, and I think
that's where I got stalled from lack of feedback
that aside:
here's the original email thread on the llvm-commits list
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130708/180264.html
and there are links from there to the iterations on the test suite plus
the original patch
i'm more than happy to take a weekend to do the leg work, it was pretty
fun last time.
BUT, we need to agree on what ABI to do, and make sure that those ABI
changes don't create a performance regression for some unexpected reason.
On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland wrote:
...
We would need to get a patch to LLVM accepted to change the GHC calling
convention.
Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.
So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.
It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?
Geoff
On 3/9/17 2:30 PM, Edward Kmett wrote:
...
Back around 2013, Geoff raised a discussion about fixing up the GHC
ABI so that the LLVM calling convention could pass 256-bit vector
types in YMM (and, I suppose, now 512-bit vector types in ZMM).
As I recall, this was blocked by some short term concerns about which
LLVM release was imminent or what have you. Four years on, the exact
same sort of arguments could be dredged up, but yet in the meantime
nobody is really using those types for anything.
This still creates a pain point around trying to use these wide types
today. Spilling rather than passing them in registers adds a LOT of
overhead to any attempt to use them that virtually erases any benefit
to having them in the first place.
I started experimenting with writing some custom primops directly in
LLVM so I could do meaningful amounts of work with our SIMD vector
types: just bang out the code that we can't write in Haskell directly
using LLVM assembly, and hope I could trick LLVM into doing link-time
optimization to perhaps inline it. But I'm basically dead in the water
over the overhead of our current calling convention before I even
start. If we're spilling them, there is no way that inlining / LTO
could hope to see through the spill and erase that call entirely.
It is rather frustrating that I can't even cheat. =/
What do we need to do to finally fix this?
-Edward