Re: llvm calling convention matters

11 Sep 2013

Can you provide an example of the kind of ABI change you might want for
7.10? Is it mainly using more registers to pass arguments? We're already
using 6 *mm* registers to pass arguments on x86_64. I don't know for
sure, but I would be very surprised if there is code out there that
would benefit greatly from passing more than 6 Float/Double/SIMD vector
arguments in registers.

Without understanding the ABI design space you have in mind, I can't
comment on how changing the ABI now would or would not make future
exploration more difficult.

I don't see why we should limit ourselves by insisting that the gap
between the LLVM back-end and the native back-end not grow further.
If we want SIMD, the gap is already quite large. Yes, it would be nice to
have feature parity, but there are only so many man-hours available, and
we want to invest them wisely. The SIMD primops already do not work on
the native codegen; the user gets an error telling them to use the LLVM
back-end if they use the SIMD primops with the native codegen.

I was not suggesting that we require LLVM 3.4 or later for this or any
future version of GHC. Instead, the ABI would change based on the
version of LLVM used. I think that is unavoidable at this point and not
a huge deal as it would only affect SIMD code.
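
As a rough illustration of an ABI that depends on the LLVM version, here is a minimal sketch. It is not GHC code: the function name and the 3.4 cutoff are assumptions drawn from this thread, and it assumes that wider vectors are simply not passed in registers under the older convention.

```python
# Hypothetical cutoff: the thread suggests LLVM 3.4 would be the first
# release carrying the revised GHC calling convention.
NEW_ABI_MIN_LLVM = (3, 4)


def register_vector_widths(llvm_version):
    """Return the vector widths (in bits) passed in registers on x86-64."""
    if llvm_version >= NEW_ABI_MIN_LLVM:
        return (128, 256, 512)  # xmm, ymm and zmm, as proposed
    return (128,)  # current convention: only 128-bit vectors in xmm


print(register_vector_widths((3, 3)))
print(register_vector_widths((3, 4)))
```

Only code that passes 256- or 512-bit vectors would see a difference between the two branches, which is the sense in which the split would affect SIMD code alone.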

All this said, I'm not going to push. Changing the ABI just creates more
work for me. I'm very motivated to get the rest of the SIMD patches into
HEAD before I present our SIMD paper at ICFP in a few weeks. However, a
year from now my priorities will likely be very different, so the ball
will be entirely in your (or someone else's, just not my!) court.

Geoff

On 09/11/2013 06:26 PM, Carter Schonwald wrote:
hey all,

first let me preface by saying I am in favor of breaking and
updating/modernizing the GHC ABI.

I just think that, for a number of reasons, it doesn't make sense to do
it for the 7.8 release, but rather to start work on it in another month
or so, so we can systematically design a better ABI and keep all
the code gens as first class citizens. (We also need to work out the type
system changes needed to correctly use SIMD shuffles, which are
currently inexpressible correctly with GHC's type system. SIMD
shuffles are crucial for interesting levels of SIMD performance!)

the reason I don't want to make the ABI change right now is because
then we'd have to wait until after llvm 3.4 gets released in like 6
months before giving them another breaking change!
(OR start baking an LLVM into GHC, which is a leap we're not 100% on,
though there are clear good reasons for it!)

Basically, if we make breaking changes to the ABI now (and thus have a
split ABI for llvm 3.4/HEAD vs earlier), and then we do fixups or more
breakage for 7.10, then when 7.10 rolls around (perhaps late next
spring or sometime in the summer?), the only supported llvm
version for 7.10 would be LLVM HEAD / 3.5 (which won't be released
till some time thereafter)! Unless we go ahead and break the 3.4 ABI
to 7.10 rather than 7.8 ABI (whatever that would entail, which would ).
This is assuming the ~7-8 month cycle between major version releases
that LLVM has followed of late.

additionally, as Johan remarked today on a pending patch of mine,
having operations only work on the llvm backend, and not on the native
code gen, is pretty problematic! see
http://ghc.haskell.org/trac/ghc/ticket/8256

tl;dr: Unless we're throwing away the native code gen backend next month,
we probably don't want to increase the capability gap /
current ABI incompatibility right before the 7.8 release. I am willing to
help explore modernizing the native code gens so that they have parity
with the llvm backend. Additionally, boxing ourselves into a corner
where for 7.10 the only llvm with the right ABI will be llvm 3.5 seems
totally unacceptable from an end user's / distribution package manager's
standpoint, and a huge support headache for the community.

I've had to help deal with the support headache of the xcode5 clang +
ghc issues on OS X, A LOT, in the past 2 months; I'm not keen on
deliberately creating similar support disasters for myself and others.

that said: I absolutely agree that we should fix up the ABI and have a
clear story for XMM, YMM, and ZMM registers, and if you've been
following trac tickets at all, you'll see there's even a type system
issue in properly handling the SIMD shuffles! I briefly sketch out the
issue in http://ghc.haskell.org/trac/ghc/ticket/8107 (last comment)

that said: I'm open to being convinced I'm wrong, and I absolutely
understand your motivations for wanting it now, but I really believe
that doing so right now will create a number of problems that are
better off avoided to begin with

cheers
-Carter

On Wed, Sep 11, 2013 at 5:49 PM, Geoffrey Mainland
<mainland@cs.drexel.edu> wrote:
Hi Carter,
On 09/06/2013 03:24 PM, Carter Tazio Schonwald wrote:
> Hey Geoff,
> I'm leery about doing a calling convention change right before the ghc
> release (and I'm happy to elaborate more on the phone some time): 1)
> I'd rather we test the patches on llvm locally ourselves before going
> upstream; 2) doing that AVX change on the calling convention now would
> make it harder to make a more systematic exploration of calling
> convention changes post 7.8 release, because we would face either
> breaking the llvm head/3.4 changes, or having to wait till the next
> llvm release cycle (3.5?!) to upstream any more systematic
> changes (such as adding substantially more SIMD registers to the GHC
> calling convention!).
>
> I understand your likely motivation for wanting the calling convention
> landing in the 7.8 release, namely it may eke out an easy 2x perf boost in
> your stream fusion libs, I just worry that the change would ultimately
> cut off our ability to do more aggressive experimentation and
> improvements (eg more simd registers!) for ghc 7.10 over the next
> year?
>
> on an unrelated note: I will spend some time this weekend giving you
> the various simd operations I want / think are valuable. the low
> hanging fruit would be figuring out a good haskell type / analogue of
> the llvm __builtin_shuffle(a,b,c) primop, because that usually should
> generate decent code. I'll work out the details of this and some other
> examples and send it your way in the next few days
>
> -Carter
Currently, on x86-64 we pass floats, doubles, and 128-bit wide SIMD
vectors in xmm1-xmm6. I propose that we change the calling conventions
to pass 256-bit wide SIMD vectors in ymm1-ymm6 and 512-bit wide SIMD
vectors in zmm1-zmm6. I don't know why GHC doesn't use xmm0 or xmm7, as
the Linux C calling convention uses xmm0-xmm7. Simon, perhaps you know
why? I get that we only needed 6 registers originally (F1-F4, D1-D2),
but why count from one rather than zero?

On x86-32, we pass floats, doubles, and all SIMD vectors on the stack. I
propose that we pass 128-bit wide SIMD vectors in xmm0-xmm2, and make
analogous arrangements for 256- and 512-bit SIMD vectors. We will still
pass floats and doubles on the stack. This matches the Linux x86 32-bit
C calling convention.
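
The proposed register assignment can be sketched as a small model. This is an illustration only, not GHC or LLVM code: the names are hypothetical, the use of ymm0-ymm2 and zmm0-zmm2 on x86-32 is one reading of "analogous arrangements", and spilling to the stack once the registers run out is an assumption.

```python
# Register-name prefix for each SIMD vector width (in bits).
PREFIX = {128: "xmm", 256: "ymm", 512: "zmm"}

# Register numbers available for vector arguments under the proposal.
# x86_64: 1..6 (from the text). x86_32: 0..2 is stated for 128-bit
# vectors; reusing it for 256/512-bit vectors is an assumption.
REG_NUMBERS = {"x86_64": range(1, 7), "x86_32": range(0, 3)}


def assign_vector_args(arch, widths):
    """Map each vector argument width to a register name or "stack"."""
    free = list(REG_NUMBERS[arch])
    placement = []
    for width in widths:
        if free:
            # xmmN, ymmN and zmmN alias the same physical register,
            # so one number is consumed whatever the width.
            placement.append(PREFIX[width] + str(free.pop(0)))
        else:
            placement.append("stack")  # assumed overflow behaviour
    return placement


print(assign_vector_args("x86_64", [128, 256, 512]))
print(assign_vector_args("x86_32", [128, 128, 128, 128]))
```

Because the three register files overlap, the model keeps a single pool of register numbers per architecture rather than one pool per width.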
I think these are fairly conservative changes. I also don't think we
should be afraid of revising the calling convention for GHC 7.10. Surely
the LLVM folks won't be upset if we send them one set of patches a year
instead of one set of patches every two years.
Geoff