
Can you provide an example of the kind of ABI change you might want for 7.10? Is it mainly using more registers to pass arguments? We're already using 6 *mm* registers to pass arguments on x86_64. I don't know for sure, but I would be very surprised if there is code out there that would benefit greatly from passing more than 6 Float/Double/SIMD vector arguments in registers.
Without understanding the ABI design space you have in mind, I can't comment on how changing the ABI now would or would not make future exploration more difficult.
I don't see why we should limit ourselves by insisting that the gap between the LLVM back-end and the native back-end not grow further. If we want SIMD, the gap is already quite large. Yes, it would be nice to have feature parity, but there are only so many man-hours available, and we want to invest them wisely.
The SIMD primops already do not work on the native codegen; the user gets an error telling them to use the LLVM back-end if they use the SIMD primops with the native codegen.
I was not suggesting that we require LLVM 3.4 or later for this or any future version of GHC. Instead, the ABI would change based on the version of LLVM used. I think that is unavoidable at this point and not a huge deal, as it would only affect SIMD code.
All this said, I'm not going to push. Changing the ABI just creates more work for me. I'm very motivated to get the rest of the SIMD patches into HEAD before I present our SIMD paper at ICFP in a few weeks. However, a year from now my priorities will likely be very different, so the ball will be entirely in your (or someone else's, just not my!) court.
Geoff
On 09/11/2013 06:26 PM, Carter Schonwald wrote:
hey all,
first let me preface by saying I am in favor of breaking and updating/modernizing the GHC ABI.
I just think that for a number of reasons, it doesn't make sense to do it for the 7.8 release, but rather start work on it in another month or so, so we can systematically have a better ABI, and keep all the code gens as first class citizens. (also work out the type system changes needed to be able to correctly use SIMD shuffles, which are currently inexpressible with GHC's type system. SIMD shuffles are crucial for interesting levels of SIMD performance!)
the reason I don't want to make the ABI change right now is because then we'd have to wait until after llvm 3.4 gets released in like 6 months before giving them another breaking change! (OR start baking an LLVM into GHC, which is a leap we're not 100% on, though there are clear good reasons for it!)
Basically, if we make breaking changes to the ABI now (and thus have a split ABI for llvm 3.4/HEAD vs earlier), and then we do fixups or more breakage for 7.10, then when 7.10 rolls around (perhaps late next spring or sometime in the summer?), the only supported llvm version for 7.10 would be LLVM HEAD / 3.5 (which won't be released till some time thereafter)! Unless we go ahead and break the 3.4 ABI to 7.10 rather than 7.8 abi (whatever that would entail, which would ). This is assuming the ~ 7-8 month cycle between major version releases that LLVM has followed of late.
additionally, as Johan remarked today on a pending patch of mine, having operations only work on the llvm backend, and not on the native code gen is pretty problematical! see http://ghc.haskell.org/trac/ghc/ticket/8256
tl;dr : Unless we're throwing away the native code gen backend next month, we probably don't want to increase the capability gap / current ABI incompatibility right before the 7.8 release. I am willing to help explore modernizing the native code gens so that they have parity with the llvm backend. Additionally, boxing ourselves into a corner where for 7.10 the only llvm with the right ABI will be llvm 3.5 seems totally unacceptable from an end user's / distribution package manager's standpoint, and a huge support headache for the community.
I've had to help deal with the support headache of the xcode5 clang + ghc issues on OS X, A LOT, in the past 2 months, I'm not keen on deliberately creating similar support disasters for myself and others.
that said: I absolutely agree that we should fix up the ABI, have a clear story for XMM, YMM, and ZMM registers, and if you've been following trac tickets at all, you'll see there's even a type system issue in properly handling the SIMD shuffles! I briefly sketch out the issue in http://ghc.haskell.org/trac/ghc/ticket/8107 (last comment)
that said: I'm open to being convinced I'm wrong, and I absolutely understand your motivations for wanting it now, but I really believe that doing so right now will create a number of problems that are better avoided to begin with
cheers -Carter
On Wed, Sep 11, 2013 at 5:49 PM, Geoffrey Mainland <mainland@cs.drexel.edu> wrote:
Hi Carter,
On 09/06/2013 03:24 PM, Carter Tazio Schonwald wrote:
> Hey Geoff,
>
> I'm leery about doing a calling convention change right before the ghc
> release (and I'm happy to elaborate more on the phone some time)
> 1) I'd rather we test the patches on llvm locally ourselves before going
> upstream
> 2) doing that AVX change on the calling convention now, would
> make it harder to make a more systematic exploration of calling
> convention changes post 7.8 release, because we would face either
> breaking the llvm head/3.4 changes, or having to wait till the next
> llvm release cycle (3.5?!) to upstream any more systematic
> changes. (such as adding substantially more SIMD registers to the GHC
> calling convention!)
>
> I understand your likely motivation for wanting the calling convention
> landing in the 7.8 release, namely it may eke an easy 2x perf boost in
> your stream fusion libs, i just worry that the change would ultimately
> cut off our ability to do more aggressive experimentation and
> improvements (eg more simd registers!) for ghc 7.10 over the next
> year?
>
> on an unrelated note: I will spend some time this weekend giving you
> the various simd operations I want / think are valuable. the low
> hanging fruit would be figuring out a good haskell type / analogue of
> the llvm __builtin_shuffle(a,b,c) primop, because that usually should
> generate decent code. I'll work out the details of this and some other
> examples and send it your way in the next few days
>
> -Carter
Currently, on x86-64 we pass floats, doubles, and 128-bit wide SIMD vectors in xmm1-xmm6. I propose that we change the calling conventions to pass 256-bit wide SIMD vectors in ymm1-ymm6 and 512-bit wide SIMD vectors in zmm1-zmm6. I don't know why GHC doesn't use xmm0 or xmm7, as the Linux C calling convention uses xmm0-xmm7. Simon, perhaps you know why? I get that we only needed 6 registers originally (F1-F4, D1-D2), but why count from one rather than zero?
On x86-32, we pass floats, doubles, and all SIMD vectors on the stack. I propose that we pass 128-bit wide SIMD vectors in xmm0-xmm2, and make analogous arrangements for 256- and 512-bit SIMD vectors. We will still pass floats and doubles on the stack. This matches the Linux x86 32-bit C calling convention.
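For reference, the register assignments described in the two paragraphs above can be written down as a small lookup. This is only an illustrative sketch in Python, not GHC or LLVM code: the function name and the `kind` strings are made up for the example, and the ymm0-ymm2 / zmm0-zmm2 entries for x86-32 are an assumption about what the "analogous arrangements" would look like.

```python
# Illustrative model of the proposed convention; these names are
# hypothetical and do not correspond to GHC or LLVM identifiers.
VECTOR_PREFIX = {"vec128": "xmm", "vec256": "ymm", "vec512": "zmm"}


def proposed_arg_registers(arch, kind):
    """Return the argument registers for `kind` on `arch`.

    An empty list means the argument is passed on the stack.
    """
    if arch not in ("x86_64", "x86_32"):
        raise ValueError(f"unknown architecture: {arch}")
    if kind in ("float", "double"):
        if arch == "x86_64":
            # Floats and doubles already use xmm1-xmm6 on x86-64.
            return [f"xmm{i}" for i in range(1, 7)]
        # x86-32: floats and doubles stay on the stack.
        return []
    if kind not in VECTOR_PREFIX:
        raise ValueError(f"unknown argument kind: {kind}")
    prefix = VECTOR_PREFIX[kind]
    if arch == "x86_64":
        # xmm1-xmm6 today; ymm1-ymm6 and zmm1-zmm6 are the proposed additions.
        return [f"{prefix}{i}" for i in range(1, 7)]
    # x86-32: xmm0-xmm2 is from the proposal; the ymm/zmm rows are an
    # assumed reading of "analogous arrangements".
    return [f"{prefix}{i}" for i in range(0, 3)]
```

Calling `proposed_arg_registers("x86_64", "vec256")` gives the six ymm registers, while `proposed_arg_registers("x86_32", "double")` gives an empty list because doubles remain on the stack there.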
I think these are fairly conservative changes. I also don't think we should be afraid of revising the calling convention for GHC 7.10. Surely the LLVM folks won't be upset if we send them one set of patches a year instead of one set of patches every two years.
Geoff