
Moritz Angermann
I should not have the YMM* and ZMM* registers, as I don’t have AVX or AVX-512; that looks like only a patch away. However, we try to optimize our register usage so that we can pass up to six doubles, six floats, or any combination of the two in registers without having to allocate them on the stack, by assuming overlapping registers (see Note [Overlapping global registers]).
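The overlapping scheme can be modelled in a few lines of Python. This is a simplified sketch, not GHC's implementation: it assumes six XMM argument registers and that F<i> and D<i> are two names for the same register XMM<i>, so floats and doubles draw from one shared pool. `assign_overlapping` is a hypothetical helper.

```python
# Simplified model, not GHC's code: assumes six XMM argument registers and
# that F<i> and D<i> both name XMM<i> (the overlapping scheme).
N_XMM = 6


def assign_overlapping(arg_types):
    """Assign each 'F' (Float) or 'D' (Double) argument, in order, the next
    free XMM slot, naming it after its type and slot number.
    Returns (STG registers used, number of arguments left for the stack)."""
    regs = [f"{ty}{i}" for i, ty in enumerate(arg_types[:N_XMM], start=1)]
    spilled = max(0, len(arg_types) - N_XMM)
    return regs, spilled


print(assign_overlapping(["F", "D", "F"]))        # (['F1', 'D2', 'F3'], 0)
print(assign_overlapping(["D"] * 4 + ["F"] * 2))  # (['D1', 'D2', 'D3', 'D4', 'F5', 'F6'], 0)
print(assign_overlapping(["F"] * 7))              # (['F1', 'F2', 'F3', 'F4', 'F5', 'F6'], 1)
```

Any mix of up to six floating-point arguments fits in registers; only the seventh has to go to the stack.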
As such, a full function signature in LLVM (as opposed to one based on the “live” registers, as we have right now) would consist of 12 float/double registers, whereas LLVM only maps 6. My current idea is to pass only the explicit F1, D1, …, F3, D3 and try to disable register overlapping for LLVM. This would probably force more floating-point values onto the stack rather than passing them in registers, but would likely guarantee that the registers match up. The other option I can think of is to define some virtual generic floating-point registers, V1, …, V6, in the LLVM code gen and then perform something like
    F1 <- V1 as float
    D1 <- V1 as double
in the body of the function, while trying to use the `live` information at the call site to decide which of F1 or D1 to pass as V1.
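Both proposals can be compared with a small Python model. It is only a sketch under stated assumptions: option 1 gives F1 to F3 and D1 to D3 dedicated (non-overlapping) registers, and option 2 binds each virtual register V<i> at the call site to whichever of F<i> or D<i> is live. `fixed_split_assign` and `select_virtual_args` are illustrative names, not GHC functions.

```python
# Hypothetical model of the two options above; not GHC's LLVM backend.
N_FIXED = 3    # option 1: F1..F3 and D1..D3 each get their own register
N_VIRTUAL = 6  # option 2: V1..V6, one per assumed XMM argument register


def fixed_split_assign(arg_types):
    """Option 1, no overlap: 'F' args take F1..F3, 'D' args take D1..D3.
    Returns (STG registers used, number of arguments left for the stack)."""
    counts = {"F": 0, "D": 0}
    regs, spilled = [], 0
    for ty in arg_types:
        if counts[ty] < N_FIXED:
            counts[ty] += 1
            regs.append(f"{ty}{counts[ty]}")
        else:
            spilled += 1
    return regs, spilled


def select_virtual_args(live):
    """Option 2: for each V<i>, pass F<i> or D<i> depending on which one is
    in the set of live STG registers at the call site (None means undef)."""
    args = []
    for i in range(1, N_VIRTUAL + 1):
        f, d = f"F{i}", f"D{i}"
        if f in live and d in live:
            # Cannot happen if F<i> and D<i> overlap: both would occupy V<i>.
            raise ValueError(f"{f} and {d} are both live")
        args.append((f"V{i}", f if f in live else d if d in live else None))
    return args


print(fixed_split_assign(["F"] * 4 + ["D"]))  # (['F1', 'F2', 'F3', 'D1'], 1)
print(select_virtual_args({"F1", "D2"})[:3])  # [('V1', 'F1'), ('V2', 'D2'), ('V3', None)]
```

Under option 1 the fourth Float goes to the stack even though registers are still free, which is the cost of disabling overlap; option 2 keeps all six slots usable but relies on the liveness information being accurate at every call site.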
Arguably, the fundamental problem here is the assumption that all STG entry points have the same machine-level calling convention. As you point out, our calling conventions in fact change due to things like register overlap. Ideally, the LLVM we produce would reflect this. One way to make this happen would be for C-- call nodes to carry information about the calling convention of the target (e.g. how many arguments of each type the function expects, in the same way that identifiers in Core carry their type). Unfortunately, a brief look at the code generator suggests that this may require a fair amount of plumbing. It's important to note, though, that this overlap problem will need to be addressed eventually if we are to have proper SIMD support (due to the overlap between the XMM, YMM, and ZMM registers).

Cheers,
- Ben
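To make the suggestion concrete, here is a rough Python sketch of a call node annotated with its target's calling convention, and of how an LLVM parameter list could be derived from it. `AnnotatedCall` and `llvm_params` are hypothetical; the LLVM types and the shared float/double counter are assumptions that follow the overlap scheme discussed above, not code from GHC's code generator.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed LLVM types for each STG argument class on x86-64.
LLVM_TYPE = {"R": "i64", "F": "float", "D": "double"}


@dataclass(frozen=True)
class AnnotatedCall:
    """Hypothetical C-- call node that records the target's calling convention:
    the class of each argument in order ('R' vanilla, 'F' Float, 'D' Double),
    much as identifiers in Core carry their type."""
    target: str
    arg_classes: Tuple[str, ...]


def llvm_params(call: AnnotatedCall) -> List[Tuple[str, str]]:
    """Derive the (LLVM type, STG register) parameters for this particular
    target instead of using one signature for every STG entry point.
    Floats and Doubles share one XMM counter, following the overlap rule.
    Pinned registers such as Base, Sp and Hp are omitted for brevity."""
    vanilla = xmm = 0
    params = []
    for cls in call.arg_classes:
        if cls == "R":
            vanilla += 1
            reg = f"R{vanilla}"
        else:
            xmm += 1
            reg = f"{cls}{xmm}"
        params.append((LLVM_TYPE[cls], reg))
    return params


call = AnnotatedCall("f_info", ("R", "F", "D"))
print(llvm_params(call))  # [('i64', 'R1'), ('float', 'F1'), ('double', 'D2')]
```

Because the parameter list is derived per call site, a target taking one Float and one Double gets exactly two floating-point parameters rather than the 12 a fixed full signature would need.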