
Hi Simon,

I've pushed my simd branch to darcs.haskell.org. Everything has been rebased against HEAD. Simon PJ and I have already looked over the changes together, but I wanted to give you (and everyone on ghc-devs) the opportunity to look things over before I merge to HEAD.

Simon PJ and I came up with a few questions/notes for you, but hopefully nothing that should delay a merge.

* Win32 issues

Modern 32-bit x86 *NIX systems align the stack to 16 bytes, but Win32 aligns only to 4 bytes. LLVM does not assume 16-byte stack alignment. Instead, on platforms where 16-byte stack alignment is not guaranteed, it 1) always outputs a function prologue that 2) aligns the stack to a 16-byte boundary with an "and" instruction, and it also 3) disables tail calls. Because LLVM aligns the stack for a function that has SSE register spills, it also generates movaps instructions (aligned SSE moves) for the spills.

This makes SSE support on Win32 difficult, and in my opinion not worth worrying about. The alternative is to 1) patch LLVM to disable the stack-alignment code so that we recover the ability to use tail calls and so that ebp is not scribbled over by the prologue and 2) patch the mangler to rewrite LLVM's movaps (move aligned) instructions to movups (move unaligned) instructions. I have these patches, but they are not included in the simd branch.

* How hard would it be to dump ArgRep for PrimRep?

It looks straightforward. Is it worth doing?

* How hard would it be to track bit width in PrimRep?

I recall chatting with you once about adding explicit support for, e.g., 8- and 16-bit Word/Int primops instead of relying on narrowing. Since SIMD vectors need to know the exact bit width of their elements, I've had to create a PrimElemRep data type in compiler/types/TyCon.lhs, but I'd really like to be able to re-use PrimRep instead.

* If we replaced all old-style C-- code, could we get rid of the explicit STG registers completely?
Simon PJ suggested that we use real machine registers directly, so, for example, GlobalReg's constructors would have FastString fields instead of Int fields.

* Could we add a CmmType field to GlobalReg's constructors?

You'll see that I added a new XmmReg constructor to GlobalReg, but because I don't know the type of an XmmReg, I have to bitcast everywhere in the generated LLVM code because LLVM wants to know not just that a value is a 16-byte vector, but that it is, e.g., a 16-byte vector containing 2 64-bit doubles. Having a CmmType attached to a GlobalReg (or pairing a GlobalReg with a CmmType when assigning registers) would let me avoid all these casts.

Thanks!
Geoff
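
P.S. To make the last point concrete, here is a rough sketch using made-up, heavily simplified types. The CmmType, GlobalReg, llvmType and needsBitcast definitions below are illustrative stand-ins only, not the real definitions in the compiler or the LLVM backend; the idea is just that a register carrying its own type lets the backend tell when a cast is unnecessary.

```haskell
module Main where

-- Hypothetical, simplified stand-in for a Cmm type.
-- NOT the real CmmType from the compiler.
data CmmType
  = Bits Int          -- scalar integer of the given bit width
  | Float Int         -- scalar float of the given bit width
  | Vec Int CmmType   -- vector: lane count and element type
  deriving (Eq, Show)

-- Hypothetical GlobalReg with a CmmType field on the XMM constructor.
data GlobalReg
  = XmmReg Int CmmType   -- register number plus the type it holds
  deriving (Eq, Show)

-- Render an LLVM-style type name for the simplified type above.
llvmType :: CmmType -> String
llvmType (Bits w)  = "i" ++ show w
llvmType (Float w)
  | w == 32        = "float"
  | w == 64        = "double"
  | otherwise      = "float" ++ show w   -- placeholder for other widths
llvmType (Vec n t) = "<" ++ show n ++ " x " ++ llvmType t ++ ">"

-- A cast is needed only when the type recorded on the register
-- differs from the type the consumer wants.
needsBitcast :: GlobalReg -> CmmType -> Bool
needsBitcast (XmmReg _ have) want = have /= want

main :: IO ()
main = do
  let v2f64 = Vec 2 (Float 64)   -- 16-byte vector of 2 64-bit doubles
      v4f32 = Vec 4 (Float 32)   -- 16-byte vector of 4 32-bit floats
      reg   = XmmReg 1 v2f64
  putStrLn (llvmType v2f64)
  print (needsBitcast reg v2f64)  -- types agree: no cast required
  print (needsBitcast reg v4f32)  -- types differ: a cast is still needed
```

With an untyped XmmReg, the equivalent of needsBitcast has nothing to compare against, which is why the casts currently have to be emitted everywhere.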