[GHC] #12736: Calling a complex Haskell function (obtained via FFI wrapper function) from MSVC 64-bit C code (passed in as FunPtr) can leave SSE2 registers in the XMM6-XMM15 range modified

-----------------------------------------------+----------------------------------------------
 Reporter: bavism                              | Owner:
 Type: bug                                     | Status: new
 Priority: normal                              | Milestone:
 Component: Compiler (FFI)                     | Version: 7.10.3
 Keywords: ffi,registers,sse2,clobber,xmm      | Operating System: Windows
 Architecture: x86_64 (amd64)                  | Type of failure: Incorrect result at runtime
 Test Case: https://github.com/bavis-m/raycast | Blocked By:
 Blocking:                                     | Related Tickets:
 Differential Rev(s):                          | Wiki Page:
-----------------------------------------------+----------------------------------------------

According to [https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx MSDN], function calls under the Microsoft x64 calling convention must preserve the SSE2 registers in the range XMM6-XMM15. The Haskell FFI can produce a function pointer via a dynamic wrapper that, when called from MSVC x64 C code, does not preserve these registers, causing subsequent floating-point operations in the C code to produce incorrect results.

I can reproduce this error in [https://github.com/bavis-m/raycast this project], a DOOM-style raycasting engine written in Haskell that imports a C DLL with glue for rendering and window management. The Haskell executable generates a FunPtr to a frame update function using the dynamic wrapper mechanism, and passes it to a long-lived C function that runs the update loop. Any time this update function is called from the C loop, subsequent floating-point operations produce incorrect results (in this case, the next operations compute a view matrix for the OpenGL window).
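The arrangement described above can be sketched on the C side roughly as follows. This is a minimal illustration, not code from the raycast project: the names `frame_update_fn`, `run_frames`, `count_frame`, and `frames_seen` are hypothetical, and `count_frame` merely stands in for the FunPtr that the Haskell side would create with a `foreign import ccall "wrapper"` declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature of the per-frame callback. In the reported setup
   this pointer would come from Haskell as a FunPtr. */
typedef void (*frame_update_fn)(void);

/* Stand-in callback so the sketch can run without a Haskell runtime. */
int frames_seen = 0;
void count_frame(void) { ++frames_seen; }

/* Hypothetical long-lived loop. `scale` is live across every call to
   update(); on Windows x64 the compiler may keep such a value in one of
   XMM6-XMM15, because the calling convention promises the callee
   preserves them. */
float run_frames(frame_update_fn update, int frames, float width)
{
    assert(update != NULL);
    float scale = 2.0f / width;
    float sum = 0.0f;
    for (int i = 0; i < frames; ++i) {
        update();       /* in the report, this runs Haskell code */
        sum += scale;   /* goes wrong if the callee clobbered the register holding scale */
    }
    return sum;
}
```

With a well-behaved callback such as `count_frame`, `run_frames(count_frame, 4, 640.0f)` accumulates `2/640` four times. The bug in this ticket corresponds to the callback breaking the caller's assumption about which registers survive the call.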
The output on every frame showing the view matrix should be:

{{{
viewM:
0.003125 0.000000 0.000000 -1.000000
0.000000 0.004167 0.000000 -1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
}}}

Running the raycaster with the Release version of the DLL corrupts the value of this matrix. A patch is provided (stub.patch in the root folder) that turns the Haskell update function into an empty stub; with it applied, the program works. When stepping through the assembly code with this patch applied, I can see where the function prologue saves the XMM registers. Without the patch, these registers are not saved. Running the Debug version does not show this error; the register allocation must be different.

I have been trying to create a much simpler test case that reveals this code-generation issue, but it has been difficult. Even seemingly trivial changes can make the bug disappear; it clearly depends on the register allocation used internally to produce the assembly code.

Instructions for building the project are in the readme. (You will need the Haskell Stack Tool and Visual Studio 15.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12736
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
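The expected matrix in the report is consistent with a simple pixel-to-clip-space mapping for a 640x480 window, since 2/640 = 0.003125 and 2/480 ≈ 0.004167. The window size, the row-major layout, and the function names below are inferences for illustration only; the ticket does not state how the project computes the matrix. The sketch shows the kind of short floating-point computation that would go wrong if a register holding an intermediate value were clobbered by the preceding callback.

```c
#include <assert.h>
#include <stdio.h>

/* Fill a row-major 4x4 matrix that maps [0,width] x [0,height] onto
   [-1,1] x [-1,1]. Hypothetical helper, not taken from the project. */
void make_view_matrix(float width, float height, float m[16])
{
    assert(width > 0.0f && height > 0.0f);
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0]  = 2.0f / width;    /* x scale */
    m[3]  = -1.0f;           /* x offset */
    m[5]  = 2.0f / height;   /* y scale */
    m[7]  = -1.0f;           /* y offset */
    m[10] = 1.0f;
    m[15] = 1.0f;
}

/* Print the matrix one row per line, in the style of the report's output. */
void print_view_matrix(const float m[16])
{
    printf("viewM:\n");
    for (int r = 0; r < 4; ++r)
        printf("%f %f %f %f\n",
               m[4 * r], m[4 * r + 1], m[4 * r + 2], m[4 * r + 3]);
}
```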

Comment (by carter):

Reading the doc:

{{{
XMM6:XMM15, YMM6:YMM15   Nonvolatile (XMM), Volatile (upper half of YMM)
Must be preserved as needed by callee. YMM registers must be preserved as needed by caller.
}}}

It looks like if you have the callee clobber ymm6-ymm15 you can get the caller to handle the save/restore.

Alternatively, a simple wrapper around the Haskell functions could explicitly read xmm6-xmm15 before entering the Haskell call and set the values after return. So that should at least fix it with a simple read, call, and set sequence on the C side.

That said, it sounds like this is indeed a bug. The xmm vs ymm, caller vs callee stuff is kinda gross, but this is definitely a bug in the wrappers/stubs generated for the Windows platform.

Please share if the near-term workaround helps. I realize it adds an extra indirection in the Haskell call, but it's probably the simplest way to fix it this week?

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12736#comment:1

Comment (by carter):

To clarify: you could also compile your C code to assume an AVX/AVX2 microarchitecture to see if that works. But if you wanna support SSE2 targets, a teeny bit of xmm read/write code might help.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12736#comment:2

Comment (by bavism):

Replying to [comment:1 carter]:
> Reading the doc:
> {{{
> XMM6:XMM15, YMM6:YMM15   Nonvolatile (XMM), Volatile (upper half of YMM)
> Must be preserved as needed by callee. YMM registers must be preserved as needed by caller.
> }}}
> It looks like if you have the callee clobber ymm6-ymm15 you can get the caller to handle the save/restore.
> Alternatively, a simple wrapper around the Haskell functions could explicitly read xmm6-xmm15 before entering the Haskell call and set the values after return. So that should at least fix it with a simple read, call, and set sequence on the C side.
> That said, it sounds like this is indeed a bug. The xmm vs ymm, caller vs callee stuff is kinda gross, but this is definitely a bug in the wrappers/stubs generated for the Windows platform.
> Please share if the near-term workaround helps. I realize it adds an extra indirection in the Haskell call, but it's probably the simplest way to fix it this week?

Turns out this was much more difficult than I initially anticipated, as MSVC does not allow inline assembly in x64 projects :(. I have pushed a workaround to a new branch in that project, {{{fixasm}}}, that directs Visual Studio to build a new file, stub.asm, which exposes a stub function for calling a Haskell function pointer. The stub function saves XMM6-XMM15, so the Haskell function no longer clobbers the registers as seen from the C code, and everything works correctly.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12736#comment:3
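For comparison, the save/call/restore idea can be sketched in C where inline assembly is available. This assumes a GCC- or Clang-compatible compiler targeting x86-64; it is not the reporter's stub.asm and does not apply to MSVC x64, which, as noted above, rejects inline assembly. On any other compiler or architecture the sketch falls back to a plain call. The names `haskell_callback`, `call_preserving_xmm`, `mark_called`, and `calls_made` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type of the function pointer received from Haskell. */
typedef void (*haskell_callback)(void);

/* Stand-in callback so the sketch can be exercised on its own. */
int calls_made = 0;
void mark_called(void) { ++calls_made; }

/* Call `cb`, copying XMM6-XMM15 to a stack buffer beforehand and
   reloading them afterwards, so the caller sees them unchanged. */
void call_preserving_xmm(haskell_callback cb)
{
    assert(cb != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned char saved[10 * 16];   /* 10 registers, 16 bytes each */

    __asm__ volatile(
        "movdqu %%xmm6,    0(%0)\n\t"
        "movdqu %%xmm7,   16(%0)\n\t"
        "movdqu %%xmm8,   32(%0)\n\t"
        "movdqu %%xmm9,   48(%0)\n\t"
        "movdqu %%xmm10,  64(%0)\n\t"
        "movdqu %%xmm11,  80(%0)\n\t"
        "movdqu %%xmm12,  96(%0)\n\t"
        "movdqu %%xmm13, 112(%0)\n\t"
        "movdqu %%xmm14, 128(%0)\n\t"
        "movdqu %%xmm15, 144(%0)\n\t"
        :
        : "r"(saved)
        : "memory");

    cb();   /* may clobber XMM6-XMM15 */

    __asm__ volatile(
        "movdqu   0(%0), %%xmm6\n\t"
        "movdqu  16(%0), %%xmm7\n\t"
        "movdqu  32(%0), %%xmm8\n\t"
        "movdqu  48(%0), %%xmm9\n\t"
        "movdqu  64(%0), %%xmm10\n\t"
        "movdqu  80(%0), %%xmm11\n\t"
        "movdqu  96(%0), %%xmm12\n\t"
        "movdqu 112(%0), %%xmm13\n\t"
        "movdqu 128(%0), %%xmm14\n\t"
        "movdqu 144(%0), %%xmm15\n\t"
        :
        : "r"(saved)
        : "memory", "xmm6", "xmm7", "xmm8", "xmm9", "xmm10",
          "xmm11", "xmm12", "xmm13", "xmm14", "xmm15");
#else
    cb();   /* no inline assembly available: plain call, no protection */
#endif
}
```

The C loop would then call `call_preserving_xmm(update)` instead of `update()` directly, which is the extra indirection mentioned earlier in the thread.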

Changes (by AndreasK):

 * status: new => closed
 * resolution: => fixed
 * related: => #14619

Comment:

Fixed with #14619. GHC did not respect callee-saved XMM registers on Windows, leading to them being clobbered.

I do not have VS2015 to confirm that the repro case is fixed, but this is exactly what I would expect to happen without the fix, so I'm closing this as fixed. Feel free to reopen if the issue persists.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12736#comment:4