
#13629: sqrt should use machine instruction on x86_64
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:  8.4.1
       Component:  Compiler (NCG)    |              Version:  8.0.1
      Resolution:  fixed             |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:  numeric/num009
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #13570            |  Differential Rev(s):  Phab:D3508
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by kavon):

Replying to [comment:4 bgamari]:
> It's made in the native code generator, `genCCall` in `compiler/nativeGen/X86/CodeGen.hs`. While we use the `fsin` instruction on i386, we don't on x86_64 (and i386 with `-msse2`).

If there is already x87 FPU instruction support in the NCG for x86-32, it might be profitable to reuse that support for x86-64 to speed up trig functions, etc. The simplest approach I see is to expand the foreign call into an instruction sequence that moves the float from the XMM registers to the x87 stack, computes the value, and moves it back to the XMM registers. This way we no longer have a C call in a potentially bad place.

It's worth first comparing x87 on modern processors against the assembly routine backing the C function. It seems that on platforms like Skylake the x87 `fsin` takes 50-120 cycles [1], but I'm not sure about the library versions. If they're roughly equivalent, there's likely a benefit to eliding the C call.

[1] http://www.agner.org/optimize/instruction_tables.pdf (Page 223 for Skylake x87)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13629#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
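
For illustration, the XMM-to-x87 round trip described in the comment can be sketched in C with GCC/Clang inline assembly. This is only a rough sketch, not what the NCG emits: the function name `x87_sin` is hypothetical, the code assumes an x86 target and AT&T assembly syntax, and the transfer goes through a stack slot because there is no direct move between XMM and x87 registers.

```c
#if !defined(__x86_64__) && !defined(__i386__)
#error "This sketch uses x87 instructions and only builds for x86 targets."
#endif

/* Hypothetical helper: computes sin(x) with the x87 fsin instruction
 * instead of calling the C library. On x86-64 the argument arrives in
 * an XMM register, so the "m" constraints make the compiler spill it
 * to memory first and reload the result into an XMM register after. */
static double x87_sin(double x)
{
    double result;
    __asm__ volatile (
        "fldl  %1\n\t"   /* push x from memory onto the x87 stack      */
        "fsin\n\t"       /* st(0) = sin(st(0)); valid for |x| < 2^63   */
        "fstpl %0"       /* pop st(0) back to memory as a 64-bit value */
        : "=m"(result)
        : "m"(x)
    );
    return result;
}
```

A call such as `x87_sin(0.5)` then costs two memory round trips plus the `fsin` itself, which is the quantity that would need to be measured against the library routine before making this change.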