
#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jberryman): Yeah, I quickly gave up trying to core or downstream output between the two. And I'm aware that we should expect register allocation to be fiddly: the algorithm is simple and suboptimal, so it's likely that from very different core we'd get randomly better or worse generated code, especially if we're pushing our use of registers (I'm not sure if this is really the case here). Still I thought I should report because: - this represents a real regression in my code, and perhaps something like this should be included in ghc benchmarks to motivate improvements to register allocation, and - it might be that something else is going on and we're getting register spilling when we really shouldn't In any case I hope reporting this is not unproductive. Here's a bit more information: I found that the following change (which is more sensible source in any case) results in significantly faster code in both cases and starts to narrow but does not eliminate the regression (we are still using stack in the 8.2 version). A test case with this version is probably better to work with: {{{#!hs clean4xWord16ChunkLE :: Word64 -> Word64 {-# INLINE clean4xWord16ChunkLE #-} clean4xWord16ChunkLE w64Dirty = -- This improves things significantly (and is an improvement in 8.0.1), but -- still regresses: -- For "ByteArray 1000000": -- 8.0.1: 2.663 ms - 2.261 ms -- 8.2 rc: 3.371 ms - 2.728 ms let !w64 = byteSwap64 w64Dirty in ((w64 `unsafeShiftR` 8) .&. 0x00FF00FF00FF00FF) .|. ((w64 `unsafeShiftL` 8) .&. 0xFF00FF00FF00FF00) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler