[GHC] #13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc2 (NCG) | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Testing GHC 8.0.1 against the RC 8.2.0.20170507 I've distilled a smallish test-case from a much larger case in my 'hashabler' library, and validated that the same modifications also make that regression disappear in the real case. It's probably possible to get this smaller but I don't know if I'll have time to work on it more for a while: repro3.hs: {{{#!hs {-# LANGUAGE BangPatterns #-} module Main(main) where import Criterion.Main import qualified Data.Primitive as P import Data.Bits import Data.Word import Control.DeepSeq main :: IO () main = do defaultMain [ env (newByteArr64 5) $ \ ~bs -> bench "ByteArray 5" $ nf (hashTextSip 99) bs , env (newByteArr64 8) $ \ ~bs -> bench "ByteArray 8" $ nf (hashTextSip 99) bs , env (newByteArr64 512) $ \ ~bs -> bench "ByteArray 512" $ nf (hashTextSip 99) bs , env (newByteArr64 1000000) $ \ ~bs -> bench "ByteArray 1000000" $ nf (hashTextSip 99) bs ] instance NFData P.ByteArray where rnf _ = () newByteArr64 n = P.newAlignedPinnedByteArray (8*n) 8 >>= P.unsafeFreezeByteArray sipRound :: Word64 -> Word64 -> Word64 -> Word64 -> (Word64, Word64, Word64, Word64) {-# INLINE sipRound #-} sipRound v0 v1 v2 v3 = (v3 `xor` v0, v0 `xor` v1, v1 `xor` v2, v2 `xor` v3) hashTextSip :: Word64 -> P.ByteArray -> Word64 {-# INLINE hashTextSip #-} hashTextSip h = \ ba -> let !lenWord16 = P.sizeofByteArray ba `unsafeShiftR` 1 !word16sRem = lenWord16 .&. 3 !word16sIx = lenWord16-word16sRem !ixFinal = lenWord16-1 !word16sIxWd = word16sIx `unsafeShiftR` 2 -- `div` 4 hash4Word16sLoop hAcc@(!w0,!w1,!w2,!w3) !ix | ix == word16sIxWd = hashRemainingWord16s hAcc word16sIx | otherwise = let w64Dirty = P.indexByteArray ba ix w64 = clean4xWord16ChunkLE w64Dirty in hash4Word16sLoop (sipRound (w0 `xor` w64) w1 w2 w3) (ix + 1) -- NOTE: Removing this causes regression to disappear as well. hashRemainingWord16s (!w0,!w1,!w2,!w3) !ix | ix > ixFinal = w0 | otherwise = let w16 = P.indexByteArray ba ix in hashRemainingWord16s (sipRound (w0 `xor` (fromIntegral (w16 :: Word16))) w1 w2 w3) (ix+1) in hash4Word16sLoop (h,1,2,3) 0 clean4xWord16ChunkLE :: Word64 -> Word64 {-# INLINE clean4xWord16ChunkLE #-} clean4xWord16ChunkLE w64Dirty = -- NOTE: no regression when just this (8.2 is faster) -- (((byteSwap64 w64Dirty) `unsafeShiftR` 8) .&. 0x00FF00FF00FF00FF) -- ...but this is a big regression: (((byteSwap64 w64Dirty) `unsafeShiftR` 8) .&. 0x00FF00FF00FF00FF) .|. (((byteSwap64 w64Dirty) `unsafeShiftL` 8) .&. 0xFF00FF00FF00FF00) }}} Here are the results of the benchmark above on my machine: On GHC **8.0.1**: {{{ benchmarking ByteArray 5 time 24.70 ns (24.00 ns .. 26.25 ns) 0.987 R² (0.967 R² .. 1.000 R²) mean 24.44 ns (24.13 ns .. 25.80 ns) std dev 1.859 ns (318.3 ps .. 4.227 ns) variance introduced by outliers: 86% (severely inflated) benchmarking ByteArray 8 time 32.66 ns (32.58 ns .. 32.76 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 32.79 ns (32.64 ns .. 33.09 ns) std dev 683.7 ps (365.4 ps .. 1.175 ns) variance introduced by outliers: 31% (moderately inflated) benchmarking ByteArray 512 time 1.428 μs (1.382 μs .. 1.522 μs) 0.986 R² (0.970 R² .. 1.000 R²) mean 1.398 μs (1.384 μs .. 1.454 μs) std dev 91.12 ns (4.475 ns .. 193.9 ns) variance introduced by outliers: 76% (severely inflated) benchmarking ByteArray 1000000 time 2.658 ms (2.653 ms .. 2.663 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.672 ms (2.665 ms .. 2.691 ms) std dev 35.00 μs (10.88 μs .. 59.58 μs) }}} And on **GHC 8.2** RC: {{{ benchmarking ByteArray 5 time 23.78 ns (23.68 ns .. 23.88 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 23.83 ns (23.76 ns .. 23.95 ns) std dev 298.8 ps (183.2 ps .. 482.5 ps) variance introduced by outliers: 14% (moderately inflated) benchmarking ByteArray 8 time 35.81 ns (35.44 ns .. 36.27 ns) 0.999 R² (0.998 R² .. 1.000 R²) mean 35.56 ns (35.45 ns .. 35.94 ns) std dev 596.8 ps (134.5 ps .. 1.184 ns) variance introduced by outliers: 22% (moderately inflated) benchmarking ByteArray 512 time 1.706 μs (1.698 μs .. 1.716 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.701 μs (1.698 μs .. 1.707 μs) std dev 13.27 ns (5.825 ns .. 24.41 ns) benchmarking ByteArray 1000000 time 3.322 ms (3.284 ms .. 3.377 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 3.296 ms (3.287 ms .. 3.332 ms) std dev 44.62 μs (20.55 μs .. 87.29 μs) }}} Looking at the core wasn't fruitful, but I think dumping the asm shows that this is a case of bad (or worse) register allocation. I've attached two screenshots showing the instructions added (in blue), when moving from the one-line `clean4xWord16ChunkLE` to the two-line version, for both 8.0 and 8.2 (there wasn't anything in the diff besides instances of this change). It looks in the 8.2 version like we've decided we're out of registers and need to use the stack. In my real code I'm seeing 35% regression on very long Text, as well as 21% regression on very long ByteString; the latter implementation is similarly structured to `hashTextSip`, but doesn't call `clean4xWord16ChunkLE` but does do a byteswap. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jberryman): * Attachment "repro3.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jberryman): * Attachment "8.0.1.regressing_and_non_regressing_asm_diff.png" added. Adding the second line (which causes regression in 8.2) on 8.0.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jberryman): * Attachment "8.2.rc.regressing_and_non_regressing_asm_diff.png" added. Adding the second line (which causes regression in 8.2) on 8.2.1, illustrating register spilling -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Thanks for the nice bug report! I'm attaching a version of the program with no external dependencies, which will be useful as I track down which commit caused this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * Attachment "repro3_reduced.hs" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * keywords: => JoinPoints Comment: Commit 8d5cf8bf584fd4849917c29d82dcf46ee75dd035 (Join points) is the culprit. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Looking at the `-ddump-simpl` output, there are some noticeable differences in the Core emitted by 8.0.2 and 8.2.1-rc2. From 8.0.2, we have: {{{ -- RHS size: {terms: 137, types: 72, coercions: 5} main1 :: State# RealWorld -> (# State# RealWorld, () #) main1 = \ (s :: State# RealWorld) -> case newAlignedPinnedByteArray# 8000000# 8# (s `cast` ...) of _ { (# ipv, ipv1 #) -> case unsafeFreezeByteArray# ipv1 ipv of _ { (# ipv2, ipv3 #) -> (# ipv2 `cast` ..., let { ipv4 :: Int# ipv4 = uncheckedIShiftRA# (sizeofByteArray# ipv3) 1# } in let { ipv5 :: Int# ipv5 = -# ipv4 (andI# ipv4 3#) } in let { ipv6 :: Int# ipv6 = uncheckedIShiftRA# ipv5 2# } in let { ipv7 :: Int# ipv7 = -# ipv4 1# } in letrec { $whashRemainingWord16s :: Word# -> Word# -> Word# -> Word# -> Int# -> Word# $whashRemainingWord16s = \ (ww :: Word#) (ww1 :: Word#) (ww2 :: Word#) (ww3 :: Word#) (ww4 :: Int#) -> case tagToEnum# (># ww4 ipv7) of _ { False -> case indexWord16Array# ipv3 ww4 of wild1 { __DEFAULT -> let { v0 :: Word# v0 = xor# ww wild1 } in $whashRemainingWord16s (xor# ww3 v0) (xor# v0 ww1) (xor# ww1 ww2) (xor# ww2 ww3) (+# ww4 1#) }; True -> ww }; } in letrec { $whash4Word16sLoop :: Word# -> Word# -> Word# -> Word# -> Int# -> Word# $whash4Word16sLoop = \ (ww :: Word#) (ww1 :: Word#) (ww2 :: Word#) (ww3 :: Word#) (ww4 :: Int#) -> case tagToEnum# (==# ww4 ipv6) of _ { False -> case indexWord64Array# ipv3 ww4 of wild1 { __DEFAULT -> let { v0 :: Word# v0 = xor# ww (or# (and# (uncheckedShiftRL# (byteSwap# wild1) 8#) 71777214294589695##) (and# (uncheckedShiftL# (byteSwap# wild1) 8#) 18374966859414961920##)) } in $whash4Word16sLoop (xor# ww3 v0) (xor# v0 ww1) (xor# ww1 ww2) (xor# ww2 ww3) (+# ww4 1#) }; True -> $whashRemainingWord16s ww ww1 ww2 ww3 ipv5 }; } in case $whash4Word16sLoop 99## 1## 2## 3## 0# of _ { __DEFAULT -> () } #) } } -- RHS size: {terms: 1, types: 0, coercions: 3} main :: IO () main = main1 `cast` ... }}} And from 8.2.1-rc2, we have: {{{ -- RHS size: {terms: 134, types: 77, coercions: 72, joins: 2/8} main1 :: State# RealWorld -> (# State# RealWorld, () #) main1 = \ (s :: State# RealWorld) -> case newAlignedPinnedByteArray# 8000000# 8# (s `cast` Co:41) of { (# ipv, ipv1 #) -> case unsafeFreezeByteArray# ipv1 ipv of { (# ipv2, ipv3 #) -> (# ipv2 `cast` Co:31, let { ipv4 :: Int# ipv4 = uncheckedIShiftRA# (sizeofByteArray# ipv3) 1# } in let { ixFinal :: Int# ixFinal = -# ipv4 1# } in let { word16sIx :: Int# word16sIx = -# ipv4 (andI# ipv4 3#) } in let { word16sIxWd :: Int# word16sIxWd = uncheckedIShiftRA# word16sIx 2# } in joinrec { $whashRemainingWord16s :: Word# -> Word# -> Word# -> Word# -> Int# -> () $whashRemainingWord16s (ww :: Word#) (ww1 :: Word#) (ww2 :: Word#) (ww3 :: Word#) (ww4 :: Int#) = case tagToEnum# (># ww4 ixFinal) of { False -> case indexWord16Array# ipv3 ww4 of wild1 { __DEFAULT -> let { v0 :: Word# v0 = xor# ww wild1 } in jump $whashRemainingWord16s (xor# ww3 v0) (xor# v0 ww1) (xor# ww1 ww2) (xor# ww2 ww3) (+# ww4 1#) }; True -> () }; } in joinrec { $whash4Word16sLoop :: Word# -> Word# -> Word# -> Word# -> Int# -> () $whash4Word16sLoop (ww :: Word#) (ww1 :: Word#) (ww2 :: Word#) (ww3 :: Word#) (ww4 :: Int#) = case tagToEnum# (==# ww4 word16sIxWd) of { False -> case indexWord64Array# ipv3 ww4 of wild1 { __DEFAULT -> let { v0 :: Word# v0 = xor# ww (or# (and# (uncheckedShiftRL# (byteSwap# wild1) 8#) 71777214294589695##) (and# (uncheckedShiftL# (byteSwap# wild1) 8#) 18374966859414961920##)) } in jump $whash4Word16sLoop (xor# ww3 v0) (xor# v0 ww1) (xor# ww1 ww2) (xor# ww2 ww3) (+# ww4 1#) }; True -> jump $whashRemainingWord16s ww ww1 ww2 ww3 word16sIx }; } in jump $whash4Word16sLoop 99## 1## 2## 3## 0# #) } } -- RHS size: {terms: 1, types: 0, coercions: 3, joins: 0/0} main :: IO () main = main1 `cast` Co:3 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jberryman): Yeah, I quickly gave up trying to core or downstream output between the two. And I'm aware that we should expect register allocation to be fiddly: the algorithm is simple and suboptimal, so it's likely that from very different core we'd get randomly better or worse generated code, especially if we're pushing our use of registers (I'm not sure if this is really the case here). Still I thought I should report because: - this represents a real regression in my code, and perhaps something like this should be included in ghc benchmarks to motivate improvements to register allocation, and - it might be that something else is going on and we're getting register spilling when we really shouldn't In any case I hope reporting this is not unproductive. Here's a bit more information: I found that the following change (which is more sensible source in any case) results in significantly faster code in both cases and starts to narrow but does not eliminate the regression (we are still using stack in the 8.2 version). A test case with this version is probably better to work with: {{{#!hs clean4xWord16ChunkLE :: Word64 -> Word64 {-# INLINE clean4xWord16ChunkLE #-} clean4xWord16ChunkLE w64Dirty = -- This improves things significantly (and is an improvement in 8.0.1), but -- still regresses: -- For "ByteArray 1000000": -- 8.0.1: 2.663 ms - 2.261 ms -- 8.2 rc: 3.371 ms - 2.728 ms let !w64 = byteSwap64 w64Dirty in ((w64 `unsafeShiftR` 8) .&. 0x00FF00FF00FF00FF) .|. ((w64 `unsafeShiftL` 8) .&. 0xFF00FF00FF00FF00) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: kavon@… (added) Comment: I'm puzzled. The changes in comment:3 change `letrec` into `joinrec`, which ought to be a straight win. I'm adding Kavon in cc because he is interested in this back-end stuff. More insight into where the regression comes from would be helpful, if anyone has time to dig in. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * priority: normal => high * milestone: => 8.2.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Hopefully Phab:D2309 will help improve register allocation in cases like these. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Did you mean Phab:D2903? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Indeed, good catch, RyanGlScott. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.4.1 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.2.2 => 8.4.1 Comment: Unfortunately I don't think anything will happen on this front for 8.2.2; however, it's likely that Phab:D2903 will make it for 8.4. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.4.1 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array -------------------------------------+------------------------------------- Reporter: jberryman | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler (NCG) | Version: 8.2.1-rc2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.4.1 => 8.6.1 Comment: Sadly this didn't quite make 8.4. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13763#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC