
#12798: LLVM seeming to over optimize, producing inefficient assembly code... -------------------------------------+------------------------------------- Reporter: GordonBGood | Owner: Type: bug | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler (LLVM) | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AlexET): On current HEAD with llvm 3.9.0, the follwing code {{{ import Data.Word import Data.Bits import Data.Array.ST (runSTUArray) import Data.Array.Base import Control.Monad.ST numLOOPS = 10000 :: Int twos :: UArray Int Word32 twos = listArray (0, 31) [1 `shiftL` i | i <- [0 .. 31]] soe :: () -> [Word32] soe() = 2 : [fromIntegral i * 2 + 3 | (i, False) <- assocs bufb] where bufb = runSTUArray $ do let bfLmt = (256 * 1024) `div` 2 - 1 -- to 2^18 + 2 is 128 KBits - 1 = 16 KBytes cmpstsb <- newArray (0, bfLmt) False :: ST s (STUArray s Int Bool) cmpstsw <- (castSTUArray :: STUArray s Int Bool -> ST s (STUArray s Int Word32)) cmpstsb return $! twos -- force evaluation of twos outside the loop. let loop n = -- cull a number of times to test timing if n <= 0 then return cmpstsb else let cullp i = let p = i + i + 3 in let s = (p * p - 3) `div` 2 in if s > bfLmt then loop (n - 1) else do isCmpst <- unsafeRead cmpstsb i if isCmpst then cullp (i + 1) else -- is Prime let cull j = -- very tight inner loop where all the time is spent if j > bfLmt then cullp (i + 1) else do let sh = unsafeAt twos (j .&. 31) -- (1 `shiftL` (j .&. 31))) let w = j `shiftR` 5 ov <- unsafeRead cmpstsw w unsafeWrite cmpstsw w (ov .|. sh) cull (j + p) in cull s in cullp 0 loop numLOOPS main = print $ length $ soe() }}} Gives the inner loop, which is almost the same as rust. {{{ .LBB28_7: movq %rcx, %rdx sarq $5, %rdx movl %ecx, %edi andl $31, %edi movl 16(%r9,%rdi,4), %edi orl %edi, 16(%rax,%rdx,4) addq %rsi, %rcx .LBB28_5: cmpq $131071, %rcx jle .LBB28_7 }}} The CMM and initial llvm code is the same as your code for 8.0.1, so it seems the difference is due to the fact that rust ships with its own more recent llvm than ghc 8.0.1 supports. The difference in the code between my version and your original version is that we force `twos` early which is needed to prevent the evaluation of that within the loop, an optimisation which seems to have been missed by HEAD. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12798#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler