
#14644: Improve cmm/assembly for pattern matches with two constants. -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: AndreasK Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Keywords: Codegen, CMM, Resolution: | Patterns, Pattern Matching Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4294 Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * owner: (none) => AndreasK * differential: => Phab:D4294 Comment: I ran a simple Benchmark and the results seem almost too good with >10% gains in almost all cases. If it really is that good then, well, I guess good for us. Assuming I didn't introduce an error somewhere I assume the combination of smaller code and less instructions on some paths is just better in general.
If we e.g. assume that all Ints happen equally often in the example above, it would be best to check for <1 and >7 first, so we would get roughly 1.5 comparisons on average. Depending on the number of registers available, you can even get away with 1 subtraction and 1 unsigned comparison for this range check, a classic C hack (ab)using wrap-around for unsigned ints.
I thought about this a bit. While true it also means the first comparison has a hard time with prediction if the numbers are a random distribution over the range. So two 99.9% correctly predicted comparisons might still be better even without considering code size. `range check, jg, je, cmp, je, j <default>` could still be better. But not sure if I will look into that and it might warrant it's own ticket as GHC also uses this for range checks on jump tables I think. Code used for the test: {{{ f_test :: Int -> Int f_test 111 = 1111 f_test 222 = 2222 f_test _ = -1 run = print . sum . map f_test --main = print . sum . map f_test $ ([(-1500000000::Int) .. 0]) main = do args <- getArgs :: IO [String] if null args then putStrLn "Usage: pos|neg|both" else case (head args) of "pos" -> run [0 .. (1500000000::Int)] "neg" -> run [-1500000000::Int .. 0] "both" -> run [-750000000::Int .. 750000000::Int] "negrep" -> run . concat . replicate 100000000 $ [-311, -322, -333, -444] "posrep" -> run . concat . replicate 100000000 $ [311, 322, 333, 444] "unpre" -> run . concat . replicate 100000000 $ [111, 322, -333, 222] }}} I get these results for the current approach (range) and mine (if) on an 6700K === With inlining: {{{ benchmarking execute: range pos mean 1.196 s (1.196 s .. 1.198 s) benchmarking execute: if pos mean 806.0 ms (803.5 ms .. 807.3 ms) benchmarking execute: range neg mean 2.384 s (2.383 s .. 2.385 s) benchmarking execute: if neg mean 1.841 s (1.841 s .. 1.842 s) benchmarking execute: range negrep mean 840.1 ms (838.1 ms .. 841.3 ms) benchmarking execute: if negrep mean 728.6 ms (728.3 ms .. 728.8 ms) benchmarking execute: range unpre mean 852.2 ms (851.1 ms .. 853.1 ms) benchmarking execute: if unpre mean 789.7 ms (789.3 ms .. 789.9 ms) }}} === With inlining disabled on f_test {{{ benchmarking execute: range pos mean 2.385 s (2.383 s .. 2.386 s) benchmarking execute: if pos mean 2.383 s (2.382 s .. 2.384 s) benchmarking execute: range neg mean 2.845 s (2.839 s .. 2.848 s) benchmarking execute: if neg mean 2.047 s (2.041 s .. 2.053 s) benchmarking execute: range negrep mean 1.204 s (1.201 s .. 1.205 s) benchmarking execute: if negrep mean 829.4 ms (828.4 ms .. 830.1 ms) benchmarking execute: range unpre mean 1.165 s (1.164 s .. 1.166 s) benchmarking execute: if unpre mean 1.118 s (1.117 s .. 1.119 s) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14644#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler