Re: [GHC] #14644: Improve cmm/assembly for pattern matches with two constants.

9 Jan 2018

      #14644: Improve cmm/assembly for pattern matches with two constants.
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  AndreasK
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
  (CodeGen)                          |             Keywords:  Codegen, CMM,
      Resolution:                    |  Patterns, Pattern Matching
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4294
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by AndreasK):

 * owner:  (none) => AndreasK
 * differential:   => Phab:D4294

Comment:

 I ran a simple Benchmark and the results seem almost too good with >10%
 gains in almost all cases.

 If it really is that good then, well, I guess good for us.

 Assuming I didn't introduce an error somewhere I assume the combination of
 smaller code and less instructions on some paths is just better in
 general.
...
If we e.g. assume that all Ints happen equally often in the example
 above, it would be best to check for <1 and >7 first, so we would get
 roughly 1.5 comparisons on average. Depending on the number of registers
 available, you can even get away with 1 subtraction and 1 unsigned
 comparison for this range check, a classic C hack (ab)using wrap-around
 for unsigned ints.
I thought about this a bit. While true it also means the first comparison
 has a hard time with prediction if the numbers are a random distribution
 over the range. So two 99.9% correctly predicted comparisons might still
 be better even without considering code size.

 `range check, jg, je, cmp, je, j <default>` could still be better. But not
 sure if I will look into that and it might warrant it's own ticket as GHC
 also uses this for range checks on jump tables I think.

 Code used for the test:
 {{{
 f_test :: Int -> Int
 f_test  111 = 1111
 f_test  222 = 2222
 f_test  _ = -1

 run = print . sum . map f_test

 --main = print . sum . map f_test $ ([(-1500000000::Int) .. 0])
 main = do
     args <- getArgs :: IO [String]
     if null args
         then putStrLn "Usage: pos|neg|both"
         else case (head args) of
             "pos" -> run [0 .. (1500000000::Int)]
             "neg" -> run [-1500000000::Int .. 0]
             "both" -> run [-750000000::Int .. 750000000::Int]
             "negrep" -> run . concat . replicate 100000000 $ [-311, -322,
 -333, -444]
             "posrep" -> run . concat . replicate 100000000 $ [311, 322,
 333, 444]
             "unpre" -> run . concat . replicate 100000000 $ [111, 322,
 -333, 222]
 }}}

 I get these results for the current approach (range) and mine (if) on an
 6700K

 === With inlining:
 {{{
 benchmarking execute: range pos
 mean                 1.196 s    (1.196 s .. 1.198 s)
 benchmarking execute: if pos
 mean                 806.0 ms   (803.5 ms .. 807.3 ms)

 benchmarking execute: range neg
 mean                 2.384 s    (2.383 s .. 2.385 s)
 benchmarking execute: if neg
 mean                 1.841 s    (1.841 s .. 1.842 s)

 benchmarking execute: range negrep
 mean                 840.1 ms   (838.1 ms .. 841.3 ms)
 benchmarking execute: if negrep
 mean                 728.6 ms   (728.3 ms .. 728.8 ms)

 benchmarking execute: range unpre
 mean                 852.2 ms   (851.1 ms .. 853.1 ms)
 benchmarking execute: if unpre
 mean                 789.7 ms   (789.3 ms .. 789.9 ms)
 }}}

 === With inlining disabled on f_test
 {{{
 benchmarking execute: range pos
 mean                 2.385 s    (2.383 s .. 2.386 s)
 benchmarking execute: if pos
 mean                 2.383 s    (2.382 s .. 2.384 s)

 benchmarking execute: range neg
 mean                 2.845 s    (2.839 s .. 2.848 s)
 benchmarking execute: if neg
 mean                 2.047 s    (2.041 s .. 2.053 s)

 benchmarking execute: range negrep
 mean                 1.204 s    (1.201 s .. 1.205 s)
 benchmarking execute: if negrep
 mean                 829.4 ms   (828.4 ms .. 830.1 ms)

 benchmarking execute: range unpre
 mean                 1.165 s    (1.164 s .. 1.166 s)
 benchmarking execute: if unpre
 mean                 1.118 s    (1.117 s .. 1.119 s)
 }}}

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14644#comment:5
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler