Re: [GHC] #14980: Runtime performance regression with binary operations on vectors

27 Jun 2018

      #14980: Runtime performance regression with binary operations on vectors
-------------------------------------+-------------------------------------
        Reporter:  ttylec            |                Owner:  bgamari
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:  vector
                                     |  bitwise operations
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ttylec):

 Replying to [comment:20 tdammers]:

 This is not totally "bad" behavior. This:
...
{{{
"Generated"
benchmarking 64 columns/raw unbox vectors
time                 460.0 μs   (447.0 μs .. 473.6 μs)
                     0.995 R²   (0.992 R² .. 0.997 R²)
mean                 446.4 μs   (440.2 μs .. 455.1 μs)
std dev              24.42 μs   (17.29 μs .. 31.22 μs)
variance introduced by outliers: 48% (moderately inflated)
benchmarking 64 columns/binary packed
time                 52.25 μs   (51.66 μs .. 53.04 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 52.60 μs   (51.99 μs .. 53.67 μs)
std dev              2.665 μs   (1.919 μs .. 4.073 μs)
variance introduced by outliers: 55% (severely inflated)
}}}
 is "good": we get a significant speedup. But this:
...
{{{
benchmarking 256 columns/raw unbox vectors
time                 439.9 μs   (434.4 μs .. 447.4 μs)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 439.0 μs   (435.0 μs .. 446.4 μs)
std dev              17.95 μs   (10.79 μs .. 27.38 μs)
variance introduced by outliers: 35% (moderately inflated)
benchmarking 256 columns/binary packed
time                 304.7 μs   (288.7 μs .. 330.4 μs)
                     0.965 R²   (0.940 R² .. 0.998 R²)
mean                 324.1 μs   (302.9 μs .. 364.4 μs)
std dev              62.33 μs   (26.19 μs .. 97.61 μs)
variance introduced by outliers: 91% (severely inflated)
}}}
 is "bad". However, what I observed with the full code of our project is
 that the speed-up is lost once we exceed a certain number of columns, and
 that number is platform-specific (AMD performs worst, Intel is usually
 good, but then a MacBook Pro with an i5 CPU seems to do better than an i7
 Lenovo running Ubuntu).
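
 For readers who have not seen the benchmark code, here is a minimal sketch
 of the two representations being compared. It is illustrative only: the
 names `RawRow`, `PackedRow`, `pack`, `rawAndCount` and `packedAndCount` are
 hypothetical and not taken from the project, plain lists stand in for the
 unboxed vectors, and it assumes "binary packed" means 64 columns per
 `Word64`.

```haskell
module Main where

import Data.Bits (popCount, setBit, (.&.))
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical "raw" representation: one Bool per column.
type RawRow = [Bool]

-- Hypothetical "packed" representation: 64 columns per Word64.
type PackedRow = [Word64]

-- Split a list into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Pack columns into words; column i of each chunk goes to bit i.
pack :: RawRow -> PackedRow
pack = map packWord . chunksOf 64
  where
    packWord bs = foldl' setIf 0 (zip [0 ..] bs)
    setIf w (i, b) = if b then setBit w i else w

-- Count the columns set in both rows, one column at a time.
rawAndCount :: RawRow -> RawRow -> Int
rawAndCount xs ys = length (filter id (zipWith (&&) xs ys))

-- Same count, but 64 columns per AND + popCount.
packedAndCount :: PackedRow -> PackedRow -> Int
packedAndCount xs ys = sum (zipWith (\x y -> popCount (x .&. y)) xs ys)

main :: IO ()
main = do
  let mkRow n k = [i `mod` k == 0 | i <- [0 .. n - 1 :: Int]]
      a64 = mkRow 64 2
      b64 = mkRow 64 3
      a256 = mkRow 256 2
      b256 = mkRow 256 3
  -- Number of words needed per row for 64 and for 256 columns.
  print (length (pack a64), length (pack a256))
  -- Both representations should agree on the result.
  print (rawAndCount a64 b64 == packedAndCount (pack a64) (pack b64))
  print (rawAndCount a256 b256 == packedAndCount (pack a256) (pack b256))
```

 Under that assumption a 64-column row fits in a single word while a
 256-column row needs four, so the packed version goes from one AND per row
 to a short loop. This does not by itself explain the platform differences.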

 But since on the same platform you get different results for 256 columns
 while still seeing a speedup with 64 columns, it makes me wonder: could
 the system kernel and/or libraries be affecting that?

 As for me not being able to reproduce my original report: I tried removing
 `~/.stack` and `~/.stack-work`, and I tried both with and without
 `split-obj` in the stack config. I still can't get "bad" results on my
 current hardware/software.

 I am using only stack, I don't have system-wide GHC.

 I will try to run more tests on a different machine (with Debian 9), and
 on the MacBook at home too.

 But to sum up what we know so far: libraries are ruled out, and the
 compiler version seems to be ruled out too. What's left? The GHC binary
 package, OS kernel, system libs? Does any of that make sense?

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14980#comment:22
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler