
#14980: Runtime performance regression with binary operations on vectors
-----------------------------------------+-----------------------------------------
Reporter: ttylec                         | Owner: bgamari
Type: bug                                | Status: new
Priority: high                           | Milestone: 8.8.1
Component: Compiler                      | Version: 8.2.2
Resolution:                              | Keywords: vector bitwise operations
Operating System: Unknown/Multiple       | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By:                              | Blocking:
Related Tickets:                         | Differential Rev(s):
Wiki Page:                               |
-----------------------------------------+-----------------------------------------

Comment (by ttylec):

Replying to [comment:20 tdammers]:

This is not totally "bad" behavior. This:
{{{
"Generated"
benchmarking 64 columns/raw unbox vectors
time                 460.0 μs   (447.0 μs .. 473.6 μs)
                     0.995 R²   (0.992 R² .. 0.997 R²)
mean                 446.4 μs   (440.2 μs .. 455.1 μs)
std dev              24.42 μs   (17.29 μs .. 31.22 μs)
variance introduced by outliers: 48% (moderately inflated)

benchmarking 64 columns/binary packed
time                 52.25 μs   (51.66 μs .. 53.04 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 52.60 μs   (51.99 μs .. 53.67 μs)
std dev              2.665 μs   (1.919 μs .. 4.073 μs)
variance introduced by outliers: 55% (severely inflated)
}}}
This is "good": we get a significant speed-up (about 8.5x on the mean, 446.4 μs vs. 52.60 μs). But this:
{{{
benchmarking 256 columns/raw unbox vectors
time                 439.9 μs   (434.4 μs .. 447.4 μs)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 439.0 μs   (435.0 μs .. 446.4 μs)
std dev              17.95 μs   (10.79 μs .. 27.38 μs)
variance introduced by outliers: 35% (moderately inflated)

benchmarking 256 columns/binary packed
time                 304.7 μs   (288.7 μs .. 330.4 μs)
                     0.965 R²   (0.940 R² .. 0.998 R²)
mean                 324.1 μs   (302.9 μs .. 364.4 μs)
std dev              62.33 μs   (26.19 μs .. 97.61 μs)
variance introduced by outliers: 91% (severely inflated)
}}}
is "bad": only about 1.35x faster on the mean (439.0 μs vs. 324.1 μs), with outliers inflating the variance by 91%.

However, in the full code of our project I observed that the speed-up is lost once we exceed a certain number of columns. That number is platform-specific: AMD performs worst and Intel is usually good. Then again, a MacBook Pro with an i5 CPU seems to do better than an i7 Lenovo running Ubuntu. On the same platform you get a speed-up with 64 columns but not with 256 columns. That makes me wonder: could the OS kernel and/or system libraries be affecting this?

As for me not being able to reproduce my original report: I did try removing `~/.stack` and `~/.stack-work`, and I tried both with and without `split-obj` in the stack config. I still can't get "bad" results on my current hardware/software. I only use stack; I don't have a system-wide GHC.

I will run more tests on a different machine (with Debian 9) and also try the MacBook at home.

To sum up what we know so far: libraries are ruled out, and the compiler version seems to be ruled out too. What's left? The GHC binary package, the OS kernel, system libraries? Does any of that make sense?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14980#comment:22
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler