
I'm not sure what changed, but after rerunning it I get the expected results:
anatolys-MacBook:rbm anatolyy$ dist/build/proto/proto +RTS -N2
benchmarking P
time 1.791 s (1.443 s .. 2.304 s)
0.991 R² (0.974 R² .. 1.000 R²)
mean 1.803 s (1.750 s .. 1.855 s)
std dev 90.06 ms (0.0 s .. 90.90 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking S
time 3.225 s (2.685 s .. 3.837 s)
0.996 R² (0.985 R² .. 1.000 R²)
mean 3.033 s (2.857 s .. 3.142 s)
std dev 165.0 ms (0.0 s .. 188.7 ms)
variance introduced by outliers: 19% (moderately inflated)
perf log written to dist/perf-mmult.html
anatolys-MacBook:rbm anatolyy$ dist/build/proto/proto +RTS -N4
benchmarking P
time 1.851 s (1.326 s .. 2.316 s)
0.990 R² (0.964 R² .. 1.000 R²)
mean 1.784 s (1.693 s .. 1.901 s)
std dev 106.3 ms (0.0 s .. 119.8 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking S
time 3.329 s (3.041 s .. 3.944 s)
0.996 R² (0.993 R² .. 1.000 R²)
mean 3.173 s (3.100 s .. 3.244 s)
std dev 119.6 ms (0.0 s .. 121.9 ms)
variance introduced by outliers: 19% (moderately inflated)
perf log written to dist/perf-mmult.html
anatolys-MacBook:rbm anatolyy$ dist/build/proto/proto +RTS -N
benchmarking P
time 1.717 s (1.654 s .. 1.830 s)
0.999 R² (0.999 R² .. 1.000 R²)
mean 1.717 s (1.701 s .. 1.728 s)
std dev 16.64 ms (0.0 s .. 19.20 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking S
time 3.127 s (3.079 s .. 3.222 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.105 s (3.094 s .. 3.116 s)
std dev 18.12 ms (543.9 as .. 18.50 ms)
variance introduced by outliers: 19% (moderately inflated)
perf log written to dist/perf-mmult.html
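
(The benchmark itself is in the gist linked further down the thread. For readers without it, a minimal criterion harness of the same general shape, using mmultP/mmultS from repa-algorithms, would look roughly like the sketch below; the sizes, seeds, and benchmark names here are assumptions, not the gist's actual code.)

    import Criterion.Main
    import Data.Array.Repa                      (Array, DIM2, U, Z (..), (:.) (..))
    import Data.Array.Repa.Algorithms.Matrix    (mmultP, mmultS)
    import Data.Array.Repa.Algorithms.Randomish (randomishDoubleArray)

    -- Build with -threaded -rtsopts -O2 so the +RTS -N<n> flags above are accepted.
    main :: IO ()
    main = do
      let sh = Z :. 1024 :. 1024 :: DIM2   -- 1024x1024 inputs; the gist may differ
          a  = randomishDoubleArray sh 0 1 42 :: Array U DIM2 Double
          b  = randomishDoubleArray sh 0 1 43 :: Array U DIM2 Double
      defaultMain
        [ bench "P" (whnfIO (mmultP a b))  -- parallel product (computeP inside)
        , bench "S" (whnf (mmultS a) b)    -- sequential product (computeS inside)
        ]
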
On Thu, Jan 14, 2016 at 11:22 AM, Thomas Miedema wrote:
To avoid any confusion, this was a reply to the following email:
On Fri, Mar 13, 2015 at 6:23 PM, Anatoly Yakovenko wrote:
https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
So on my MacBook I am seeing results with -N4 that are basically no better than the sequential computation for the matrix multiply algorithm. Any idea why?
Thanks, Anatoly
On Thu, Jan 14, 2016 at 8:19 PM, Thomas Miedema wrote:
Anatoly: I also ran your benchmark and cannot reproduce your findings.
Note that GHC does not make effective use of hyperthreads (https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12), so don't use -N4 when you have only a dual-core machine. Maybe that's why you were getting bad results? I also notice a `NaN` in one of your timing results; I don't know how that is possible, or whether it affected your results. Could you try running your benchmark again, but this time with -N2?
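
A quick way to see what the RTS actually has to work with (bare -N sets the number of capabilities from the OS-reported processor count, which includes hyperthreads) is something like the snippet below; this is just an illustration, not part of the original benchmark:

    import GHC.Conc (getNumCapabilities, getNumProcessors)

    main :: IO ()
    main = do
      caps  <- getNumCapabilities  -- whatever +RTS -N<n> (or bare -N) selected
      procs <- getNumProcessors    -- logical CPUs reported by the OS, hyperthreads included
      putStrLn $ "capabilities: " ++ show caps ++ ", logical CPUs: " ++ show procs

On a dual-core MacBook with hyperthreading this reports 4 logical CPUs, which is how a bare -N ends up behaving like -N4 on that machine.
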
On Sat, Mar 14, 2015 at 5:21 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
dense matrix product is not an algorithm that makes sense in repa's execution model,
Matrix multiplication is the first example in the first repa paper: http://benl.ouroborus.net/papers/repa/repa-icfp2010.pdf. Look at figures 2 and 7.
"we measured very good absolute speedup, ×7.2 for 8 cores, on multicore hardware"
Doing a quick experiment with 2 threads (my laptop doesn't have more cores):
$ cabal install repa-examples # I did not bother with `-fllvm` ...
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204
elapsedTimeMS = 6491
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204 +RTS -N2
elapsedTimeMS = 3393
This is with GHC 7.10.3 and repa-3.4.0.1 (and dependencies from http://www.stackage.org/snapshot/lts-3.22).