
To avoid any confusion, this was a reply to the following email:
On Fri, Mar 13, 2015 at 6:23 PM, Anatoly Yakovenko wrote:
https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
So I am basically seeing results with -N4 that are no better than sequential computation on my MacBook for the matrix multiply algorithm. Any idea why?
Thanks, Anatoly
On Thu, Jan 14, 2016 at 8:19 PM, Thomas Miedema wrote:
Anatoly: I also ran your benchmark, and cannot reproduce your findings.
Note that GHC does not make effective use of hyperthreads (https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12). So don't use -N4 when you have only a dual-core machine. Maybe that's why you were getting bad results? I also noticed a `NaN` in one of your timing results. I don't know how that is possible, or whether it affected your results. Could you try running your benchmark again, this time with -N2?
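As a sanity check before comparing timings, it can help to confirm what the RTS was actually given. This is a minimal sketch (not part of the original benchmark) using `GHC.Conc` from base: `getNumCapabilities` reports the effective -N setting, and `getNumProcessors` reports how many processors the machine exposes (which counts hyperthreads, so it can exceed the number of physical cores).

```haskell
-- Sketch: print the RTS capability count (-N) next to the machine's
-- reported processor count, to spot oversubscription like -N4 on a
-- dual-core laptop. Compile with -threaded.
import GHC.Conc (getNumCapabilities, getNumProcessors)

main :: IO ()
main = do
  caps  <- getNumCapabilities   -- what +RTS -N gave us
  procs <- getNumProcessors     -- logical CPUs, including hyperthreads
  putStrLn $ "capabilities (-N): " ++ show caps
  putStrLn $ "logical processors: " ++ show procs
```

Running the same binary with `+RTS -N2` versus `+RTS -N4` makes it easy to see which configuration each timing came from.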
On Sat, Mar 14, 2015 at 5:21 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
dense matrix product is not an algorithm that makes sense in repa's execution model,
Matrix multiplication is the first example in the first repa paper: http://benl.ouroborus.net/papers/repa/repa-icfp2010.pdf. Look at figures 2 and 7.
"we measured very good absolute speedup, ×7.2 for 8 cores, on multicore hardware"
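For reference, a parallel matrix product in repa is only a few lines. This is a minimal sketch, assuming the repa and repa-algorithms packages (`mmultP` from `Data.Array.Repa.Algorithms.Matrix` is the library's parallel multiply); it is not the exact code from the paper or from repa-examples.

```haskell
-- Sketch: multiply a small dense matrix by itself in parallel with repa.
-- Assumes the repa and repa-algorithms packages. Compile with -threaded
-- and run with e.g. `+RTS -N2` to use two capabilities.
import Data.Array.Repa                   (Array, DIM2, U, Z (..), (:.) (..))
import qualified Data.Array.Repa         as R
import Data.Array.Repa.Algorithms.Matrix (mmultP)

main :: IO ()
main = do
  let n = 256
      a = R.fromListUnboxed (Z :. n :. n)
            [fromIntegral i | i <- [1 .. n * n]] :: Array U DIM2 Double
  c <- mmultP a a   -- parallel blocked multiply across RTS capabilities
  s <- R.sumAllP c  -- parallel reduction; forces the result
  print s
```

The `P`-suffixed functions run in a monad precisely so that repa can schedule the work across capabilities, which is why the -N setting matters for this benchmark.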
Doing a quick experiment with 2 threads (my laptop doesn't have more cores):
$ cabal install repa-examples # I did not bother with `-fllvm` ...
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204
elapsedTimeMS = 6491
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204 +RTS -N2
elapsedTimeMS = 3393
This is with GHC 7.10.3 and repa-3.4.0.1 (with dependencies from http://www.stackage.org/snapshot/lts-3.22).