https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
So I am seeing results with -N4 that are basically as good as sequential
computation on my MacBook for the matrix multiply algorithm. Any idea why?
Thanks,
Anatoly
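
For reference, the benchmark in question is a dense matrix multiply in Repa. A minimal sketch of such a program, assuming the mmultP and randomishDoubleArray helpers from repa-algorithms rather than the exact code in the gist above:

-- Sketch of a parallel Repa matrix multiply; not the code from the gist.
-- Build with: ghc -O2 -threaded MMult.hs
-- Run with:   ./MMult +RTS -N2
import Data.Array.Repa                      (Z (..), (:.) (..), extent)
import Data.Array.Repa.Algorithms.Matrix    (mmultP)
import Data.Array.Repa.Algorithms.Randomish (randomishDoubleArray)

main :: IO ()
main = do
  let sh = Z :. 1024 :. (1024 :: Int)
      a  = randomishDoubleArray sh 0 1 42   -- deterministic pseudo-random input
      b  = randomishDoubleArray sh 0 1 43
  c <- mmultP a b                           -- runs in parallel across RTS capabilities
  print (extent c)                          -- show the result shape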
Anatoly: I also ran your benchmark, and cannot reproduce your findings. Note that GHC does not make effective use of hyperthreads (https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12), so don't use -N4 when you have only a dual-core machine. Maybe that's why you were getting bad results? I also noticed a `NaN` in one of your timing results. I don't know how that is possible, or whether it affected your results. Could you try running your benchmark again, but this time with -N2?

On Sat, Mar 14, 2015 at 5:21 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:

> dense matrix product is not an algorithm that makes sense in repa's execution model

Matrix multiplication is the first example in the first repa paper: http://benl.ouroborus.net/papers/repa/repa-icfp2010.pdf. Look at figures 2 and 7: "we measured very good absolute speedup, ×7.2 for 8 cores, on multicore hardware."

Doing a quick experiment with 2 threads (my laptop doesn't have more cores):

$ cabal install repa-examples   # I did not bother with `-fllvm`
...
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204
elapsedTimeMS = 6491
$ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204 +RTS -N2
elapsedTimeMS = 3393

This is with GHC 7.10.3 and repa-3.4.0.1 (and dependencies from http://www.stackage.org/snapshot/lts-3.22).
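
A quick way to check what -N actually gives you is to compare the RTS capability count with the hardware thread count. A small sketch using getNumCapabilities and getNumProcessors from base/GHC.Conc (not part of the benchmark in this thread):

-- Prints the number of RTS capabilities (set by -N) next to the number of
-- hardware threads the machine reports (which includes hyperthreads).
-- Build with -threaded; run with e.g. +RTS -N2.
import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  caps  <- getNumCapabilities
  procs <- getNumProcessors
  putStrLn $ "capabilities = " ++ show caps
          ++ ", hardware threads = " ++ show procs

On a dual-core machine with hyperthreading this will typically report 4 hardware threads but only 2 real cores, which is why -N2 rather than -N4 is the setting to benchmark with here.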