dense matrix product is not an algorithm that makes sense in repa's execution model.

in a square matrix multiply of two N x N matrices, each result entry depends on 2N values in total across the two input matrices (one row of the first and one column of the second).
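
to make that dependency count concrete, here is a minimal sketch of the naive product. it uses plain lists of rows rather than repa arrays, and the names Matrix, entry and naiveMul are made up for illustration; nothing here is repa's API.

```haskell
-- Row-major list-of-rows representation. This is an assumption made for
-- illustration only; repa would use an unboxed two-dimensional array.
type Matrix = [[Double]]

-- One entry of C = A * B. It reads the N values of row i of A and the
-- N values of column j of B, so 2N input values in total.
entry :: Matrix -> Matrix -> Int -> Int -> Double
entry a b i j = sum (zipWith (*) (a !! i) (map (!! j) b))

-- Naive product of two square matrices: every entry is computed
-- independently of the others, with no reuse of loaded values.
naiveMul :: Matrix -> Matrix -> Matrix
naiveMul a b = [ [ entry a b i j | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
  where
    n = length a
```

with N^2 entries each reading 2N values, that is 2N^3 reads over inputs that hold only 2N^2 values, so every input element gets fetched N times. treating each entry as an independent task does nothing to share those fetches.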

even then, that's actually the wrong way to parallelize dense matrix product! it's worth reading the papers about GotoBLAS and the more recent BLIS project. a high-performance dense matrix multiply winds up needing to do some nested array parallelism with mutable updates to have efficient sharing of subcomputations!
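
as a rough illustration of what "mutable updates" buys you, here is a sequential cache-blocking (tiling) sketch using STUArray from the array package. this is not the GotoBLAS or BLIS algorithm (those add packing, register-level micro-kernels and threading on top), and Mat, fromList, blockedMul and the block size bs are all hypothetical names chosen for this example.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, listArray, elems, (!))

-- Square n x n matrix stored row-major in a flat unboxed array
-- (a layout assumed for this sketch).
type Mat = UArray Int Double

-- Build a matrix from n*n values given in row-major order.
fromList :: Int -> [Double] -> Mat
fromList n = listArray (0, n * n - 1)

-- Tiled product. bs is the tile edge and must be >= 1. Each (i0, j0, k0)
-- step adds the contribution of one tile of A and one tile of B into a
-- tile of C in place, so C is updated n/bs times per entry instead of
-- being written once as in the naive formulation.
blockedMul :: Int -> Int -> Mat -> Mat -> Mat
blockedMul n bs a b = runSTUArray $ do
  c <- newArray (0, n * n - 1) 0
  let starts = [0, bs .. n - 1]
  forM_ starts $ \i0 ->
    forM_ starts $ \j0 ->
      forM_ starts $ \k0 ->
        forM_ [i0 .. min n (i0 + bs) - 1] $ \i ->
          forM_ [j0 .. min n (j0 + bs) - 1] $ \j -> do
            acc <- readArray c (i * n + j)
            -- Partial dot product restricted to the current k tile.
            let partial = sum [ a ! (i * n + k) * b ! (k * n + j)
                              | k <- [k0 .. min n (k0 + bs) - 1] ]
            writeArray c (i * n + j) (acc + partial)
  return c
```

something like `elems (blockedMul 2 1 (fromList 2 [1,2,3,4]) (fromList 2 [5,6,7,8]))` gives the product in row-major order. the point of the tiling is that the small tiles of A and B stay in cache while they are reused, which only works because the C tile is accumulated into repeatedly.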

On Fri, Mar 13, 2015 at 9:03 PM, Anatoly Yakovenko <aeyakovenko@gmail.com> wrote:

you think the backend would make any difference? this seems like a
runtime issue to me, how are the threads scheduled by the ghc runtime?

On Fri, Mar 13, 2015 at 4:58 PM, KC <kc1956@gmail.com> wrote:
> How is the LLVM?
>
> --
> --
>
> Sent from an expensive device which will be obsolete in a few months! :D
>
> Casey
>
>
> On Mar 13, 2015 10:24 AM, "Anatoly Yakovenko" <aeyakovenko@gmail.com> wrote:
>>
>> https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
>>
>>
>> so i am basically seeing results with -N4 that are only as good as using
>> sequential computation on my macbook for the matrix multiply
>> algorithm. any idea why?
>>
>> Thanks,
>> Anatoly
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe@haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe