
Hi,

In order to do a performance comparison between different approaches for our application, I made several implementations of a simple example (computing the norm of a vector expression). I rely on Repa to do this. However, when I tried to build the parallel version (-threaded -fvectorise -rtsopts), I got an error saying that dph-par was not available. Indeed, it wasn't. As explained on the DPH web page, I installed a development version of GHC (ghc-7.1.20110331). However, when I try to build my application, I get the following error:

user error (Pattern match failure in do expression at compiler/vectorise/Vectorise/Monad.hs:57:10-20)

My source code is attached and compiled with the following line:

ghc haskell.hs -O3 -XBangPatterns -msse2 -fforce-recomp -rtsopts -threaded -fvectorise

So my questions are: is the pattern-match problem known? If yes, is there a workaround? Is there another version of DPH that would work with Repa?

Thank you!

-----
Wilfried Kirschenmann

"An expert is a person who has made all the mistakes that can be made in a very narrow field."
Niels Bohr, Danish physicist (1885-1962)

On 12/04/2011, at 7:32 PM, Wilfried Kirschenmann wrote:
Hi,
In order to do a performance comparison between different approaches for our application, I made several implementations of a simple example (computing the norm of a vector expression). I rely on Repa to do this. However, when I tried to build the parallel version (-threaded -fvectorise -rtsopts), I got an error saying that dph-par was not available. Indeed, it wasn't.
Repa and DPH are different projects. The compilation mechanisms and approaches to parallelism are quite different between them. You only need -fvectorise to turn on the vectoriser for DPH code. You don't need (or want) -fvectorise for Repa programs. DPH is also still at the "research prototype" stage, and not yet at a point where you'd try to use it for anything real.

With your example code, you also need to use R.force at appropriate points, and add matches against @(Array _ [Region RangeAll (GenManifest _)]). The reasons for both of these are explained in [1]. Hopefully the second will be fixed by a subsequent GHC release. You must also add {-# INLINE fun #-} pragmas to polymorphic functions, or you will pay the price of dictionary passing for the type-class overloading.

With the attached code:

desire:tmp benl$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.0.3

desire:tmp benl$ ghc-pkg list | grep repa
    repa-2.0.0.2
    repa-algorithms-2.0.0.2
    repa-bytestring-2.0.0.2
    repa-io-2.0.0.2

desire:tmp benl$ ghc -rtsopts -threaded -O3 -fllvm -optlo-O3 -fno-liberate-case --make haskell.hs -XBangPatterns -fforce-recomp

desire:tmp benl$ /usr/bin/time ./haskell
[3.3645823e12]
725188000000
        6.62 real         6.39 user         0.22 sys

This runs but doesn't scale with an increasing number of threads. I haven't looked at why. If all the work is in R.sum then that might be the problem -- I haven't put much time into optimising reductions, just maps and filters.

Cheers,
Ben.

[1] http://www.cse.unsw.edu.au/~benl/papers/stencil/stencil-icfp2011-sub.pdf
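For concreteness, here is a minimal sketch of that advice, assuming the Repa 2 API (R.force, R.map, R.sumAll); it is not the attached haskell.hs, and the function names are illustrative only. The idea is to force the source array to a manifest one before reducing it, and to put INLINE pragmas on polymorphic helpers so the Elt/Num dictionaries are specialised away:

{-# LANGUAGE BangPatterns #-}
module Norm where

import qualified Data.Array.Repa as R
import Data.Array.Repa (Array, DIM1)

-- Polymorphic helper: without the INLINE pragma, callers pay for
-- dictionary passing instead of getting code specialised at Double.
squares :: (R.Shape sh, R.Elt a, Num a) => Array sh a -> Array sh a
squares = R.map (\x -> x * x)
{-# INLINE squares #-}

-- Force the input to a manifest array before the reduction. For functions
-- that consume already-computed data, the suggestion above is to match the
-- manifest representation directly, e.g.
--   vec@(Array _ [Region RangeAll (GenManifest _)])
-- using the constructors from Repa 2's internal modules.
normSquared :: Array DIM1 Double -> Double
normSquared vec
 = let !manifest = R.force vec       -- evaluate to a manifest array
   in  R.sumAll (squares manifest)   -- reduce over the forced data
{-# INLINE normSquared #-}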

Repa and DPH are different projects. The compilation mechanisms and approaches to parallelism are quite different between them. You only need -fvectorise to turn on the vectoriser for DPH code. You don't need (or want) -fvectorise for Repa programs. DPH is also still at the "research prototype" stage, and not yet at a point where you'd try to use it for anything real.
OK.
With your example code, you also need to use R.force at appropriate points, and add matches against @(Array _ [Region RangeAll (GenManifest _)]). The reasons for both of these are explained in [1]. Hopefully the second will be fixed by a subsequent GHC release. You must also add {-# INLINE fun #-} pragmas to polymorphic functions or you will pay the price of dictionary passing for the type class overloading.
This runs but doesn't scale with an increasing number of threads. I haven't looked at why. If all the work is in R.sum then that might be the problem -- I haven't put much time into optimising reductions, just maps and filters.
Surprisingly, when removing the R.force from the code you attached, performance is better (a speed-up of 2). I suppose, but am not sure, that this allows loop fusion between the R.map and the R.sum. I use GHC 7.0.3, Repa 2.0.0.3, and LLVM 2.9.

In the end, the performance of this new version (0.48 s) is 15x better than my original version (6.9 s). However, the equivalent sequential C code is still 15x better (0.034 s). This may indeed be explained by the fact that all computations are performed inside the R.sum. Even carefully tuned, this function wouldn't scale well (perhaps 3x on a dual-processor machine with 4 cores per processor), since performance is limited by memory bandwidth. However, this implementation doesn't scale at all: without R.force, parallel performance is exactly the same as sequential performance; with R.force, using all 8 cores achieves a 1.1 speed-up.

Thank you for your help.
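For reference, the fused variant I am describing looks roughly like the following sketch, assuming the Repa 2 API (R.zipWith, R.map, R.sumAll); the function name and the vector expression xs + ys are illustrative, not my attached code. With no intermediate R.force, the whole pipeline stays delayed and compiles to a single traversal that feeds straight into the reduction:

import qualified Data.Array.Repa as R

-- Norm of a vector expression: zipWith, map and sumAll fuse into one loop
-- because no intermediate array is forced.
normOfSum :: R.Array R.DIM1 Double -> R.Array R.DIM1 Double -> Double
normOfSum xs ys
 = sqrt (R.sumAll (R.map (\x -> x * x) (R.zipWith (+) xs ys)))
{-# INLINE normOfSum #-}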

On 12/04/2011, at 11:50 PM, Wilfried Kirschenmann wrote:
Surprisingly, when removing the R.force from the code you attached, performance is better (a speed-up of 2). I suppose, but am not sure, that this allows loop fusion between the R.map and the R.sum.
I use GHC 7.0.3, Repa 2.0.0.3, and LLVM 2.9.
In the end, the performance of this new version (0.48 s) is 15x better than my original version (6.9 s). However, the equivalent sequential C code is still 15x better (0.034 s).
This may indeed be explained by the fact that all computations are performed inside the R.sum.
Yeah, the Repa fold and sum functions just use the equivalent Data.Vector ones. They're not parallelised, and I haven't looked at the generated code. I'll add a ticket to the Trac to fix these, but won't have time to work on it myself in the near future.

Ben.
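PS: for intuition, the kind of sequential reduction this amounts to looks roughly like the following Data.Vector sketch (illustrative only, not Repa's actual source):

import qualified Data.Vector.Unboxed as V

-- A strict sequential left fold over the flat element data; it runs on a
-- single core no matter how many RTS threads are available.
sumAllSeq :: (V.Unbox a, Num a) => V.Vector a -> a
sumAllSeq = V.foldl' (+) 0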

Yeah, the Repa fold and sum functions just use the equivalent Data.Vector ones. They're not parallelised and I haven't looked at the generated code. I'll add a ticket to the trac to fix these, but won't have time to work on it myself in the near future.
OK. Thank you for your help. I will try again with future versions.
participants (2)
- Ben Lippmeier
- Wilfried Kirschenmann