
(I'm posting this here in the hope that people like Manuel and Roman will see it and take some interest.)

I've built the GHC 6.10 beta (that is, The Glorious Glasgow Haskell Compilation System, version 6.10.0.20080921) mainly so I could play around with DPH, because it's too exciting to ignore any longer. The program I'm using here is pretty trivial, but I think it's a simple enough test case. It's cribbed almost entirely from dph's examples; I just made some superficial data sets to test on, and here are the results. The code is attached in a .tar file; if you use it, you'll need to tweak the Makefile to point to your GHC instead of mine.

Things to note:

* The vectorise pass increases compilation times *a lot*. I don't think this is entirely unwarranted, since it seems like a pretty complicated transformation, but the primitive version (using just the unlifted interface) compiles in about 1.5 seconds, while the vectorised version takes on the order of 15 seconds. For something as trivial as this dot-product program, that's a fairly long compilation time.
* It's pretty much impossible to use ghc-core to examine the Core output of the vectorised version: I let it run, and before anything showed up in `less` it was already using on the order of 100 MB of memory. If I just add -ddump-simpl to the command line, the reason is obvious: the generated Core is absolutely huge.
* For the included benchmark, the vectorised version spends about 98% of its time in the GC, from what I can see, before it dies with a stack overflow. I haven't tried something like +RTS -A1G -RTS yet, though.
* The vectoriser is really, really touchy. For example, the code sample below works (from DotPVect.hs):
import Data.Array.Parallel.Prelude.Int as I
dotp :: [:Int:] -> [:Int:] -> Int
dotp v w = I.sumP [: (I.*) x y | x <- v, y <- w :]
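A side note on the sample above: in ordinary list comprehensions, comma-separated generators nest, so `x <- v, y <- w` pairs every `x` with every `y` and sums all n*m products rather than computing a positional dot product. As far as I understand, DPH's `[: ... :]` comprehensions follow the same rule, and a positional pairing is written with parallel generators (`x <- v | y <- w`) or a zipWith. The sketch below uses plain Haskell lists only (no DPH packages), and the names `crossSum`, `dotp` and `dotpZip` are illustrative. It shows the difference using GHC's ParallelListComp extension:

```haskell
{-# LANGUAGE ParallelListComp #-}
-- Plain-list sketch (no DPH needed) contrasting nested vs parallel generators.
module Main (main) where

import Control.Monad (unless)

-- Comma-separated generators nest: every x is paired with every y,
-- so this sums all n*m pairwise products.
crossSum :: [Int] -> [Int] -> Int
crossSum v w = sum [x * y | x <- v, y <- w]

-- Parallel generators (ParallelListComp) pair elements positionally,
-- which gives the usual dot product.
dotp :: [Int] -> [Int] -> Int
dotp v w = sum [x * y | x <- v | y <- w]

-- The same dot product written with zipWith; no extension required.
dotpZip :: [Int] -> [Int] -> Int
dotpZip v w = sum (zipWith (*) v w)

main :: IO ()
main = do
  let v = [1, 2, 3]
      w = [4, 5, 6]
  print (crossSum v w) -- (1+2+3) * (4+5+6) = 90
  print (dotp v w)     -- 1*4 + 2*5 + 3*6 = 32
  unless (dotp v w == dotpZip v w) (error "dotp and dotpZip disagree")
```

With these inputs the nested version gives 90 while the dot product is 32, and the nested form builds a quadratic number of intermediate products, so it is worth checking which of the two the benchmark is meant to measure.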
The following version, however, does not work:
dotp :: [:Int:] -> [:Int:] -> Int
dotp v w = I.sumP [: (Prelude.*) x y | x <- v, y <- w :]
That is, just using the version of (*) from Prelude causes the vectoriser to fail like so:
ghc -o vect --make test.hs -fcpr-off -threaded -Odph -funbox-strict-fields -fdph-par
[1 of 2] Compiling DotPVect ( DotPVect.hs, DotPVect.o )
*** Vectorisation error ***
Tycon not vectorised: GHC.Num.:TNum
make: *** [vect] Error 1
This is particularly strange: if we dig into the source code of the dph-par package, we see that Data.Array.Parallel.Prelude.Int simply re-exports Data.Array.Parallel.Prelude.Base.Int (a hidden module, mind you), which is where (*) is defined, and it is defined like so:
(+), (-), (*) :: Int -> Int -> Int
(+) = (P.+)
(-) = (P.-)
(*) = (P.*)
Here 'P' is the qualified Prelude import. Am I misunderstanding something about how the dph packages are laid out? I think my reading is correct, and if so, this is a really strange thing to happen, to be honest.

I also ran into a few other errors involving the vectoriser dying; if I can find them again, I'll reply to this with some results.

So far this all seems pretty negative, but on the flip side:

* The unlifted interface exported by the dph-prim-{par,seq} packages is wonderful and already works really well. See http://hpaste.org/10621 for an example: GC time is very low, and both of my cores get used.
* As I have increased the data sets for the dot-product example, my cores continue to get used and GC time stays really low, which is great.
* I've yet to hit any strange compilation error or problem when using the primitive packages.
* Strictly speaking, the dph-{par,seq} packages seem to expose more code and combinators to programs that use the vectorisation pass, but the combinators here are simple, they work, and you get good results.
* GHC applies a lot of optimizations: with ghc-core you can see thousands upon thousands of rules firing all over the place to aggressively inline and transform things!

The DPH work done so far seems fantastic. I realize this is all a work in progress, so everything I say now might not be worth anything tomorrow, and GHC 6.10 is only shipping with a very limited version of the system, but I figured some people would like to see initial results on some small test cases. I plan on exploiting this package a lot more in the future and testing it with larger computations and data sets, and when I do, I'll be sure to give you guys feedback (once HEAD is steaming along again and the 6.10 branch has calmed down).

Austin