Efficient UArray dot-product?
Is it possible to write an dot-product function that GHC will compile to efficient code without using unsafeAt? I've tried and can't. I'm looking at generic IArray functions pragma-specialized to UArray Int Double. With unsafeAt the -ddump-simpl output looks optimal to me (aside from not unrolling or pipelining**): loop i s | i > ubSI = s | otherwise = let s' = s + (small `unsafeAt` i) * (big `unsafeAt` (off + i)) in seq s' $ loop (i + 1) s' compiles to $wloop = \ ww12 L :: PrelGHC.Int# ww13 L :: PrelGHC.Double# -> case PrelGHC.># ww12 ww9 of wild3 L { PrelBase.True -> PrelFloat.$wD# ww13; PrelBase.False -> $wloop (PrelGHC.+# ww12 1) (PrelGHC.+## ww13 (PrelGHC.*## (PrelGHC.indexDoubleArray# ww2 ww12) (PrelGHC.indexDoubleArray# ww6 (PrelGHC.+# a2 ww12)))) }; } in Without it (i.e. with (!)) the compiler will only use primitive arithmetic for addressing one of the arrays. Why? Is there a way to write dot-product that produces good code without resorting to unsafeAt? thanks, mike ** Pipelining probably isn't going to be worth your time in so far as you're targeting processors with dynamic scheduling. They'll pipeline things for you.
Is it possible to write an dot-product function that GHC will compile to efficient code without using unsafeAt? I've tried and can't. I'm looking at generic IArray functions pragma-specialized to UArray Int Double. With unsafeAt the -ddump-simpl output looks optimal to me (aside from not unrolling or pipelining**):
loop i s | i > ubSI = s | otherwise = let s' = s + (small `unsafeAt` i) * (big `unsafeAt` (off + i)) in seq s' $ loop (i + 1) s'
Isn't that rather the wrong question? I'd be more interested in "How do I get ghc to generate efficient code for dotprod?" dotprod a b = sum (map e (indices a)) where e ix = a!ix * b!ix Especially since ghc does a pretty damn good job for this version {-# SPECIALISE dotprod :: UArray Int Double -> UArray Int Double -> Double #-} dotprod a b = foldr (+) 0 (map e (indices a)) where e ix = a!ix * b!ix (It's a pity that it can only eliminate the list for foldr, not the foldl used in sum). I'm not at all familiar with the intermediate code, so I don't know what in the output from the above is particularly inefficient. It does seem to do a lot of deconstructing and I've no idea how to get round that, but I'd much prefer to see a definition like dotprod in the Haskell code and all the mysteries and contortions done in ghc. Cheers, Jón PS It's a national holiday in the UK, so you might not hear anything from the ghc guys for a bit. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk 31 Chalmers Road jf@cl.cam.ac.uk Cambridge CB1 3SZ +44 1223 570179 (after 14:00 only, please!)
participants (2)
-
Jon Fairbairn -
Mike Gunter