
My current best try uses the uvector package, has two 'vectors' of type (UArr Double) as input, and relies on the sumU and zipWithU functions which use streaming to compute the result:
dist_fast :: UArr Double -> UArr Double -> Double dist_fast p1 p2 = sumDs `seq` sqrt sumDs where sumDs = sumU ds ds = zipWithU euclidean p1 p2 euclidean x y = d*d where d = x-y
You'll probably want to make sure that 'euclidian' is specialized to the types you need (here 'Double'), not used overloaded for 'Num a=>a' (check -ddump-tc, or -ddump-simpl output). After that, unrolling the fused fold loop (uvector internal) might help a bit, but isn't there yet: http://hackage.haskell.org/trac/ghc/ticket/3123 http://hackage.haskell.org/trac/ghc/wiki/Inlining And even if that gets implemented, it doesn't apply directly to your case, where the loop is in a library, but you might want to control its unrolling in your client code. Having the loop unrolled by a default factor (8x or so) should help for loops like this, with little computation. Claus