Re: [Haskell-cafe] Profiling slow multiplication of elements read from vector

12 Dec 2012

      Hey! Thanks a lot for your reply!
...
Disclaimer: I am compiling GHC 7.6.1-rc1 while testing this, so my
measurements might be unreliable. Best try it out yourself.
Also, this blind-stab-optimization is /not/ best practice, I just
enjoyed fiddling around.
What /is/ best practice in regards to performance optimization of code
such as mine? What other options than blind stabbing do I have? 

As for applying your suggestions: I have improved the speed of my
overall program by about 15% (see the new profiling I have obtained),
but it still seems as if the program spends too much time reading from
the vector and multiplying 5 integers --- but probably the cost centers
are allocated wrongly?

I tried various combinations of the changes you suggest, and had some
very interesting results.

It seems like there are some performance differences using GHC
7.4.2 (which I use due to cabal-dev) and GHC 7.6.*.

Using my initial code, I get a runtime of around 15.7s.

On 12/11/12 at 10:43pm, Joachim Breitner wrote:
...
neighProd vs l !i = lukUp vs l (i-2) * lukUp vs l (i-1) *
      lukUp vs l (i) * lukUp vs l (i+1) * lukUp vs l (i+2)
Using that code rather than a map, actually leads to slower results! I
am getting an average of 18.0s for this.

Interestingly, passing the index `!i' as an unboxed Int leads to a
time-increase in my map version --- 16.8s.

However:
...
Using V.unsafeIndex there gives another noticable speedup.
This helps a lot, giving me 14.8s and 15.1 for your suggestion; but
combining it with
...
lukUp :: V.Vector Int -> Int -> Int -> Int
lukUp vs l i = let !i' = i `mod` l
                in V.unsafeIndex vs i'
did, according to core, prevent some boxing, but did not speed up the
program. Allocating Ints that are unused before the next GC are very
cheap.
gives me a time of about 12.8s with your suggestions (explicit calls to
`lukUp', unsafeIndex'ing), which is the fastest result I got so far!
...
You can save more time by not passing l around, the size of the vector
is already present in vs and can cheaply be accessed by "V.length vs".
I decided to pass l around explicitly in order to safe on expensive
length-calculations. As it turns out, checking the bounds gives me a
decrease in speed in all cases.

Furthermore, using `mod' is actually a feature of
my underlying model, as it enforces periodic boundary conditions (think
of the vector representing a circle). Therefore the boundary checking
just happens accidentally.
...
Oh, and you are creating a new byte array in every invocation of
randVec. That is of course expensive. Using mutable arrays seems to be a
more natural choice. Or is that really what you want to do? Judging from
your code I would expect that you want to create a new array consisting
only of values returned by neighProd; with the current code, when
updating the second value you are including the _updated_ first entry in
your product.
I will try to use a mutable vector to see if I can speed the thing up
further, since I do indeed want to update my vector one element at a
time, taking into account the elements 4 nearest and next-to-nearest
neighbours (as you can see, every iteration of the code acts on the
updated vector).

I just decided to go with an unmutable structure (a list at first,
followed by a Data.Map structure), in order to work without side
effects.

I'd be happy to get more input about how to increase the performance of
my code!

Cheers

Janis

Re: [Haskell-cafe] Profiling slow multiplication of elements read from vector

Richard Janis Beckert