
Hey! Thanks a lot for your reply!
Disclaimer: I am compiling GHC 7.6.1-rc1 while testing this, so my measurements might be unreliable. Best try it out yourself.
Also, this blind-stab-optimization is /not/ best practice, I just enjoyed fiddling around.
What /is/ best practice in regards to performance optimization of code such as mine? What other options than blind stabbing do I have? As for applying your suggestions: I have improved the speed of my overall program by about 15% (see the new profiling I have obtained), but it still seems as if the program spends too much time reading from the vector and multiplying 5 integers --- but probably the cost centers are allocated wrongly? I tried various combinations of the changes you suggest, and had some very interesting results. It seems like there are some performance differences using GHC 7.4.2 (which I use due to cabal-dev) and GHC 7.6.*. Using my initial code, I get a runtime of around 15.7s. On 12/11/12 at 10:43pm, Joachim Breitner wrote:
neighProd vs l !i = lukUp vs l (i-2) * lukUp vs l (i-1) * lukUp vs l (i) * lukUp vs l (i+1) * lukUp vs l (i+2)
Using that code rather than a map, actually leads to slower results! I am getting an average of 18.0s for this. Interestingly, passing the index `!i' as an unboxed Int leads to a time-increase in my map version --- 16.8s. However:
Using V.unsafeIndex there gives another noticable speedup.
This helps a lot, giving me 14.8s and 15.1 for your suggestion; but combining it with
lukUp :: V.Vector Int -> Int -> Int -> Int lukUp vs l i = let !i' = i `mod` l in V.unsafeIndex vs i'
did, according to core, prevent some boxing, but did not speed up the program. Allocating Ints that are unused before the next GC are very cheap.
gives me a time of about 12.8s with your suggestions (explicit calls to `lukUp', unsafeIndex'ing), which is the fastest result I got so far!
You can save more time by not passing l around, the size of the vector is already present in vs and can cheaply be accessed by "V.length vs".
I decided to pass l around explicitly in order to safe on expensive length-calculations. As it turns out, checking the bounds gives me a decrease in speed in all cases. Furthermore, using `mod' is actually a feature of my underlying model, as it enforces periodic boundary conditions (think of the vector representing a circle). Therefore the boundary checking just happens accidentally.
Oh, and you are creating a new byte array in every invocation of randVec. That is of course expensive. Using mutable arrays seems to be a more natural choice. Or is that really what you want to do? Judging from your code I would expect that you want to create a new array consisting only of values returned by neighProd; with the current code, when updating the second value you are including the _updated_ first entry in your product.
I will try to use a mutable vector to see if I can speed the thing up further, since I do indeed want to update my vector one element at a time, taking into account the elements 4 nearest and next-to-nearest neighbours (as you can see, every iteration of the code acts on the updated vector). I just decided to go with an unmutable structure (a list at first, followed by a Data.Map structure), in order to work without side effects. I'd be happy to get more input about how to increase the performance of my code! Cheers Janis