 
Hi,

On Tuesday, 11.12.2012, at 15:25 +0100, Richard Janis Beckert wrote:
> I would be very happy for some input on this, because I am pretty new
> to Haskell and I don't really know how to do proper profiling.
By looking at the core (-ddump-simpl) I found a few issues.

  neighProd vs l i = foldl' (\acc x -> acc * lukUp vs l x) 1 [i-2, i-1, i, i+1, i+2]

will, for every call to neighProd, create a list of boxed integers and fold over it. I expected this boxing and unboxing to matter, so I switched to

  neighProd vs l i = lukUp vs l (i-2) * lukUp vs l (i-1) * lukUp vs l i
                   * lukUp vs l (i+1) * lukUp vs l (i+2)

which just allocates five machine integers, fills them and then multiplies them. Surprisingly, I did not notice a speedup.

Reading more of the core I noticed that GHC did not figure out that the parameter i to neighProd is always required, and hence it is passed in boxed. Again, boxing and unboxing costs time. Here

  neighProd vs l !i = lukUp vs l (i-2) * lukUp vs l (i-1) * lukUp vs l i
                    * lukUp vs l (i+1) * lukUp vs l (i+2)

fares better.

There are more problems that I am not sure how to get rid of. For example, every call to div in lukUp checks whether l is zero, _and_ the indexing does bounds checking – the compiler cannot infer that just because you div'ed by l the index is guaranteed to be in bounds. Using V.unsafeIndex there gives another noticeable speedup.

But not every bang pattern that avoids boxing gives a noticeable speedup. E.g. the bang pattern in

  lukUp :: V.Vector Int -> Int -> Int -> Int
  lukUp vs l i = let !i' = i `mod` l
                 in  V.unsafeIndex vs i'

did, according to the core, prevent some boxing, but did not speed up the program. Allocating Ints that are unused before the next GC is very cheap.

You can save more time by not passing l around: the size of the vector is already present in vs and can cheaply be accessed via "V.length vs".

You can also avoid the five checks for a zero length by making that check yourself first – GHC detects that the remaining branch is the one where the length is neither 0 nor -1 and optimizes the checks away:

  neighProd vs (!i) = case V.length vs of
    0    -> undefined
    (-1) -> undefined
    _    -> lukUp vs (i-2) * lukUp vs (i-1) * lukUp vs i
          * lukUp vs (i+1) * lukUp vs (i+2)

The speedup is low; maybe later phases of the compiler did that already.

Oh, and you are creating a new byte array in every invocation of randVec. That is of course expensive. Using mutable arrays seems to be a more natural choice. Or is that really what you want to do? Judging from your code I would expect that you want to create a new array consisting only of values returned by neighProd; with the current code, when updating the second value you are including the _updated_ first entry in your product.

Disclaimer: I am compiling GHC 7.6.1-rc1 while testing this, so my measurements might be unreliable. Best try it out yourself. Also, this blind-stab optimization is /not/ best practice; I just enjoyed fiddling around.

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: mail@joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata@joachim-breitner.de
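
P.S.: To show how the pieces fit together, here is a minimal sketch of the combined suggestions (dropping the l parameter, V.unsafeIndex, a bang pattern, and the explicit length check). I am guessing that V is Data.Vector.Unboxed and that these signatures match your code, so take it as an assumption rather than a drop-in replacement:

  {-# LANGUAGE BangPatterns #-}
  import qualified Data.Vector.Unboxed as V   -- assumption: unboxed vectors

  -- index modulo the vector length, without bounds checking
  lukUp :: V.Vector Int -> Int -> Int
  lukUp vs i = V.unsafeIndex vs (i `mod` V.length vs)

  -- matching on 0 and -1 first lets GHC drop the checks hidden in `mod`
  neighProd :: V.Vector Int -> Int -> Int
  neighProd vs (!i) = case V.length vs of
      0    -> undefined
      (-1) -> undefined
      _    -> lukUp vs (i-2) * lukUp vs (i-1) * lukUp vs i
            * lukUp vs (i+1) * lukUp vs (i+2)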
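
P.P.S.: If you indeed want every entry of the new array to be computed only from the _old_ values, building the result in one go avoids seeing already-updated neighbours. A sketch, using neighProd from the P.S. above; "step" is just a name I made up for your update function:

  -- build a fresh vector where position i is neighProd of the old vector
  step :: V.Vector Int -> V.Vector Int
  step vs = V.generate (V.length vs) (neighProd vs)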