
Dominic Steinitz wrote:
I have re-written the sha1 code so that it is (hopefully) easy to see that it faithfully implements the algorithm (see http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the space leak, I have been trying to improve performance.
Currently, the haskell code is 2 orders of magnitude slower than the sha1sum that ships with my linux.
Here's the code that is taking the majority of the time.
($&) :: [Word32] -> [Word32] -> [Word32] a $& b = zipWith (+) a b
-- Word128 -> Word512 -> Word128 oneBlock ss xs = Word128 (as!!80) (bs!!80) (cs!!80) (ds!!80) (es!!80) where ws = xs ++ map (rotL 1) (zipWith4 xxxor wm3s wm8s wm14s ws) where xxxor a b c d = a `xor` b `xor` c `xor` d wm3s = drop (16-3) ws wm8s = drop (16-8) ws wm14s = drop (16-14) ws as = ai:ts bs = bi:as cs = ci:(map (rotL 30) bs) ds = di:cs es = ei:ds ts = (map (rotL 5) as) $& (zipWith4 f [0..] bs cs ds) $& es $& (map k [0..]) $& ws Word128 ai bi ci di ei = ss
Any help would be appreciated.
This code is clean Haskell without algorithmic flaws, optimizing it means to scrape the constant factor off. Of course, de- and constructing those lazy lists is quite expensive and the canonical answer is: deforestation, also known as "fusion". The goal is to avoid building intermediate lists like if they get consumed at some point, here by (!! 80). This is like transforming the factorial or the fibonacci numbers fac n = facs !! n where facs = 1 : zipWith (*) facs [1..] fib n = fibs !! n where fibs = 1:1: zipwWith (+) fibs (tail fibs) to their accumulating cousins fac n = fac' n 1 where fac' 0 x = x fac' !n !x = fac' (n-1) (x*n) fib n = fib' n (1,1) where fib' 0 ( x, y) = x fib' !n (!x,!y) = fib' (n-1) (y,x+y) The algorithm splits in two parts: calculating ws and accumulating the quintuple a,b,c,d,e over it. Fusing the quintuple is straightforward and already suggested by the imperative specification on the website you mentioned (use with -fbang-patterns): oneBlock ss xs = foldl' g (0,ss) ws where ws = ... g (!n,!Word160 a b c d e) w = (n+1, Word160 (rotL 5 a + f n b c d + e + w + k n) a (rotL 30 b) c d)) Together with -funbox-strict-fields, GHC should be able to generate decent assembly from that. Fusing the ws is trickier. Directly appealing to the fibonacci-number example is not recommended because this would mean to keep the last 16 ws in memory and shifting them right to left by hand. But as the "Alternate method of computation" on the website suggests, you can delegate the shifting to an index that shifts around mod 16. Having a mutable array is helpful, then. Of course, you can also fill a large static (boxed!) array of 80 Word8s via ws :: Data.Array.IArray.Array Int Word8 ws = accumArray 0 (0,80) (curry snd) $ zip [0..15] xs ++ [(i, xxor i) | i<-[16..80]] where xxor i = ws ! (i-16) `xor` ws ! (i-3) `xor` ws ! (i-8) `xor` ws ! (i-14) or something like that (I didn't care about correct indices and bounds). GHC can fuse such array accumulations. In general, keeping stuff in lists is not wrong, but ByteStrings are more adapted to current CPU and RAM architecture. Regards, apfelmus