
So I tried implementing a more efficient SHA-1 in Haskell, and I got to about 12 times slower than C. The darcs implementation is also around 10 to 12 times slower, and the crypto one is about 450 times slower.

I haven't yet unrolled the loop like the darcs implementation does, so I can still get some improvement from that, but I want that to be the last thing I do.

I think I've been getting speed improvements by minimizing unnecessary allocations. I went from 40 times slower to 12 times slower by converting a foldM to a mapM that modifies a mutable array.

Does anyone have any pointers on how to get hashElem and updateElem to run faster, or any insight into what exactly they are allocating? To me it seems that those functions should be able to do everything they need to without a malloc.

These are the profiling statistics generated from my implementation:

COST CENTRE      MODULE  %time  %alloc

hashElem         SHA1     42.9    66.2
updateElem       SHA1     12.7    16.7
unboxW           SHA1     10.6     0.0
hashA80          SHA1      5.2     0.3
temp             SHA1      4.6     0.0
sRotateL         SHA1      4.6     0.0
ffkk             SHA1      3.3     2.6
hashA16IntoA80   SHA1      3.1     0.1
sXor             SHA1      2.9     0.0
do60             SHA1      2.9     2.6
sAdd             SHA1      2.3     0.0
do20             SHA1      1.3     2.6
splitByN         SHA1      1.2     2.3
do80             SHA1      0.8     2.6
do40             SHA1      0.4     2.6

Thanks,
Anatoly
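
For readers unfamiliar with the mutable-array pattern mentioned above, here is a minimal sketch of the general idea, not the code from this implementation. It expands 16 message words into the standard 80-word SHA-1 schedule by writing into an unboxed ST array in place, so each step stores a raw Word32 instead of building a boxed intermediate value. The name expandSchedule is hypothetical, and the zero-padding of short inputs is an assumption made only to keep the example total.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Bits (rotateL, xor)
import Data.Word (Word32)

-- Hypothetical helper: expand 16 message words into the 80-word
-- SHA-1 schedule, W[t] = rotl1 (W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]).
expandSchedule :: [Word32] -> UArray Int Word32
expandSchedule ws = runSTUArray $ do
  -- Slots 0..15 hold the input words; the rest start as zero.
  -- (Assumption: inputs shorter than 16 words are zero-padded.)
  arr <- newListArray (0, 79) (take 80 (take 16 ws ++ repeat 0))
  -- Fill slots 16..79 in place; the elements are unboxed Word32s.
  forM_ [16 .. 79] $ \t -> do
    a <- readArray arr (t - 3)
    b <- readArray arr (t - 8)
    c <- readArray arr (t - 14)
    d <- readArray arr (t - 16)
    writeArray arr t ((a `xor` b `xor` c `xor` d) `rotateL` 1)
  return arr
```

For example, expandSchedule [0..15] ! 16 reads back the first expanded word. Whether GHC compiles a loop like this without allocating depends on optimization flags and strictness, so checking the generated Core or the profile is still the way to confirm it.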