
aeyakovenko:
So I tried implementing a more efficient SHA-1 in Haskell, and I got it to about 12 times slower than C. The darcs implementation is also around 10 to 12 times slower, and the crypto one is about 450 times slower. I haven't yet unrolled the loop like the darcs implementation does, so I can still get some improvement from that, but I want that to be the last thing I do.
I think I've been getting speed improvements by minimizing unnecessary allocations. I went from 40 times slower to 12 times slower by converting a foldM to a mapM that modifies a mutable array.
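A minimal sketch of that pattern, assuming the mutable array is an unboxed STUArray of Word32 holding the 80-word SHA-1 message schedule. The name expandSchedule is hypothetical and this is not the code being benchmarked; it only shows a loop that writes into one preallocated array instead of threading a freshly built structure through a fold.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)  -- elems is only for inspecting results
import Data.Bits (rotateL, xor)
import Data.Word (Word32)

-- Hypothetical helper: expand the first 16 message words into the
-- 80-word SHA-1 schedule, writing every new word in place.
expandSchedule :: [Word32] -> UArray Int Word32
expandSchedule block = runSTUArray $ do
  -- One allocation for the whole schedule; missing words are padded with 0.
  w <- newListArray (0, 79) (take 80 (block ++ repeat 0))
  forM_ [16 .. 79] $ \t -> do
    a <- readArray w (t - 3)
    b <- readArray w (t - 8)
    c <- readArray w (t - 14)
    d <- readArray w (t - 16)
    -- W[t] = (W[t-3] xor W[t-8] xor W[t-14] xor W[t-16]) rotated left by 1
    writeArray w t ((a `xor` b `xor` c `xor` d) `rotateL` 1)
  return w
```

Calling elems (expandSchedule ws) on a list ws of 16 words gives the schedule back as a plain list for checking. Whether the loop itself compiles down to something allocation-free still depends on the optimizer, so this is a starting point to measure rather than a guaranteed win.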
Does anyone have any pointers on how to get hashElem and updateElem to run faster, or any insight into what exactly they are allocating? It seems to me that those functions should be able to do everything they need without a malloc.
Try inlining key small functions, and check the Core: compile with -O2 -ddump-simpl and pipe the output through less. -- Don
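A small sketch of what that advice could look like, assuming the hot path is built from tiny Word32 helpers. The names ch and step are hypothetical stand-ins, not the hashElem and updateElem from the post; the round function and constant are the ones SHA-1 uses for rounds 0 to 19.

```haskell
import Data.Bits (complement, rotateL, (.&.), (.|.))
import Data.Word (Word32)

-- SHA-1 "choose" function for rounds 0..19 (hypothetical name ch).
ch :: Word32 -> Word32 -> Word32 -> Word32
ch x y z = (x .&. y) .|. (complement x .&. z)
{-# INLINE ch #-}

-- Hypothetical single-round helper: computes the new value of the
-- first working variable from the current state and schedule word w.
step :: Word32 -> Word32 -> Word32 -> Word32 -> Word32 -> Word32 -> Word32
step a b c d e w = (a `rotateL` 5) + ch b c d + e + 0x5A827999 + w
{-# INLINE step #-}
```

To check the result, something like ghc -O2 -ddump-simpl Sha1.hs | less (the file name is an assumption) prints the Core. In that output, let bindings and boxed W32# constructors inside the inner loop are the usual signs that a function is still allocating.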