
Inlining some of the functions definitely gave me a boost, so I am now
about 8.5 times slower than openssl sha1sum. I don't really understand
the core output, but after inlining I got a completely different
profile output. I am guessing it's because the cost of the inlined
functions is spread to the callers.

COST CENTRE      MODULE  %time  %alloc
updateElem       SHA1     13.4     0.0
sRotateL         SHA1     13.4     0.0
hashElem         SHA1     12.5     0.0
sXor             SHA1     10.9     0.0
unboxW           SHA1     10.0     0.0
temp             SHA1      8.1     0.0
sAdd             SHA1      7.8     0.0
sAnd             SHA1      5.0     0.0
do20             SHA1      4.1    18.0
hashA16IntoA80   SHA1      2.8     0.9
do60             SHA1      2.5    18.0
splitByN         SHA1      2.2    15.6
ffkk             SHA1      2.2     0.0
sOr              SHA1      1.6     0.0
do40             SHA1      0.9    18.0
hashPtrIntoA80   SHA1      0.6     2.7
hashA80          SHA1      0.6     1.8
do80             SHA1      0.6    18.0
joinTail         SHA1      0.0     2.1
main             Main      0.0     4.8

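
For anyone following along, here is a rough sketch of the kind of inlining I mean. It is not my actual SHA1 module: rotl, ch and step are made-up stand-ins for small helpers like sRotateL, sAnd and sOr, and the round shown is only the first-20-rounds case. The INLINE pragma asks GHC to copy the helper's body into each call site, which is probably why the costs end up attributed differently in the profile.

```haskell
module Main (main) where

import Data.Bits (complement, rotateL, (.&.), (.|.))
import Data.Word (Word32)

-- Hypothetical stand-in for a small helper such as sRotateL.
rotl :: Int -> Word32 -> Word32
rotl n w = w `rotateL` n
{-# INLINE rotl #-}

-- SHA-1 "choose" function used in rounds 0..19 (hypothetical name).
ch :: Word32 -> Word32 -> Word32 -> Word32
ch b c d = (b .&. c) .|. (complement b .&. d)
{-# INLINE ch #-}

-- One SHA-1 round for rounds 0..19, given the schedule word w.
-- With the helpers inlined, GHC can keep the arithmetic on unboxed words.
step :: Word32
     -> (Word32, Word32, Word32, Word32, Word32)
     -> (Word32, Word32, Word32, Word32, Word32)
step w (a, b, c, d, e) = (t, a, rotl 30 b, c, d)
  where
    t = rotl 5 a + ch b c d + e + 0x5A827999 + w
{-# INLINE step #-}

main :: IO ()
main =
  -- Apply one round to the standard SHA-1 initial state with a zero word.
  print (step 0 (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0))
```

Whether the pragma actually helps depends on the call sites, so it is worth comparing the core and the timings before and after.
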
On 6/30/07, Donald Bruce Stewart
aeyakovenko:
So I tried implementing a more efficient SHA1 in Haskell, and I got to about 12 times slower than C. The darcs implementation is also around 10 to 12 times slower, and the crypto one is about 450 times slower. I haven't yet unrolled the loop like the darcs implementation does, so I can still get some improvement from that, but I want that to be the last thing I do.
I think I've been getting speed improvements by minimizing unnecessary allocations. I went from 40 times slower to 12 times slower by converting a foldM to a mapM that modifies a mutable array.
Does anyone have any pointers on how to get hashElem and updateElem to run faster, or any insight into what exactly they are allocating? To me it seems that those functions should be able to do everything they need to without a malloc.
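
To illustrate the mutable-array change mentioned above, here is a minimal sketch, not the real code. It assumes the step in question is the usual SHA-1 expansion of 16 message words into the 80-word schedule (roughly what hashA16IntoA80 would do); expand is a made-up name, and it uses forM_ over an STUArray from the array package rather than the exact mapM from the original.

```haskell
module Main (main) where

import Control.Monad (forM_)
import Data.Array.ST (newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Bits (rotateL, xor)
import Data.Word (Word32)

-- Expand a 16-word block into the 80-word SHA-1 message schedule.
-- The array is unboxed and updated in place, so the loop body does not
-- build intermediate lists or tuples. (Hypothetical helper name.)
expand :: [Word32] -> UArray Int Word32
expand block = runSTUArray $ do
  -- Words 0..15 come from the block; 16..79 start as zero placeholders.
  arr <- newListArray (0, 79) (take 80 (take 16 block ++ repeat 0))
  forM_ [16 .. 79] $ \t -> do
    a <- readArray arr (t - 3)
    b <- readArray arr (t - 8)
    c <- readArray arr (t - 14)
    d <- readArray arr (t - 16)
    writeArray arr t ((a `xor` b `xor` c `xor` d) `rotateL` 1)
  return arr

main :: IO ()
main = do
  let w = expand (1 : replicate 15 0)
  -- Print a few schedule words derived from the first input word.
  print [w ! 0, w ! 16, w ! 19]
```

A foldM version would thread a fresh structure through every iteration, which is one plausible source of the allocation; here the only allocation is the 80-word array itself.
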
Try inlining key small functions, and check the core.
-O2 -ddump-simpl | less
-- Don