
Thanks for taking a look at this. kirby81:
2008/5/17 Andrew Coppin
: Hi folks.
OK, try this:
darcs get http://darcs.orphi.me.uk/MyMD5 cd MyMD5 ghc -O2 --make md5sum md5sum <some large filename>
I've got some time to take a look at the code. It's very nice, readable and declarative, but obviously not optimized for "raw speed". There're a few things to do to have a speed-up of 2x, without going "low-level".
1) There's a lazyness leak in Pad.hs. The sum in: else make_block_bs bs0 : work (n + block_size_bytes) bs1 is not strict. With a very large file (e.g. 100 mb) it means stack overflow.
[roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
To solve it just add a bang (!) before the n parameter:
work !n bs =
You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file too. There're solutions that don't imply BangPatterns, and use only seq, but I like bang patterns!
Now it works: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w
Good idea. I saw some lazy left folds in there too, which look suspicious.
2) make_block_bs is sub-optimal, and very critical to performance. I decided to use Data.Binary for it (it's also more readable, and you get rid of the unsafeWrite): import Data.Binary import Data.Binary.Get import Control.Monad
// ...
instance Binary Block where put _ = undefined get = do xs <- replicateM 16 getWord32le return $ Block $ listArray (0, 15) xs
make_block_bs :: B.ByteString -> Block make_block_bs = decode
Now we are getting faster: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w
Definitely a good idea.
3) You are doing a lot of access to fields of a strict data type (State). You can at least ask the compiler to help you a bit with -funbox-strict-fields.
Now we have: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w
We have got a good, nearly 2x, speed-up with very few optimizations, and we run in a very small constant amount of memory.
Great.
Probably compiling with -fvia-C could help even more, but strangely:
[roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Segmentation fault
Is this supposed to happen? There're no "unsafe" functions or imports used in the program. Maybe a bug in GHC?
That could be a gcc bug, in fact. Can you provide the final source? Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 ) Andrew, I hope you look carefully at the suggestions here, and use them to improve the code you were working on. -- Don