
2008/5/17 Andrew Coppin
Hi folks.
OK, try this:
darcs get http://darcs.orphi.me.uk/MyMD5 cd MyMD5 ghc -O2 --make md5sum md5sum <some large filename>
I've got some time to take a look at the code. It's very nice, readable and declarative, but obviously not optimized for "raw speed". There're a few things to do to have a speed-up of 2x, without going "low-level". 1) There's a lazyness leak in Pad.hs. The sum in: else make_block_bs bs0 : work (n + block_size_bytes) bs1 is not strict. With a very large file (e.g. 100 mb) it means stack overflow. [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. To solve it just add a bang (!) before the n parameter: work !n bs = You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file too. There're solutions that don't imply BangPatterns, and use only seq, but I like bang patterns! Now it works: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w 2) make_block_bs is sub-optimal, and very critical to performance. I decided to use Data.Binary for it (it's also more readable, and you get rid of the unsafeWrite): import Data.Binary import Data.Binary.Get import Control.Monad // ... instance Binary Block where put _ = undefined get = do xs <- replicateM 16 getWord32le return $ Block $ listArray (0, 15) xs make_block_bs :: B.ByteString -> Block make_block_bs = decode Now we are getting faster: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w 3) You are doing a lot of access to fields of a strict data type (State). You can at least ask the compiler to help you a bit with -funbox-strict-fields. Now we have: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w We have got a good, nearly 2x, speed-up with very few optimizations, and we run in a very small constant amount of memory. Probably compiling with -fvia-C could help even more, but strangely: [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Segmentation fault Is this supposed to happen? There're no "unsafe" functions or imports used in the program. Maybe a bug in GHC? Salvatore