
As a learning exercise, I re-wrote and re-optimized Data.Binary.Builder yesterday.

1. Intuition is NOT your friend. Most obvious pessimizations I made were actually wins, and vice versa.

2. Parameters are very expensive. Our type for builder functions (ignoring CPS for the time being) was MBA# -> Int# -> [ByteString], where the Int# is the current write pointer. Adding an extra Int# to cache the size of the array (rather than calling sMBA# each time) slowed the code down ~2x. Conversely, moving the write pointer into the byte array (storing it in bytes 0#, 1#, 2#, and 3#) sped the code up by 4x.

3. MBA# is just as fast as Addr#, and garbage collected to boot.

4. You can't keep track of which version of the code is which, what is a regression, and what is an enhancement. Don't even try. Next time I try something like this I will make as much use of darcs as possible.

5. State# threads clog the optimizer quite effectively. Replacing st(n-1)# with realWorld# everywhere I could count on data dependencies to do the same job doubled performance.

6. The inliner is a bit too greedy. Removing the slow-path code from singleton doesn't help because popSingleton is only used once; but if I explicitly {-# NOINLINE popSingleton #-}, the code for singleton itself becomes much smaller, and inlinable (15% perf gain). Plus the new singleton doesn't allocate memory, so I can use even MORE realWorld#s.

And probably a few more I forgot about because of #4.

The code is online at http://members.cox.net/stefanor/hackedbuilder if anyone cares (but see #4).
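To make points 2 and 6 concrete, here is a minimal sketch of the two ideas. It is not the code from hackedbuilder. It assumes a plain ForeignPtr buffer and lifted IO instead of MBA# and State# primops, and the names Buffer, newBuffer, flushThenWrite, pending, and runDemo, as well as the tiny 16-byte buffer, are hypothetical choices made for illustration only. The write offset lives in bytes 0..3 of the buffer itself, so no extra parameter is threaded through, and the slow path is marked NOINLINE so the fast path stays small.

```haskell
module Main (main) where

import Data.Int (Int32)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek, peekByteOff, poke, pokeByteOff)

-- Assumed layout: bytes 0..3 hold the write offset, payload bytes follow.
headerSize, bufSize :: Int
headerSize = 4
bufSize    = 16  -- deliberately tiny so the slow path runs in the demo

newtype Buffer = Buffer (ForeignPtr Word8)

-- Allocate a buffer and store the initial write offset inside it.
newBuffer :: IO Buffer
newBuffer = do
  fp <- mallocForeignPtrBytes bufSize
  withForeignPtr fp $ \p ->
    poke (castPtr p :: Ptr Int32) (fromIntegral headerSize)
  return (Buffer fp)

-- Fast path: read the offset from the buffer, write one byte, bump the
-- offset. Returns the bytes of a flushed chunk, or [] if nothing flushed.
singleton :: Buffer -> Word8 -> IO [Word8]
singleton (Buffer fp) w = withForeignPtr fp $ \p -> do
  off <- peek (castPtr p :: Ptr Int32)
  if fromIntegral off < bufSize
    then do
      pokeByteOff p (fromIntegral off) w
      poke (castPtr p :: Ptr Int32) (off + 1)
      return []
    else flushThenWrite p w
{-# INLINE singleton #-}

-- Slow path: copy out the full payload, reset the offset, write the byte.
-- Kept out of line so it does not bloat every call site of 'singleton'.
flushThenWrite :: Ptr Word8 -> Word8 -> IO [Word8]
flushThenWrite p w = do
  full <- mapM (peekByteOff p) [headerSize .. bufSize - 1]
  pokeByteOff p headerSize w
  poke (castPtr p :: Ptr Int32) (fromIntegral headerSize + 1)
  return full
{-# NOINLINE flushThenWrite #-}

-- Read back whatever is still sitting in the buffer.
pending :: Buffer -> IO [Word8]
pending (Buffer fp) = withForeignPtr fp $ \p -> do
  off <- peek (castPtr p :: Ptr Int32)
  mapM (peekByteOff p) [headerSize .. fromIntegral off - 1]

-- Write 30 bytes through the 12-byte payload area and collect them again.
runDemo :: IO [Word8]
runDemo = do
  buf     <- newBuffer
  flushed <- mapM (singleton buf) [1 .. 30]
  rest    <- pending buf
  return (concat flushed ++ rest)

main :: IO ()
main = runDemo >>= print
```

Whether this layout wins anything in lifted IO code is not claimed here; the sketch only shows the shape of the trick. The speedups quoted above were measured on the unboxed MBA# version.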
Some parting numbers (Builder7 is my current version, Builder1 is the unmodified rossp/kolmodin builder):

stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 Bench.hs ; time ./Bench 2 10000000
330000000

real    0m5.580s
user    0m5.540s
sys     0m0.032s

stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000

real    0m2.948s
user    0m2.908s
sys     0m0.036s

stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 Bench.hs ; time ./Bench 2 10000000
330000000

real    0m55.708s
user    0m54.695s
sys     0m0.208s

stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000

real    0m25.888s
user    0m25.546s
sys     0m0.156s

stefan@stefans:~/hackedbuilder$ gcc -O2 -march=pentium4 CBuilder.c -o CBuilder ; time ./CBuilder 10000000

real    0m0.861s
user    0m0.860s
sys     0m0.000s

Stefan