
Stefan O'Rear wrote:
2. Parameters are very expensive. The type of our builder functions (ignoring CPS for the time being) was MBA# -> Int# -> [ByteString], where the Int# is the current write pointer. Adding an extra Int# to cache the size of the array (rather than calling sMBA# each time) slowed the code down by ~2x. Conversely, moving the write pointer into the byte array itself (storing it in bytes 0#, 1#, 2#, and 3#) sped the code up by 4x.
If you were measuring on x86, then parameters are passed on the stack, which may be expensive. On x86_64 the first 3 arguments are passed in registers, which is usually a win, but if the function immediately does an eval they need to be saved on the stack anyway. Still, 4x sounds like a lot; perhaps you managed to avoid a stack check in the inner loop or something.
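For anyone who hasn't seen the pointer-in-the-buffer trick, a rough (untested) sketch is below. It uses the lifted wrappers from the primitive package's Data.Primitive.ByteArray rather than raw MutableByteArray# primops, and newBuffer/pokeWord8 are names invented here rather than anything from Stefan's code; the point is just that the fill functions take the buffer and nothing else, because the write offset travels inside it:

  -- Untested sketch: the write offset is stored in the buffer itself.
  import Data.Primitive.ByteArray
  import Data.Word (Word8, Word32)
  import GHC.Exts (RealWorld)

  -- Bytes 0..3 hold the current write offset; the payload starts at byte 4.
  newBuffer :: Int -> IO (MutableByteArray RealWorld)
  newBuffer payloadSize = do
    buf <- newByteArray (4 + payloadSize)
    writeByteArray buf 0 (4 :: Word32)        -- initial offset, kept in bytes 0..3
    return buf

  -- Append one byte: read the offset out of the buffer, write the byte,
  -- then store the bumped offset back.  Only the buffer is passed around.
  pokeWord8 :: MutableByteArray RealWorld -> Word8 -> IO ()
  pokeWord8 buf w = do
    off <- readByteArray buf 0 :: IO Word32   -- element 0 of the Word32 view
    writeByteArray buf (fromIntegral off) w   -- Word8 index is in bytes
    writeByteArray buf 0 (off + 1)            -- bump the stored offset

  main :: IO ()
  main = do
    buf <- newBuffer 16
    mapM_ (pokeWord8 buf) [1, 2, 3]
    off <- readByteArray buf 0 :: IO Word32
    print off                                 -- 7 = 4-byte header + 3 bytes written

At the primop level the same layout drops the Int# argument from the MBA# -> Int# -> [ByteString] type above, which is presumably where the saving in argument traffic comes from.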
3. MBA# is just as fast as Addr#, and garbage collected to boot.
Not really surprising, that.
4. You can't keep track of which version of the code is which, what is a regression, and what is an enhancement. Don't even try. Next time I try something like this I will make as much use of darcs as possible.
Absolutely - if you'd used darcs, then we could peer in more detail at the changes that you thought gave counter-intuitive results.

Simon Peyton-Jones wrote:
| 5. State# threads clog the optimizer quite effectively. Replacing
| st(n-1)# with realWorld# everywhere I could count on data
| dependencies to do the same job doubled performance.
The idea is that the optimiser should allow you to write at a high level, and do the book-keeping for you. When it doesn't, I like to know, and preferably fix it.
If you had a moment to boil out a small, reproducible example of this kind of optimisation failure (with as few dependencies as poss), then I'll look to see if the optimiser can be cleverer.
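For anyone who hasn't met the trick being described: it is essentially the same dodge as bytestring's inlinePerformIO - apply each primop to realWorld# rather than to the properly threaded token, and rely on the data dependency between results to keep the operations in order. A rough, untested sketch (threaded/unthreaded are invented names, not Stefan's code):

  {-# LANGUAGE MagicHash, UnboxedTuples #-}
  module StateTrick where

  import GHC.Exts

  -- Properly threaded: the State# token sequences the two reads.
  threaded :: MutableByteArray# RealWorld
           -> State# RealWorld -> (# State# RealWorld, Int# #)
  threaded mba s0 =
    case readIntArray# mba 0# s0 of
      (# s1, i #) -> readIntArray# mba i s1

  -- The trick: both reads get realWorld#, and the only thing keeping the
  -- second read after the first is that its index i comes from the first.
  unthreaded :: MutableByteArray# RealWorld -> Int#
  unthreaded mba =
    case readIntArray# mba 0# realWorld# of
      (# _, i #) ->
        case readIntArray# mba i realWorld# of
          (# _, r #) -> r

Where there is no data dependency, nothing stops the simplifier from reordering or discarding such calls, which is presumably why Stefan only did this where a dependency already enforced the order.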
Yes, and *please* add some of this folklore to the performance wiki at http://haskell.org/haskellwiki/Performance, if you have the time.

Cheers,
Simon