
| 5. State# threads clog the optimizer quite effectively. Replacing
|    st(n-1)# with realWorld# everywhere I could count on data
|    dependencies to do the same job doubled performance.

The idea is that the optimiser should allow you to write at a high level, and do the bookkeeping for you. When it doesn't, I like to know, and preferably fix it.

If you have a moment to boil out a small, reproducible example of this kind of optimisation failure (with as few dependencies as possible), I'll look to see if the optimiser can be cleverer.

| 6. The inliner is a bit too greedy. Removing the slow-path code from
|    singleton doesn't help because popSingleton is only used once; but
|    if I explicitly {-# NOINLINE popSingleton #-}, the code for
|    singleton itself becomes much smaller, and inlinable (15% perf
|    gain). Plus the new singleton doesn't allocate memory, so I can
|    use even MORE realWorld#s.

That's a hard one! Inlining functions that are called just once is usually a huge win. I don't know how to spot what you did in an automated way.

thanks

Simon