
| 6. The inliner is a bit too greedy. Removing the slow-path code from
| singleton doesn't help because popSingleton is only used once; but
| if I explicitly {-# NOINLINE popSingleton #-}, the code for
| singleton itself becomes much smaller, and inlinable (15% perf
| gain). Plus the new singleton doesn't allocate memory, so I can
| use even MORE realWorld#s.

That's a hard one! Inlining functions that are called just once is usually a huge win. I don't know how to spot what you did in an automated way.
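
For readers following along, the manual trick from point 6 can be sketched roughly as below. This is not the original singleton/popSingleton code, which is not shown in this thread; encodeChar and encodeSlow are hypothetical names invented for illustration. The idea is only that the rarely taken slow path is marked NOINLINE, so the once-used callee is not folded back into its caller and the caller's body stays small enough for GHC to consider inlining it at call sites. Whether this helps in any given program is an assumption to check by measuring.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical hot function: UTF-8 encode a single Char.
-- The common (ASCII) case is handled inline; everything else is
-- delegated to a separate function with exactly one call site.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80  = [fromIntegral n]   -- fast path: one byte
  | otherwise = encodeSlow n       -- slow path: kept out of line
  where
    n = ord c

-- Hypothetical slow path for multi-byte code points.
-- Without the pragma, GHC would normally inline this into encodeChar
-- because it is used only once, making encodeChar itself large.
encodeSlow :: Int -> [Word8]
encodeSlow n
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    -- Leading byte: tag bits combined with the top bits of the code point.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s     = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
{-# NOINLINE encodeSlow #-}
```

In GHCi, encodeChar 'A' evaluates to [65]. Comparing the Core from -ddump-simpl with and without the pragma is one way to see how much the body of encodeChar shrinks.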

Yeah. We found this to be an issue with the Mercury compiler. We processed functions (well, okay predicates, in the case of Mercury) in dependency order. We experimented with top down and bottom up order. Bottom up inlining is great for eliminating all the little access and convenience functions one writes, and top down gets the case above (at least most of the time).

IIRC, our experiments showed that overall, bottom up inlining performed significantly better than top down, or arbitrary order. Bottom up inlining worked really well round the leaves because it frequently replaced a call (requiring register saves, etc) with structure packing/unpacking which didn't require register saves/restores. Thus it eliminated calls altogether. It is also advantageous when it allows producers and consumers to be merged, eliminating memory allocations (as noted above).

That said, I had better point out that Mercury is strict, which simplifies things rather.

Andrew Appel's code generator that used dynamic programming to select between different generated code sequences comes to mind as potential inspiration for a super-duper inliner.

cheers,
Tom
--
Dr Thomas Conway
drtomc@gmail.com

You are beautiful; but learn to work,
for you cannot eat your beauty.
  -- Congo proverb