
On Wed, 2008-08-27 at 10:25 +1000, Manuel M T Chakravarty wrote:
> However, I am not convinced that this layout optimisation is really gaining that much extra performance these days. In particular, since dynamic pointer tagging, very short running "evals" (for which the extra indirection incurs the largest overhead) have become less frequent. Even if there is a slight performance regression, I think, it would be worthwhile to consider giving up on the described layout constraint. It is the Last Quirk that keeps GHC from using standard compiler back-ends (such as LLVM), and I suspect, it is not worth it anymore.
There's also a potential benefit on the other side: the CPU's instruction cache is not wasted on non-instruction data. Apparently some CPUs also do instruction read-ahead and suffer a slowdown if they encounter data that does not decode as valid opcodes. Obviously it's not allowed to fault, since the instructions are not actually executed, but it can impact performance according to the Intel manuals (presumably because it ends up flushing some instruction decoding caches or something).

Of course, as you've discussed, the only way to find out is to benchmark.

Duncan