
On 17/02/2010 21:15, Scott Michel wrote:
Depends a lot on the benchmark. The FreeBSD kernel dev crowd (one of whom works for me) have seen performance improvements of 10-20% using LLVM and clang over gcc. It also depends heavily on which optimization passes you have LLVM invoke -- bear in mind that LLVM is a compiler optimization infrastructure first and foremost.
Right, such benchmarks tend to go out of date quickly, especially when both projects are under active development. I have no vested interest in either - we'll use whatever suits us better, and that seems to be LLVM.
Even so, LLVM doesn't let us generate exactly the code we'd like: we can't use GHC's tables-next-to-code optimisation. Measurements by David Terei, who built the LLVM backend, apparently show that this doesn't matter much (~3% slower, IIRC), though I'm still surprised that all those extra indirections don't have more of an effect; I think we need to investigate this more closely. It's important because if the LLVM backend is to be a compile-time option, we have to either drop tables-next-to-code or wait until LLVM supports generating code in that style.
This sounds like an impedance mismatch between GHC's concept of IR and LLVM's.
It certainly is an impedance mismatch - there's no good reason why LLVM couldn't generate the code we want, but its IR doesn't allow us to represent it. So there's every reason to believe that this could be fixed in LLVM without too much difficulty. We can work around the impedance mismatch the other way, by not using tables-next-to-code in GHC, but that costs us a bit in performance.
[disclaimer: grain of salt speculation, haven't read the code] Tables-next-to-code has an obvious cache-friendliness property, BTW.
Oh absolutely, and that's part of why it's not a clear win: one could argue that polluting the instruction cache with data outweighs the cost of the extra indirections you pay without it. Having seen the effect of branch mispredictions, though, I'm inclined to believe that those indirections are more expensive. The cost is this: every return to a stack frame takes two indirections rather than one. Of course, GHC's two representations are not the only ones you could choose - people have been designing clever ways to map code addresses to data structures for a long time. If returning to a stack frame is the dominant operation, you could put the return address itself on the stack and use a hash table to map return addresses to info tables. That trades mutator time against GC time; we don't know whether it would be a win, but we do know it would take a lot of effort to find out. The tables-next-to-code representation means you don't have to fiddle around with hash tables, so it's simpler and probably faster.
Generally, there's going to be some instruction prefetch into the cache. This is likely why it's faster. Otherwise, you have to warm up the data cache, since LLVM spills the tables into the target's constant pool.
Not sure what "spills the tables" means, but maybe that's not important.
NCGs should be faster than plain old C. Trying to produce optimized C is a fool's errand, and I'm starting to agree with dropping that. My worry was that the C backend would be dropped in its entirety, which would also be a fool's errand.
Yes, exactly. Cheers, Simon