
On Thu, Oct 27, 2005 at 08:44:10AM +0100, Simon Marlow wrote:
I'd be surprised if this is an issue. GHC doesn't normally touch the info tables during execution (with one exception - getting the tag from a constructor in a datatype with >8 constructors). It touches the info tables during GC, but it doesn't touch the code during GC. So we might push some code out of the cache on a GC, but that shouldn't have a large effect.
Yeah, you are right. I realized this after some more thought: we don't make a new copy of the code for each thunk :)
It could be an alignment issue, I suppose. Or passing arguments in registers (we don't, at the moment, on x86_64).
I tried some experiments using regparm on jhc output on i386 and it did not cause the dramatic effect noticed with x86_64, so I don't think it is just that. Well, it is possible: the x86_64 core might be optimized assuming things are passed in registers, while the i386 core might keep the top few stack members in phantom registers or something... But an alignment issue sounds more likely. If we are straddling 4-byte boundaries with our 8-byte pointers and ints, that could affect things very much. It is the number one cause of performance problems according to the AMD optimization manual.
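As a rough way to check for the alignment problem described above, here is a minimal C sketch. It assumes the concern is 8-byte values that are only 4-byte aligned, so that a single load can cross an 8-byte boundary. The helper names (is_aligned, straddles, load_u64) are made up for illustration and come from neither GHC nor jhc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p is a multiple of align.
 * align must be a nonzero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: returns 1 if an access of `size` bytes starting
 * at p crosses a multiple of `boundary` (e.g. 8, or a 64-byte cache line). */
static int straddles(const void *p, size_t size, size_t boundary)
{
    uintptr_t first = (uintptr_t)p;
    uintptr_t last = first + size - 1;
    return (first / boundary) != (last / boundary);
}

/* Reads an 8-byte word from a possibly misaligned address.
 * memcpy avoids the undefined behaviour of casting p to uint64_t*. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Running straddles over the addresses of heap words or stack slots in a test program would show whether 8-byte values end up only 4-byte aligned; it says nothing by itself about how much such accesses cost on a given core.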
If you have any handy test programs, can you try fiddling with the alignment of code blocks and see if you get a measurable difference?
I will try that.
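For compilers that go through GCC, one possible way to run that experiment is the GCC/Clang aligned function attribute, which requests a minimum alignment for a function's first instruction. The sketch below assumes GCC or Clang; hot_block and entry_offset are placeholder names, and 64 is an arbitrary choice standing in for a cache-line size.

```c
#include <stdint.h>

/* GCC/Clang extension: ask for the entry point of this function to be
 * placed on a 64-byte boundary. noinline keeps it a separate code block. */
__attribute__((aligned(64), noinline))
int hot_block(int x)
{
    return x * 2 + 1;
}

/* Hypothetical helper: offset of a function's entry point within a block
 * of `block` bytes. Converting a function pointer to an integer is
 * implementation-defined, but works on the usual GCC targets. */
static unsigned entry_offset(int (*fn)(int), unsigned block)
{
    return (unsigned)((uintptr_t)fn % block);
}
```

Varying the requested alignment (or using GCC's -falign-functions option for the whole program) and timing the result would give the measurable difference asked about, if there is one.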
(I'm still digesting your other message, I'll reply in due course).
I am digesting the C-- papers at the moment :)

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈