
On Thursday 16 October 2008 07:03:05 Roman Leshchinskiy wrote:
On 16/10/2008, at 21:34, Simon Peyton-Jones wrote:
BUT people who care probably UNPACK their strict fields too, which is even better. The time you can't do that is for sum types:
data T = MkT ![Int]
You also can't do it for polymorphic components. I've used code like:
data T a = MkT !a
foo :: T (a,b) -> a
foo (MkT (x,y)) = x
Here, unpacking doesn't work but foo could still access the components of the pair directly.
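
For reference, the three cases mentioned so far can be put side by side in one small program. This is only a sketch: the names P, L, getP and headL are made up for illustration, and the comments describe what GHC is generally expected to do with each field, not output checked against any particular GHC version.

```haskell
module Main where

-- Single-constructor field type: with UNPACK, GHC can store the raw
-- machine integer directly inside the MkP constructor.
-- (P and getP are hypothetical names used only for this sketch.)
data P = MkP {-# UNPACK #-} !Int

-- Sum-type field: [Int] has two constructors, so the field cannot be
-- unpacked. It stays a pointer to an already-evaluated list cell.
-- (L and headL are hypothetical names as well.)
data L = MkL ![Int]

-- Polymorphic field: the representation of 'a' is not known where T is
-- defined, so this field cannot be unpacked either.
data T a = MkT !a

-- The function from the message above: the pair is reached through the
-- strict field, which is known to be evaluated already.
foo :: T (a, b) -> a
foo (MkT (x, _)) = x

getP :: P -> Int
getP (MkP n) = n

headL :: L -> Maybe Int
headL (MkL (x:_)) = Just x
headL (MkL [])    = Nothing

main :: IO ()
main = do
  print (getP (MkP 42))
  print (headL (MkL [1, 2, 3]))
  print (foo (MkT ('a', True)))
```

Compiling this with -O and -ddump-simpl should let you compare the Core produced for getP, headL and foo.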
This is actually the situation I was originally looking at. I just simplified it for the sake of posting readable core and assembler.

Specifically, I was looking at some of the assembler GHC was generating for some array code to see if it could do a clean enough job to be used instead of C, and was finding this sort of thing because STUArray is defined as

data STUArray s i a = STUArray !i !i !Int (MutableByteArray# s)

I also seem to recall seeing the same sort of thing in some of the state code I was also looking at because STRep is defined as

type STRep s a = State# s -> (# State# s, a #)

(the a being the issue here).

With regard to cost, it is probably not that representative, but a typical code path for the toy example I posted earlier goes from

leaq -8(%rbp),%rax
cmpq %r14,%rax
-- not taken jump -- (stack overflow check passed)
movq %rsi,%rbx
movq $sni_info,-8(%rbp)
addq $-8,%rbp
testq $7,%rbx
-- taken jump -- (x argument had already been forced)
movq 7(%rbx),%rbx
movq $snj_info,(%rbp)
testq $7,%rbx
-- taken jump -- (strict constructor argument is already forced)
cmpq $1,7(%rbx)
-- not taken jump -- (first of the case options)
movl $Main_lvl_closure+1,%ebx
addq $8,%rbp
jmp *(%rbp)

to

leaq -8(%rbp),%rax
cmpq %r14,%rax
-- not taken jump -- (stack overflow check passed)
movq %rsi,%rbx
movq $sni_info,-8(%rbp)
addq $-8,%rbp
testq $7,%rbx
-- taken jump -- (x argument has already been forced)
movq 7(%rbx),%rbx
cmpq $1,7(%rbx)
-- not taken jump -- (first of the case options)
movl $Main_lvl_closure+1,%ebx
addq $8,%rbp
jmp *(%rbp)

which is a 22% reduction (18 to 14) in instructions executed in the entire function, or a 40% reduction (10 to 6) in instructions executed in the core of the function (i.e., after the function's argument is possibly forced).

Cheers! -Tyson

PS: Thanks everyone for the very informative and interesting discussion.