FW: Observation re Hugs GC problem on sparcs w/gcc -O2
Looking back through old mail. I mailed this to the Hugs maintainers about 2 years ago, but I think it is still relevant. It contains a suggestion about why GC doesn't always work right on Hugs on sparcs when type.c/static.c/whatever are compiled -O/-O2, and an easy fix. J | -----Original Message----- | From: Julian Seward (Intl Vendor) [mailto:v-julsew@microsoft.com] | Sent: Thursday, April 22, 1999 10:13 AM | To: 'hugs-bugs@cs.yale.edu' | Cc: 'D.Wakeling@exeter.ac.uk' | Subject: Observation re Hugs GC problem on sparcs w/gcc -O2 | | | | Mark | | I seem to gather that (normal) Hugs has GC problems on Sparc | with optimisation on. David W also reported this recently. | | I got the impression that you think this happens because the | register window mechanism magically obscures some registers | which could hold roots, at the point when GC occurs. | | I would like to offer a different explanation, plus an easy | fix. When I was working for the UFO project at Manchester, | we did a collector which took roots from the C stack. It too | had problems on sparcs, and the following observation saved | the day. I think it might equally apply to Hugs. | | In the code below, complexFnReturningAPair is obvious, | and ... compute_and_alloc ... refers to some code which | may cause a garbage collection. | | | Pair complexFnReturningAPair ( ... ) { .... } | | | Pair p; | | p = complexFnReturningAPair ( ... ); | for ( ... ) { | ... fst(p) ... snd(p) ... | ... compute_and_alloc ... | } | | So: p becomes a pair. We then enter a loop which refers to | fst(p) and/or snd(p), and in which GC could happen. After cppisation, | the code looks like | | p = complexFnReturningAPair ( ... ); | for ( ... ) { | ... heapTopFst[p] ... heapTopSnd[p] ... | ... compute_and_alloc ... | } | | Expanding the array address calculations gives | | p = complexFnReturningAPair ( ... ); | for ( ... ) { | ... heapTopFst + 4*p ... heapTopSnd + 4*p ... | ... compute_and_alloc ... | } | | Now the critical step, done by gcc when optimising: lift | 4*p out of the loop since it's loop-invariant, and | (incidentally) CSE the two 4*p's together: | | tmp = 4 * complexFnReturningAPair ( ... ); | for ( ... ) { | ... heapTopFst + tmp ... heapTopSnd + tmp ... | ... compute_and_alloc ... | } | | Blargh! Now, inside the loop, there is no mention of the | original p returned by complexFnReturningAPair, neither in | regs nor on the stack. So the pair won't be retained by GC | even though it's still live in the loop. | | The moral of the story is this: when using an | array-index-addressed heap, as Hugs does, we need to treat as | a root not only valid array indexes into the heap, but also | intermediates created by the array address calculations. That is: | | p sat -MAX_HEAP < p <= 0 | p*sizeof(int) sat -MAX_HEAP < p <= 0 | heapTopFst+ p*sizeof(int) sat -MAX_HEAP < p <= 0 | heapTopSnd+ p*sizeof(int) sat -MAX_HEAP < p <= 0 | | Looking in machdep.c:gcCStack(), I only see the first of | those four possibilities handled. Perhaps the following | would be better: | | #define Blargh markWithoutMove(*ptr); \ | markWithoutMove((*ptr)/sizeof(Cell)); \ | markWithoutMove(( | (void*)(*ptr)-(void*)heapTopFst)/sizeof(Cell)); \ | markWithoutMove(( | (void*)(*ptr)-(void*)heapTopSnd)/sizeof(Cell)) | | #define StackGrowsDown { while (ptr<=CStackBase) { Blargh; | ptr++; }; } | #define StackGrowsUp { while (ptr>=CStackBase) { Blargh; | ptr--; }; } | #define GuessDirection if (ptr>CStackBase) StackGrowsUp else | StackGrowsDown | | I tried this on x86 and it made no difference at all, since | there isn't a problem for x86 anyway. I'd be interested to | hear if it makes any difference on Sparcs. (note -- the code | might not be exactly right). Check if you try it ... | | Finally, I think I know why the problem doesn't strike x86s. | That is because the x86 has addressing modes of the form | | (reg1 + k*reg2) for k as 1, 2, 4 or 8 | | So a reference to heapTopFst[p] becomes | | ; reg1 holds p | ; reg2 holds heapTopFst | movl (reg1 + 4*reg2), reg3 | | and there is never any point precomputing 4*p, since the | architecture can do it for free. | | The entire diatribe applies not only to sparcs but other | riscs which have lots of registers but only simple addressing | modes. Have you had problem reports for (eg) MIPS too? | | J |
participants (1)
-
Julian Seward (Intl Vendor)