
Hello All,

I am currently trying to get a GHC port to mips-linux going. I understand Igloo is doing the same at the moment, and things look promising so far.

However, the port is currently unregisterised, and I would like to improve it a bit. A registerised port seems achievable with a moderate amount of work. I have looked around the code a bit, and now have a few questions:

- The example of other ports suggests that the useful maximum number of general-purpose registers for GHC is 8. I also presume that registers not mentioned are not touched by Haskell code. Is this correct?
- The comments in the source suggest that callee-saved registers are preferable, without further explanation. I would expect a mix of caller- and callee-saved registers to be potentially better. Any advice on this?
- The MIPS ABI defines 8 (or 9 when including the frame pointer) registers as callee-saved, and more than 9 caller-saved temporaries. With four registers taken for the stack/heap pointers, this leaves a 5/3 split of callee-saved/caller-saved registers, if all my assumptions above are OK. Are there other considerations to take into account for the register layout?

Thiemo

Just in case anyone was wondering why you might want GHC to work well on MIPS...

http://www.movidis.com/products/rev_spec.asp

A 16-core, 600 MHz, low-power MIPS machine pre-installed with Debian. With the new SMP-capable GHC, such a box might be rather good for some Haskell server applications. It's a similar price to Sun's 4-core (with 4 threads per core) 1 GHz low-power SPARC machines (which run Linux or Solaris).

On Mon, 2006-08-21 at 15:18 +0100, Thiemo Seufer wrote:
[snip]
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Thiemo Seufer wrote:
Hello All,
I currently try to get a ghc port on mips-linux going. I understand Igloo does the same ATM, and things look promising so far.
However, the port is currently unregisterised, and I would like to improve it a bit. A registerised port seems to be achievable with a moderate amount of work. I looked a bit around in the code, and have now a few questions:
- The example of other ports suggests the useful maximum of general purpose registers for GHC is 8. I also presume that unmentioned registers aren't touched by haskell code. Is this correct?
In MachRegs.h you specify registers that are globally reserved for use by Haskell code, such as the stack or heap pointer. Other registers are still available for use as normal by gcc or the native code generator. The R1-R8 registers are for argument-passing and returning unboxed tuples. R1 is by far the most important, you *must* put R1 in a register to get any kind of reasonable performance. It's a bit sad that we have to globally reserve registers for argument passing, but that's the way it is right now. The native code generator could actually make much better use of these registers than it currently does.
- The comments in the source suggest that callee-saved registers are preferable, without further explanation. I would expect a mix of caller- and callee-saved registers to be potentially better. Any advice on this?
The only reason that callee-saves registers are better is that they don't have to be saved and restored around an FFI call. If you're going to use caller-saves registers, then use them for R2-R8: those registers don't always have to be saved, since we know when they contain live data. For example, if you put Hp or Sp in a caller-saves register, these would need to be saved around *every* FFI call, since they are always live.
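Simon's argument can be made concrete with a small cost model. The sketch below is illustrative only: the enum, the bitmask encoding and the function name are hypothetical, not GHC's MachRegs.h macros. Following Simon's remark, it treats Sp and Hp as always live and ignores SpLim, HpLim and the base register. It counts how many STG registers would need saving around one FFI call, given which of them sit in caller-saves machine registers and which argument registers are live at the call.

```c
#include <assert.h>

/* Hypothetical numbering of the STG registers discussed above. */
enum stg_reg { R1, R2, R3, R4, R5, R6, R7, R8, SP, HP, NUM_STG_REGS };

#define BIT(r) (1u << (r))

/* Sp and Hp hold live data at every FFI call. */
#define ALWAYS_LIVE (BIT(SP) | BIT(HP))

/* How many STG registers must be saved and restored around one FFI call:
   those mapped to caller-saves machine registers that are also live.
   live_args marks which of R1-R8 hold live data at this call. */
int saves_around_ffi_call(unsigned caller_saves, unsigned live_args)
{
    unsigned must_save = caller_saves & (live_args | ALWAYS_LIVE);
    int count = 0;

    assert((caller_saves >> NUM_STG_REGS) == 0); /* only known registers */
    for (int r = 0; r < NUM_STG_REGS; r++)
        if (must_save & BIT(r))
            count++;
    return count;
}
```

For example, with R5-R8 in caller-saves registers (Simon's suggestion below) and only R1 live, `saves_around_ffi_call(BIT(R5) | BIT(R6) | BIT(R7) | BIT(R8), BIT(R1))` is 0, whereas putting Sp and Hp in caller-saves registers costs 2 saves at every call, whatever is live.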
- The mips ABI defines 8 (or 9 when including the frame pointer) registers as callee-saved, and more than 9 caller-saved temporaries. With four registers taken for stack/heap pointers this leaves a 5/3 split of callee-saved/caller-saved registers, if all my assumptions above are ok. Are there other considerations to take into account for the register layout?
With 5/3, I suggest putting R1-R4 in callee-saves and R5-R8 in caller-saves.

What about floating-point registers?

Cheers,
Simon
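To make the suggested split concrete on MIPS, the sketch below records the STG-to-MIPS mapping from the MachRegs.h patch that Thiemo posts later in this thread (R1-R4 in $16-$19, R5-R8 in $12-$15, Sp/SpLim/Hp/HpLim in $20-$23, Base in $30). It classifies each register using the o32 ABI convention, where $16-$23 (s0-s7) and $30 (s8/fp) are callee-saved and $8-$15 (t0-t7) are caller-saved temporaries. The struct and function names are hypothetical, not GHC's.

```c
#include <assert.h>
#include <string.h>

/* Callee-saved general-purpose registers in the MIPS o32 ABI:
   $16-$23 (s0-s7) and $30 (s8, also usable as the frame pointer). */
int mips_o32_callee_saved(int reg)
{
    assert(reg >= 0 && reg < 32);
    return (reg >= 16 && reg <= 23) || reg == 30;
}

/* Hypothetical table: STG register name -> MIPS register number,
   with the numbers taken from the MachRegs.h patch. */
struct stg_mapping { const char *stg; int mips; };

static const struct stg_mapping layout[] = {
    { "R1", 16 }, { "R2", 17 }, { "R3", 18 }, { "R4", 19 },
    { "R5", 12 }, { "R6", 13 }, { "R7", 14 }, { "R8", 15 },
    { "Sp", 20 }, { "SpLim", 21 }, { "Hp", 22 }, { "HpLim", 23 },
    { "Base", 30 },
};

/* Returns 1 if the named STG register lives in a callee-saved MIPS
   register, 0 if caller-saved, -1 if the name is not in the table. */
int stg_reg_callee_saved(const char *name)
{
    for (size_t i = 0; i < sizeof layout / sizeof layout[0]; i++)
        if (strcmp(layout[i].stg, name) == 0)
            return mips_o32_callee_saved(layout[i].mips);
    return -1;
}
```

Counting the callee-saved entries shows that this layout uses all nine callee-saved registers ($16-$23 and $30), with the always-live Sp, SpLim, Hp, HpLim and Base among them, as Simon recommends.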

Thiemo Seufer wrote:
[snip]
I decided to ignore the performance tuning for now and used a register mapping which is compatible with all three relevant MIPS ABIs. It uses 4 callee-saved registers as R1-R4, plus 4 temporary (caller-saved) registers as R5-R8.

With the appended patch to support a registerised build of GHC 6.4.2 on Debian mips-linux, I got these testsuite results:

OVERALL SUMMARY for test run started at Wed Aug 23 20:12:05 BST 2006
     674 total tests, which gave rise to
    1883 test cases, of which
      21 caused framework failures
     369 were skipped

    1407 expected passes
      22 expected failures
       0 unexpected passes
      41 unexpected failures

Unexpected failures:
   barton-mangler-bug(normal,opt,prof,threaded)
   cabal01(normal)
   cg005(prof)
   char001(prof)
   directory001(prof)
   driver062.2(normal)
   drvrun006(normal)
   drvrun018(opt)
   enum02(threaded)
   exceptions001(normal)
   ext1(prof)
   fed001(normal,opt,prof,threaded)
   ffi006(normal,opt,prof,threaded)
   ffi007(normal,opt,prof,threaded)
   ffi008(normal)
   finalization001(threaded)
   freeNames(threaded)
   galois_raytrace(opt,prof)
   ioref001(normal,prof,threaded)
   joao-circular(normal,opt,prof,threaded)
   ratio001(prof)
   uri001(prof)
   xmlish(prof)

Not all that bad for a first attempt. Some of the failures are probably due to the lack of MIPS support in ghc/rts/Adjustor.c; AFAIU this means C -> Haskell calls don't work for now.

I made ghc6 .deb packages from the unmodified Debian source, another set of packages built with the appended patch, plus a haddock package which is needed as a build-dependency. All are available from

http://people.debian.org/~ths/ghc6/
http://people.debian.org/~ths/haddock/

There are no little-endian mipsel packages yet. So if you happen to have a Debian mips installation around and want to try ghc6, feel free. :-)

Thiemo

diff -urpN restrap2/ghc6-6.4.2/ghc/includes/MachRegs.h hackage/ghc6-6.4.2/ghc/includes/MachRegs.h
--- restrap2/ghc6-6.4.2/ghc/includes/MachRegs.h 2005-07-13 09:47:05.000000000 +0100
+++ hackage/ghc6-6.4.2/ghc/includes/MachRegs.h 2006-08-22 21:42:31.000000000 +0100
@@ -399,10 +399,6 @@
 
 #define REG(x) __asm__("$" #x)
 
-#define CALLER_SAVES_R1
-#define CALLER_SAVES_R2
-#define CALLER_SAVES_R3
-#define CALLER_SAVES_R4
 #define CALLER_SAVES_R5
 #define CALLER_SAVES_R6
 #define CALLER_SAVES_R7
@@ -410,14 +406,14 @@
 
 #define CALLER_SAVES_USER
 
-#define REG_R1 9
-#define REG_R2 10
-#define REG_R3 11
-#define REG_R4 12
-#define REG_R5 13
-#define REG_R6 14
-#define REG_R7 15
-#define REG_R8 24
+#define REG_R1 16
+#define REG_R2 17
+#define REG_R3 18
+#define REG_R4 19
+#define REG_R5 12
+#define REG_R6 13
+#define REG_R7 14
+#define REG_R8 15
 
 #define REG_F1 f20
 #define REG_F2 f22
@@ -427,11 +423,13 @@
 #define REG_D1 f28
 #define REG_D2 f30
 
-#define REG_Sp 16
-#define REG_SpLim 18
+#define REG_Sp 20
+#define REG_SpLim 21
 
-#define REG_Hp 19
-#define REG_HpLim 20
+#define REG_Hp 22
+#define REG_HpLim 23
+
+#define REG_Base 30
 
 #endif /* mipse[lb] */
 
diff -urpN restrap2/ghc6-6.4.2/ghc/includes/TailCalls.h hackage/ghc6-6.4.2/ghc/includes/TailCalls.h
--- restrap2/ghc6-6.4.2/ghc/includes/TailCalls.h 2005-03-08 09:38:57.000000000 +0000
+++ hackage/ghc6-6.4.2/ghc/includes/TailCalls.h 2006-08-22 20:45:52.000000000 +0100
@@ -245,6 +245,29 @@ but uses $$dyncall if necessary to cope,
 #endif
 
 /* -----------------------------------------------------------------------------
+   Tail calling on MIPS
+   -------------------------------------------------------------------------- */
+
+#ifdef mips_HOST_ARCH
+
+#if IN_STG_CODE
+register void *_procedure __asm__("$25");
+#endif
+
+#define JMP_(cont)                      \
+    {                                   \
+      _procedure = (void *)(cont);      \
+      __DISCARD__();                    \
+      goto *_procedure;                 \
+    }
+
+/* Don't need these for MIPS mangling */
+#define FB_
+#define FE_
+
+#endif /* mips_HOST_ARCH */
+
+/* -----------------------------------------------------------------------------
    FUNBEGIN and FUNEND.
 
    These are markers indicating the start and end of Real Code in a
diff -urpN restrap2/ghc6-6.4.2/ghc/rts/StgCRun.c hackage/ghc6-6.4.2/ghc/rts/StgCRun.c
--- restrap2/ghc6-6.4.2/ghc/rts/StgCRun.c 2006-03-22 10:29:51.000000000 +0000
+++ hackage/ghc6-6.4.2/ghc/rts/StgCRun.c 2006-08-22 21:54:33.000000000 +0100
@@ -877,4 +877,37 @@ StgRunIsImplementedInAssembler(void)
 
 #endif
 
+/* -----------------------------------------------------------------------------
+   MIPS architecture
+   -------------------------------------------------------------------------- */
+
+#ifdef mips_HOST_ARCH
+
+StgThreadReturnCode
+StgRun(StgFunPtr f, StgRegTable *basereg)
+{
+    register StgThreadReturnCode __v0 __asm__("$2");
+
+    __asm__ __volatile__(
+        " la $25, %1 \n"
+        " move $30, %2 \n"
+        " jr %1 \n"
+        " .align 3 \n"
+        " .globl " STG_RETURN " \n"
+        " .aent " STG_RETURN " \n"
+        STG_RETURN ": \n"
+        " move %0, $16 \n"
+        " move $3, $17 \n"
+        : "=r" (__v0),
+        : "r" (f), "r" (basereg)
+          "$16", "$17", "$18", "$19", "$20", "$21", "$22", "$23",
+          "$25", "$28", "$30",
+          "$f20", "$f22", "$f24", "$f26", "$f28", "$f30",
+          "memory");
+
+    return __v0;
+}
+
+#endif /* mips_HOST_ARCH */
+
 #endif /* !USE_MINIINTERPRETER */
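The JMP_ macro in the TailCalls.h hunk relies on GCC's "labels as values" extension: `goto *_procedure` jumps to an address held in a variable. (The patch pins `_procedure` to $25 because, in MIPS PIC code, $25 holds the address of the function being entered; StgRun likewise loads it with `la $25, %1`.) GCC only defines computed goto for labels in the same function. Jumping between functions, as JMP_ does, relies on GHC post-processing the generated assembly, which is the "mangling" the patch comment refers to. The sketch below shows the extension in its well-defined, single-function form as a toy threaded interpreter. The opcodes and function name are hypothetical, and it needs GCC or Clang.

```c
#include <assert.h>

/* Hypothetical opcodes for a toy threaded interpreter. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Runs opcodes until OP_HALT. Each handler jumps straight to the next
   handler through a label address, much as JMP_ jumps via _procedure. */
int run_ops(const int *ops)
{
    static void *const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    void *next;
    int acc = 0;

#define NEXT()                                          \
    do {                                                \
        assert(*ops >= OP_INC && *ops <= OP_HALT);      \
        next = dispatch[*ops++]; /* load label address */ \
        goto *next;              /* computed goto */      \
    } while (0)

    NEXT();
op_inc:
    acc += 1;
    NEXT();
op_dbl:
    acc *= 2;
    NEXT();
op_halt:
#undef NEXT
    return acc;
}
```

For instance, the program `{ OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT }` computes ((0 + 1 + 1) * 2) + 1 and returns 5.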

Thiemo Seufer wrote:
[snip]
Congratulations, that does look pretty good. Leaving out barton-mangler-bug (probably floating-point rounding differences) and the ffi tests (lack of Adjustor.c support), there aren't many failures. There doesn't seem to be any pattern to the remaining failures, though, which is perhaps slightly worrying: were they all segfaults, and are they repeatable?

I've committed your patch, thanks.

Cheers,
Simon
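The arithmetic behind Simon's remark can be checked against the test summary. Each entry in the failure list names a test and, in parentheses, the ways (normal, opt, prof, threaded) in which it failed, so the 41 unexpected failures are test cases rather than tests. The sketch below is a hypothetical helper, not part of the GHC testsuite driver. It counts the ways in each entry and can leave out the barton-mangler-bug and ffi* entries that Simon sets aside.

```c
#include <assert.h>
#include <string.h>

/* The "Unexpected failures" list from the test summary above. */
static const char *const unexpected_failures[] = {
    "barton-mangler-bug(normal,opt,prof,threaded)", "cabal01(normal)",
    "cg005(prof)", "char001(prof)", "directory001(prof)",
    "driver062.2(normal)", "drvrun006(normal)", "drvrun018(opt)",
    "enum02(threaded)", "exceptions001(normal)", "ext1(prof)",
    "fed001(normal,opt,prof,threaded)", "ffi006(normal,opt,prof,threaded)",
    "ffi007(normal,opt,prof,threaded)", "ffi008(normal)",
    "finalization001(threaded)", "freeNames(threaded)",
    "galois_raytrace(opt,prof)", "ioref001(normal,prof,threaded)",
    "joao-circular(normal,opt,prof,threaded)", "ratio001(prof)",
    "uri001(prof)", "xmlish(prof)",
};
#define NUM_ENTRIES (sizeof unexpected_failures / sizeof unexpected_failures[0])

/* Number of ways in one entry: commas inside the parentheses, plus one. */
int ways_in_entry(const char *entry)
{
    const char *p = strchr(entry, '(');
    int ways = 1;

    assert(p != NULL);
    for (; *p != '\0' && *p != ')'; p++)
        if (*p == ',')
            ways++;
    return ways;
}

/* Total failing test cases. With skip_known set, leaves out the entries
   attributed to known causes (barton-mangler-bug and the ffi* tests). */
int count_failures(int skip_known)
{
    int total = 0;

    for (size_t i = 0; i < NUM_ENTRIES; i++) {
        const char *e = unexpected_failures[i];
        if (skip_known && (strncmp(e, "barton-mangler-bug(", 19) == 0 ||
                           strncmp(e, "ffi", 3) == 0))
            continue;
        total += ways_in_entry(e);
    }
    return total;
}
```

`count_failures(0)` returns 41, matching the summary, and `count_failures(1)` returns 28, so the failures not explained by those two causes are spread over 19 different tests.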

Simon Marlow wrote:
Thiemo Seufer wrote:
[snip]
Congratulations, that does look pretty good. Leaving out barton-mangler-bug (probably floating-point rounding differences) and the ffi tests (lack of Adjustor.c support), there aren't many failures. Although there don't seem to be any pattern to the remaining failures, which is perhaps slightly worrying - were they all segfaults, and are they repeatable?
Hm, I don't think I understand how the testsuite works. There seems to be no record of the test results, and I forgot to log the output.

I looked at the driver062.2 failure, and the binary seems to work fine, except that there's no matching stdout file in the testsuite tarball. I figure it fails on all platforms that way.

Also, I'm more worried about the 21 test framework failures. How can I find out what happened there?
I've committed your patch, thanks.
The assembler call wrapper in StgCRun has a typo which hides some bugs (a missing colon before the clobber list). This probably explains some of the failures; I'm doing a rebuild and a testsuite rerun with a fixed version of the wrapper.

Thiemo
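For readers unfamiliar with GCC extended asm: the operands after the template form up to three colon-separated sections, in the order outputs, inputs, clobbers. In the StgRun wrapper the register names meant as clobbers follow the inputs without the third colon, so, as Thiemo points out, they do not form a clobber list and GCC is not told that those registers are overwritten. The sketch below shows the intended shape with all three sections present, on an empty template so that it compiles on any GCC or Clang target. The function name is hypothetical, and this is not the StgRun fix.

```c
#include <assert.h>

/* Returns its argument, routed through an empty inline asm statement
   that spells out all three operand sections. */
long asm_identity(long in)
{
    long out;

    __asm__ __volatile__(
        ""              /* template: empty, no instructions emitted      */
        : "=r" (out)    /* outputs: one value produced in a register     */
        : "0" (in)      /* inputs: "0" puts `in` in operand 0's register  */
        : "memory");    /* clobbers: memory may have been read or written */
    return out;
}
```

On MIPS, register clobbers are written as names such as "$16" in that third section, exactly as in the StgRun wrapper's list.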

Thiemo Seufer wrote:
Simon Marlow wrote:
Thiemo Seufer wrote:
[snip]
Congratulations, that does look pretty good. Leaving out barton-mangler-bug (probably floating-point rounding differences) and the ffi tests (lack of Adjustor.c support), there aren't many failures. Although there don't seem to be any pattern to the remaining failures, which is perhaps slightly worrying - were they all segfaults, and are they repeatable?
Hm, I think I don't understand how the testsuite works. There seems to be no record of the test results, and I missed to log the output.
Yes, perhaps the test driver should automatically log the output. Normally I use tee. I noticed something else strange about your test run: you only have 674 tests, but the testsuite should have nearly 1400. Maybe those framework failures are related to this.
I looked at the driver062.2 failure, and the binary seems to work fine, except that there's no matching stdout file in the testsuite tarball. I figure it fails on all platforms that way.
Also, I'm more worried about the 21 test framework failures. How can I find out what happened there?
Looking at the log is the only way, I'm afraid.
I've committed your patch, thanks.
The assembler call wrapper in StgCRun has a typo which hides some bugs (a missing colon before the clobber list). This probably explains some failures, I'm doing a rebuild and a testsuite rerun with a fixed version of the wrapper.
So there should be a colon before "$16", right?

Cheers,
Simon

Simon Marlow wrote: [snip]
The assembler call wrapper in StgCRun has a typo which hides some bugs (a missing colon before the clobber list). This probably explains some failures, I'm doing a rebuild and a testsuite rerun with a fixed version of the wrapper.
So there should be a colon before "$16", right?
Yes, but with the colon added the build then fails because of overzealous register clobbering. I'll send a patch once I've verified it.

Thiemo
participants (3)
- Duncan Coutts
- Simon Marlow
- Thiemo Seufer