
Some days ago I sent an email to the list asking why I couldn't run my programs with the "-Nx" RTS option even though I compiled them with -threaded. Ah, by the way, the architecture is ia64 (Itanium).

Today I realized that when I run ./configure, a preprocessor variable called "NOSMP" is defined, and it disallows multiple OS threads (the -Nx option). Looking a bit deeper, I found that there is no Itanium version of the functions xchg (exchange), cas (compare-and-swap) and write_barrier in the header file SMP.h (includes/SMP.h), so there's no way the holy -N option can be available.

My question is: is it enough to implement xchg, cas and write_barrier for ia64 to make multiple OS threads available on ia64? If not, what else should be implemented or changed?

Regards.
Cristian Perfumo
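For reference, here is a minimal sketch of the three primitives in question, written with GCC's legacy `__sync` builtins instead of hand-written IA-64 assembly. GCC documents these builtins as intended to be compatible with the Intel Itanium processor-specific ABI, so they are a natural fit for ia64. The `StgWord`/`StgPtr`/`StgVolatilePtr` typedefs are simplified stand-ins for the RTS types, and the signatures are assumed to match the existing entries in includes/SMP.h; this is not the code the thread's authors wrote.

```c
#include <stdint.h>

/* Simplified stand-ins for the RTS types (assumption: word-sized unsigned). */
typedef uintptr_t StgWord;
typedef StgWord *StgPtr;
typedef volatile StgWord *StgVolatilePtr;

/* Atomically store w into *p and return the previous contents.
   GCC documents __sync_lock_test_and_set as an acquire barrier only,
   not a full barrier. If callers rely on xchg being a full barrier
   (as the x86 instruction is), add __sync_synchronize() after it. */
static inline StgWord xchg(StgPtr p, StgWord w)
{
    return __sync_lock_test_and_set(p, w);
}

/* Compare-and-swap: if *p == o, store n. In either case return the
   value *p held before the operation. Documented as a full barrier. */
static inline StgWord cas(StgVolatilePtr p, StgWord o, StgWord n)
{
    return __sync_val_compare_and_swap(p, o, n);
}

/* Order all earlier memory accesses before all later ones.
   A full fence is stronger than a pure store-store barrier, but simple. */
static inline void write_barrier(void)
{
    __sync_synchronize();
}
```

A successful `cas` returns the expected value `o`; a failed one returns whatever it found, so callers test success by comparing the result against `o`.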

Cristian Perfumo wrote:
My question is: is it enough to implement xchg, cas and write_barrier for ia64 to make multiple OSthreads available on ia64? If not, what else should be implemented/changed?
Yes, that should be enough. The main concern, on architectures that don't have strong memory ordering, is that the thunk update sequence is safe. See section 3.3 of http://www.haskell.org/~simonmar/papers/multiproc.pdf

At the moment the update code contains a memory barrier, which compiles to nothing on x86/x86_64 (see SMP.h:write_barrier()). You should check that this doesn't impose a significant performance penalty on ia64: try one of the benchmarks that does a lot of updates (e.g. nofib/imaginary/exp3_8) with and without -threaded.

Cheers,
Simon
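To make the ordering concern concrete: an update overwrites a thunk with an indirection to its value, while another CPU may enter the same closure at any moment. The store of the indirectee must become visible before the store of the new info pointer; otherwise a reader could see an indirection whose target it cannot yet see. The sketch below models this with a toy closure layout and C11 fences. `Closure`, the `*_INFO` tags, `update_with_indirection` and `follow_indirection` are hypothetical names, not the RTS's, and the reader-side acquire load is an assumption of the sketch rather than something discussed in the message.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical info "pointers": small tags suffice for this toy model. */
enum { THUNK_INFO = 1, VALUE_INFO = 2, IND_INFO = 3 };

typedef struct Closure {
    _Atomic int info;           /* header: what kind of closure this is */
    struct Closure *indirectee; /* valid only once info == IND_INFO */
    long value;                 /* stand-in for an evaluated result */
} Closure;

/* Release fence: stores before it cannot become visible after the
   atomic store that follows it. Plays the role of write_barrier(). */
static void write_barrier(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Overwrite thunk p with an indirection to r.
   Order matters: payload first, then the barrier, then the header. */
static void update_with_indirection(Closure *p, Closure *r)
{
    p->indirectee = r;
    write_barrier();
    atomic_store_explicit(&p->info, IND_INFO, memory_order_relaxed);
}

/* Reader on another CPU: if the acquire load sees IND_INFO, it is
   guaranteed to see the indirectee stored before the barrier. */
static Closure *follow_indirection(Closure *p)
{
    if (atomic_load_explicit(&p->info, memory_order_acquire) == IND_INFO)
        return p->indirectee;
    return NULL; /* still a thunk: the caller would have to evaluate it */
}
```

On x86 a release fence emits no instruction, which matches the remark that write_barrier() is free there; on a weakly ordered machine it may cost a real instruction, which is what the suggested benchmark comparison would measure.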

Simon: (first of all, thank you for the information. As soon as we have it
working we will try that benchmark in nofib to see what happens.)
We have already implemented those synchronization functions for the ia64
architecture, and we have a problem related to the base register (see the
output below).
The questions that arose:
1) What is REG_Base?
2) Do we need it for ia64?
3) If we do need it, which machine register should we use?
Thank you very much in advance for any information.
Cristian Perfumo
------------------------------------------------------------------------
== make way=thr all;
PWD = /.../ghc-6.6.1/rts
------------------------------------------------------------------------
../compiler/ghc-inplace -H16m -O -optc-O2 -static -I. -#include HCIncludes.h -fvia-C -dcmm-lint -hisuf thr_hi -hcsuf thr_hc -osuf thr_o -optc-DTHREADED_RTS -c Apply.cmm -o Apply.thr_o
In file included from /.../ghc-6.6.1/includes/Stg.h:148,
from /tmp/ghc1736_0/ghc1736_0.thr_hc:3:0:
/scratch/Computacional/adrian/nehir/ghc-6.6.1/includes/Regs.h:353:2:
error: #error BaseReg must be in a register for THREADED_RTS
/tmp/ghc1736_0/ghc1736_0.thr_hc: In function 'stg_AP_entry':
/tmp/ghc1736_0/ghc1736_0.thr_hc:196:0:
error: 'MainCapability' undeclared (first use in this function)
/tmp/ghc1736_0/ghc1736_0.thr_hc:196:0:
error: (Each undeclared identifier is reported only once
/tmp/ghc1736_0/ghc1736_0.thr_hc:196:0:
error: for each function it appears in.)
/tmp/ghc1736_0/ghc1736_0.thr_hc: In function 'stg_AP_STACK_entry':
/tmp/ghc1736_0/ghc1736_0.thr_hc:253:0:
error: 'MainCapability' undeclared (first use in this function)
make[2]: *** [Apply.thr_o] Error 1
make[1]: *** [all] Error 1
make: *** [stage1] Error 1
On 5/9/07, Simon Marlow wrote:
Yes, that should be enough.

Cristian Perfumo wrote:
The questions that arose:
1) What is REG_Base?
2) Do we need it for ia64?
3) If we do need it, which machine register should we use?
REG_Base points to a CPU-local table of information during execution. It has to be in an actual machine register for a multi-CPU build (in a single-CPU build we can get away with using a fixed memory location).

The register assignments are in includes/MachRegs.h, and indeed it looks like IA64 doesn't assign a register to Base. I don't know IA64, so I don't know which register I would pick, but usually it's a good idea to pick one that is callee-saved in the C calling convention.

Cheers,
Simon
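The distinction can be sketched in C. In a single-CPU build there is exactly one register table, so `BaseReg` can be a constant address inside one global capability; in a multi-CPU build each OS thread runs on its own capability, so `BaseReg` must be per-thread state, which GHC keeps in the machine register named by `REG_Base`. The names `RegTable`, `Capability`, `ASSIGN_BaseReg`, `read_R1` and `write_R1`, and the use of a `_Thread_local` variable in place of a real register, are assumptions of this sketch, not the contents of includes/Regs.h. In the sketch, `MainCapability` exists only when `THREADED_RTS` is undefined, which is presumably why code taking the single-CPU path fails with "'MainCapability' undeclared" in the threaded build log above.

```c
/* Hypothetical, heavily cut-down register table and capability. */
typedef struct { long R1; long *Sp; long *Hp; } RegTable;
typedef struct { RegTable r; int no; } Capability;

#if !defined(THREADED_RTS)
/* Single-CPU build: one capability at a fixed address, so BaseReg is a
   constant expression and no machine register is spent on it. */
static Capability MainCapability;
#define BaseReg (&MainCapability.r)
#define ASSIGN_BaseReg(e) ((void)(e))
#else
/* Multi-CPU build: each OS thread runs on its own capability, so BaseReg
   must differ per thread. GHC pins it in the register named by REG_Base;
   a thread-local variable stands in for that register here. */
static _Thread_local RegTable *BaseReg;
#define ASSIGN_BaseReg(e) (BaseReg = (e))
#endif

/* Generated code reaches its virtual registers through BaseReg. */
static long read_R1(void) { return BaseReg->R1; }
static void write_R1(long v) { BaseReg->R1 = v; }
```

With a real machine register, the callee-saved advice matters because such a register survives calls into C code unchanged, so BaseReg would not need to be saved and restored around every foreign call.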
participants (2)
- Cristian Perfumo
- Simon Marlow