RFC: omit write barrier instructions unless '-threaded'

Hi all,
from what I gathered so far no emission of write barriers is needed when
- running on a uniprocessor (-threaded or not)
- running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is
specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not
understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts:
o do we want to extend this approach to other archs?
o possibly suppress the emission of MO_WriteBarrier in
compiler/codeGen/StgCmmBind.hs
(more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When
we find a general solution be can back this commit out.
What do you think?
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e
commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e
Author: Gabor Greif

On 24/02/13 20:40, Gabor Greif wrote:
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts: o do we want to extend this approach to other archs? o possibly suppress the emission of MO_WriteBarrier in compiler/codeGen/StgCmmBind.hs (more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When we find a general solution be can back this commit out.
What do you think?
I don't think this is the right way to do it. The -threaded flag is not supposed to affect code generation, it only changes the RTS that gets linked in. That is, you can compile all your code without -threaded and then just add -threaded at the link step, and it will work. Instead, use conditional compilation in the RTS so that the write_barrer() calls are only present when THREADED_RTS is on. Cheers, Simon
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e Author: Gabor Greif
Date: Fri Feb 22 18:47:00 2013 +0100 Do not emit barriers on PPC unless we go the threaded way
diff --git a/compiler/nativeGen/PPC/CodeGen.hs b/compiler/nativeGen/PPC/CodeGen.hs index 92eff36..6c33cca 100644 --- a/compiler/nativeGen/PPC/CodeGen.hs +++ b/compiler/nativeGen/PPC/CodeGen.hs @@ -1,9 +1,8 @@ - ----------------------------------------------------------------------------- -- -- Generating machine code (instruction selection) -- --- (c) The University of Glasgow 1996-2004 +-- (c) The University of Glasgow 1996-2013 -- -----------------------------------------------------------------------------
@@ -906,8 +905,10 @@ genCCall' -}
-genCCall' _ _ (PrimTarget MO_WriteBarrier) _ _ - = return $ unitOL LWSYNC +genCCall' dflags _ (PrimTarget MO_WriteBarrier) _ _ + = return (if WayThreaded `elem` ways dflags + then unitOL LWSYNC + else nilOL)
genCCall' _ _ (PrimTarget MO_Touch) _ _ = return $ nilOL
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 2/25/13, Simon Marlow
On 24/02/13 20:40, Gabor Greif wrote:
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts: o do we want to extend this approach to other archs? o possibly suppress the emission of MO_WriteBarrier in compiler/codeGen/StgCmmBind.hs (more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When we find a general solution be can back this commit out.
What do you think?
I don't think this is the right way to do it. The -threaded flag is not supposed to affect code generation, it only changes the RTS that gets linked in. That is, you can compile all your code without -threaded and then just add -threaded at the link step, and it will work.
Hi Simon, thanks for your review!
Instead, use conditional compilation in the RTS so that the write_barrer() calls are only present when THREADED_RTS is on.
This is already done the way you suggest: #define write_barrier() /* nothing */ in includes/stg/SMP.h:368 Works perfectly, 'lwsync' only appears in .thr_*.o files. My problem stems from another call: emitPrimCall [] MO_WriteBarrier [] in compiler/codeGen/StgCmmBind.hs:614 or possibly callishMachOps = listToUFM $ map (\(x, y) -> (mkFastString x, y)) [ ( "write_barrier", MO_WriteBarrier ), ( "memcpy", MO_Memcpy ), ( "memset", MO_Memset ), ( "memmove", MO_Memmove ) -- ToDo: the rest, maybe ] in compiler/cmm/CmmParse.y:920 I doubt these files should be compiled with a dependency on THREADED_RTS, and indeed this symbol is only ever defined (-optc-DTHREADED_RTS) for the C compiler, and thus not available when compiling haskell source. So I am still stumped. The function where "lwsync" surfaces is: stg_marked_upd_frame_info which is rts/Updates.cmm:46. This one is creating a blackhole (IISC) by means of StgCmmBind.hs:614. How can we go on from here? Cheers, Gabor
Cheers, Simon
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e Author: Gabor Greif
Date: Fri Feb 22 18:47:00 2013 +0100 Do not emit barriers on PPC unless we go the threaded way
diff --git a/compiler/nativeGen/PPC/CodeGen.hs b/compiler/nativeGen/PPC/CodeGen.hs index 92eff36..6c33cca 100644 --- a/compiler/nativeGen/PPC/CodeGen.hs +++ b/compiler/nativeGen/PPC/CodeGen.hs @@ -1,9 +1,8 @@ -
----------------------------------------------------------------------------- -- -- Generating machine code (instruction selection) -- --- (c) The University of Glasgow 1996-2004 +-- (c) The University of Glasgow 1996-2013 --
-----------------------------------------------------------------------------
@@ -906,8 +905,10 @@ genCCall' -}
-genCCall' _ _ (PrimTarget MO_WriteBarrier) _ _ - = return $ unitOL LWSYNC +genCCall' dflags _ (PrimTarget MO_WriteBarrier) _ _ + = return (if WayThreaded `elem` ways dflags + then unitOL LWSYNC + else nilOL)
genCCall' _ _ (PrimTarget MO_Touch) _ _ = return $ nilOL
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 25/02/13 14:53, Gabor Greif wrote:
On 2/25/13, Simon Marlow
wrote: On 24/02/13 20:40, Gabor Greif wrote:
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts: o do we want to extend this approach to other archs? o possibly suppress the emission of MO_WriteBarrier in compiler/codeGen/StgCmmBind.hs (more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When we find a general solution be can back this commit out.
What do you think?
I don't think this is the right way to do it. The -threaded flag is not supposed to affect code generation, it only changes the RTS that gets linked in. That is, you can compile all your code without -threaded and then just add -threaded at the link step, and it will work.
Hi Simon,
thanks for your review!
Instead, use conditional compilation in the RTS so that the write_barrer() calls are only present when THREADED_RTS is on.
This is already done the way you suggest:
#define write_barrier() /* nothing */
in includes/stg/SMP.h:368
Works perfectly, 'lwsync' only appears in .thr_*.o files.
My problem stems from another call:
emitPrimCall [] MO_WriteBarrier []
in compiler/codeGen/StgCmmBind.hs:614
or possibly
callishMachOps = listToUFM $ map (\(x, y) -> (mkFastString x, y)) [ ( "write_barrier", MO_WriteBarrier ), ( "memcpy", MO_Memcpy ), ( "memset", MO_Memset ), ( "memmove", MO_Memmove ) -- ToDo: the rest, maybe ]
in compiler/cmm/CmmParse.y:920
I doubt these files should be compiled with a dependency on THREADED_RTS, and indeed this symbol is only ever defined (-optc-DTHREADED_RTS) for the C compiler, and thus not available when compiling haskell source.
So I am still stumped.
The function where "lwsync" surfaces is:
stg_marked_upd_frame_info which is rts/Updates.cmm:46.
These come from rts/Updates.h:updateWithIndirection(), which expands to some Cmm code that uses "prim %write_barrier()". So you want to use some conditional compilation to make that prim %write_barrier() disappear when THREADED_RTS is off. I suggest replacing it with write_barrier(), and then #define write_barrier() in includes/Cmm.h. Cheers, Simon
This one is creating a blackhole (IISC) by means of StgCmmBind.hs:614.
How can we go on from here?
Cheers,
Gabor
Cheers, Simon
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e Author: Gabor Greif
Date: Fri Feb 22 18:47:00 2013 +0100 Do not emit barriers on PPC unless we go the threaded way
diff --git a/compiler/nativeGen/PPC/CodeGen.hs b/compiler/nativeGen/PPC/CodeGen.hs index 92eff36..6c33cca 100644 --- a/compiler/nativeGen/PPC/CodeGen.hs +++ b/compiler/nativeGen/PPC/CodeGen.hs @@ -1,9 +1,8 @@ -
----------------------------------------------------------------------------- -- -- Generating machine code (instruction selection) -- --- (c) The University of Glasgow 1996-2004 +-- (c) The University of Glasgow 1996-2013 --
-----------------------------------------------------------------------------
@@ -906,8 +905,10 @@ genCCall' -}
-genCCall' _ _ (PrimTarget MO_WriteBarrier) _ _ - = return $ unitOL LWSYNC +genCCall' dflags _ (PrimTarget MO_WriteBarrier) _ _ + = return (if WayThreaded `elem` ways dflags + then unitOL LWSYNC + else nilOL)
genCCall' _ _ (PrimTarget MO_Touch) _ _ = return $ nilOL
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Thanks, this helped!
Something like this?
$ git show c3f5312
commit c3f53126fb0835cb21fc085e9b7cb5acb1ea1fba
Author: Gabor Greif
On 25/02/13 14:53, Gabor Greif wrote:
On 2/25/13, Simon Marlow
wrote: On 24/02/13 20:40, Gabor Greif wrote:
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts: o do we want to extend this approach to other archs? o possibly suppress the emission of MO_WriteBarrier in compiler/codeGen/StgCmmBind.hs (more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When we find a general solution be can back this commit out.
What do you think?
I don't think this is the right way to do it. The -threaded flag is not supposed to affect code generation, it only changes the RTS that gets linked in. That is, you can compile all your code without -threaded and then just add -threaded at the link step, and it will work.
Hi Simon,
thanks for your review!
Instead, use conditional compilation in the RTS so that the write_barrer() calls are only present when THREADED_RTS is on.
This is already done the way you suggest:
#define write_barrier() /* nothing */
in includes/stg/SMP.h:368
Works perfectly, 'lwsync' only appears in .thr_*.o files.
My problem stems from another call:
emitPrimCall [] MO_WriteBarrier []
in compiler/codeGen/StgCmmBind.hs:614
or possibly
callishMachOps = listToUFM $ map (\(x, y) -> (mkFastString x, y)) [ ( "write_barrier", MO_WriteBarrier ), ( "memcpy", MO_Memcpy ), ( "memset", MO_Memset ), ( "memmove", MO_Memmove ) -- ToDo: the rest, maybe ]
in compiler/cmm/CmmParse.y:920
I doubt these files should be compiled with a dependency on THREADED_RTS, and indeed this symbol is only ever defined (-optc-DTHREADED_RTS) for the C compiler, and thus not available when compiling haskell source.
So I am still stumped.
The function where "lwsync" surfaces is:
stg_marked_upd_frame_info which is rts/Updates.cmm:46.
These come from rts/Updates.h:updateWithIndirection(), which expands to some Cmm code that uses "prim %write_barrier()". So you want to use some conditional compilation to make that prim %write_barrier() disappear when THREADED_RTS is off. I suggest replacing it with write_barrier(), and then #define write_barrier() in includes/Cmm.h.
Cheers, Simon
This one is creating a blackhole (IISC) by means of StgCmmBind.hs:614.
How can we go on from here?
Cheers,
Gabor
Cheers, Simon
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e Author: Gabor Greif
Date: Fri Feb 22 18:47:00 2013 +0100 Do not emit barriers on PPC unless we go the threaded way
diff --git a/compiler/nativeGen/PPC/CodeGen.hs b/compiler/nativeGen/PPC/CodeGen.hs index 92eff36..6c33cca 100644 --- a/compiler/nativeGen/PPC/CodeGen.hs +++ b/compiler/nativeGen/PPC/CodeGen.hs @@ -1,9 +1,8 @@ -
----------------------------------------------------------------------------- -- -- Generating machine code (instruction selection) -- --- (c) The University of Glasgow 1996-2004 +-- (c) The University of Glasgow 1996-2013 --
-----------------------------------------------------------------------------
@@ -906,8 +905,10 @@ genCCall' -}
-genCCall' _ _ (PrimTarget MO_WriteBarrier) _ _ - = return $ unitOL LWSYNC +genCCall' dflags _ (PrimTarget MO_WriteBarrier) _ _ + = return (if WayThreaded `elem` ways dflags + then unitOL LWSYNC + else nilOL)
genCCall' _ _ (PrimTarget MO_Touch) _ _ = return $ nilOL
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 25/02/13 22:07, Gabor Greif wrote:
Thanks, this helped!
Something like this?
$ git show c3f5312 commit c3f53126fb0835cb21fc085e9b7cb5acb1ea1fba Author: Gabor Greif
Date: Mon Feb 25 23:03:23 2013 +0100 Only emit %write_barrier primitive for THREADED_RTS
diff --git a/includes/Cmm.h b/includes/Cmm.h index 1505b1c..bc5702c 100644 --- a/includes/Cmm.h +++ b/includes/Cmm.h @@ -583,6 +583,12 @@ #define OVERWRITING_CLOSURE(c) /* nothing */ #endif
+#ifdef THREADED_RTS +#define prim_write_barrier prim %write_barrier() +#else +#define prim_write_barrier /* nothing */ +#endif + /* ----------------------------------------------------------------------------- Ticky macros -------------------------------------------------------------------------- */ diff --git a/rts/Updates.h b/rts/Updates.h index b4ff7d1..e6a2eb9 100644 --- a/rts/Updates.h +++ b/rts/Updates.h @@ -46,7 +46,7 @@ \ OVERWRITING_CLOSURE(p1); \ StgInd_indirectee(p1) = p2; \ - prim %write_barrier(); \ + prim_write_barrier; \ SET_INFO(p1, stg_BLACKHOLE_info); \ LDV_RECORD_CREATE(p1); \ bd = Bdescr(p1); \
Okay to commit?
Yes, ok. Cheers, Simon
Cheers,
Gabor
On 2/25/13, Simon Marlow
wrote: On 25/02/13 14:53, Gabor Greif wrote:
On 2/25/13, Simon Marlow
wrote: On 24/02/13 20:40, Gabor Greif wrote:
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
Below patch suppresses the emission of 'lwsync' when no '-threaded' is specified on PPC only. So it does not cover both criteria above.
It helps me a lot since I have a uniprocessor target that does not understand the 'lwsync' instruction (instruction is newer than core).
Still, I have some doubts: o do we want to extend this approach to other archs? o possibly suppress the emission of MO_WriteBarrier in compiler/codeGen/StgCmmBind.hs (more care is needed to also cover compiler/cmm/CmmParse.y).
Anyway this should be a safe first step, and I'd like to push it. When we find a general solution be can back this commit out.
What do you think?
I don't think this is the right way to do it. The -threaded flag is not supposed to affect code generation, it only changes the RTS that gets linked in. That is, you can compile all your code without -threaded and then just add -threaded at the link step, and it will work.
Hi Simon,
thanks for your review!
Instead, use conditional compilation in the RTS so that the write_barrer() calls are only present when THREADED_RTS is on.
This is already done the way you suggest:
#define write_barrier() /* nothing */
in includes/stg/SMP.h:368
Works perfectly, 'lwsync' only appears in .thr_*.o files.
My problem stems from another call:
emitPrimCall [] MO_WriteBarrier []
in compiler/codeGen/StgCmmBind.hs:614
or possibly
callishMachOps = listToUFM $ map (\(x, y) -> (mkFastString x, y)) [ ( "write_barrier", MO_WriteBarrier ), ( "memcpy", MO_Memcpy ), ( "memset", MO_Memset ), ( "memmove", MO_Memmove ) -- ToDo: the rest, maybe ]
in compiler/cmm/CmmParse.y:920
I doubt these files should be compiled with a dependency on THREADED_RTS, and indeed this symbol is only ever defined (-optc-DTHREADED_RTS) for the C compiler, and thus not available when compiling haskell source.
So I am still stumped.
The function where "lwsync" surfaces is:
stg_marked_upd_frame_info which is rts/Updates.cmm:46.
These come from rts/Updates.h:updateWithIndirection(), which expands to some Cmm code that uses "prim %write_barrier()". So you want to use some conditional compilation to make that prim %write_barrier() disappear when THREADED_RTS is off. I suggest replacing it with write_barrier(), and then #define write_barrier() in includes/Cmm.h.
Cheers, Simon
This one is creating a blackhole (IISC) by means of StgCmmBind.hs:614.
How can we go on from here?
Cheers,
Gabor
Cheers, Simon
Cheers,
Gabor
$ git show c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e commit c0d682fb98f32e4ce5d27ff3a30f43b6cd70733e Author: Gabor Greif
Date: Fri Feb 22 18:47:00 2013 +0100 Do not emit barriers on PPC unless we go the threaded way
diff --git a/compiler/nativeGen/PPC/CodeGen.hs b/compiler/nativeGen/PPC/CodeGen.hs index 92eff36..6c33cca 100644 --- a/compiler/nativeGen/PPC/CodeGen.hs +++ b/compiler/nativeGen/PPC/CodeGen.hs @@ -1,9 +1,8 @@ -
----------------------------------------------------------------------------- -- -- Generating machine code (instruction selection) -- --- (c) The University of Glasgow 1996-2004 +-- (c) The University of Glasgow 1996-2013 --
-----------------------------------------------------------------------------
@@ -906,8 +905,10 @@ genCCall' -}
-genCCall' _ _ (PrimTarget MO_WriteBarrier) _ _ - = return $ unitOL LWSYNC +genCCall' dflags _ (PrimTarget MO_WriteBarrier) _ _ + = return (if WayThreaded `elem` ways dflags + then unitOL LWSYNC + else nilOL)
genCCall' _ _ (PrimTarget MO_Touch) _ _ = return $ nilOL
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On Sun, Feb 24, 2013 at 3:40 PM, Gabor Greif
Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
On Windows, the non-threaded RTS does I/O with forked threads; see rts/win32/IOManager.c and the async* functions in GHC.Conc.Windows. I think this means that for Windows, we need those write barriers even for non-threaded. -Joey

On 2/27/13, Joey Adams
On Sun, Feb 24, 2013 at 3:40 PM, Gabor Greif
wrote: Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
On Windows, the non-threaded RTS does I/O with forked threads; see rts/win32/IOManager.c and the async* functions in GHC.Conc.Windows. I think this means that for Windows, we need those write barriers even for non-threaded.
And when those async threads return control to the RTS don't they implicitly establish a write barrier? After all they are very syscall-like, or I am mistaken? Surely someone at Microsoft can find out :-) Cheers, Gabor
-Joey

On 27/02/13 16:17, Joey Adams wrote:
On Sun, Feb 24, 2013 at 3:40 PM, Gabor Greif
mailto:ggreif@gmail.com> wrote: Hi all,
from what I gathered so far no emission of write barriers is needed when - running on a uniprocessor (-threaded or not) - running on a multiprocessor sans having linked with -threaded.
On Windows, the non-threaded RTS does I/O with forked threads; see rts/win32/IOManager.c and the async* functions in GHC.Conc.Windows. I think this means that for Windows, we need those write barriers even for non-threaded.
Windows does use multiple OS threads to do non-blocking I/O in the non-threaded RTS. However, it doesn't run multiple Haskell threads in parallel (the threaded RTS is needed for that), so Gabor's patch which just affects the update code is fine. If we did actually rely on the C write_barrier() macro for the Windows I/O manager, then there could be a problem, but we don't. On the other hand, it might be prudent to enable write_barrier() on Windows just in case someone tries to use it in the non-threaded RTS in the future. Cheers, Simon
participants (3)
-
Gabor Greif
-
Joey Adams
-
Simon Marlow