Yes, I'd absolutely rather not suffer C call overhead for these functions (or the CAS functions). But isn't that how it's done currently for the casMutVar# primop?
To avoid the overhead, is it necessary to make each primop in-line rather than out-of-line, or just to get rid of the "ccall"?
Another reason it would be good to package these with GHC is that I'm having trouble building robust libraries of foreign primops that work under all "ways" (e.g. GHCi). For example, this bug:
If I write .cmm code that depends on RTS functionality like stg_MUT_VAR_CLEAN_info, it seems to work fine in compiled mode (with or without threading and profiling), but I get link errors from GHCi, where these symbols aren't defined.
I've got a draft of the relevant primops here:
Which includes:
- variants of CAS for MutableArray# and MutableByteArray#
- fetch-and-add for MutableByteArray#
Also, there are some tweaks to support the new "ticketed" interface for safer CAS:
I started adding some of these primops to GHC proper (still as out-of-line), but not all of them. I had gone with the foreign primop route instead...
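To make the fetch-and-add item concrete, here is a minimal sketch of how such a primop could be wrapped on the Haskell side. It assumes a GHC that exports a primop named fetchAddIntArray# from GHC.Exts (the name later GHC releases use; the draft primops may differ), and the Counter type and its helper functions are hypothetical names made up for illustration only.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import GHC.Exts
  ( Int (I#), MutableByteArray#, RealWorld
  , fetchAddIntArray#, newByteArray#, readIntArray#, writeIntArray# )
import GHC.IO (IO (IO))

-- Hypothetical wrapper: one machine-word counter stored in a MutableByteArray#.
data Counter = Counter (MutableByteArray# RealWorld)

-- Allocate 8 bytes (enough for one Int on 32- and 64-bit targets) and
-- store the initial value at word index 0.
newCounter :: Int -> IO Counter
newCounter (I# n) = IO $ \s0 ->
  case newByteArray# 8# s0 of
    (# s1, arr #) -> case writeIntArray# arr 0# n s1 of
      s2 -> (# s2, Counter arr #)

-- Atomically add a delta to the counter. The value returned by the primop
-- is ignored here, so the sketch does not depend on it.
incrCounter :: Int -> Counter -> IO ()
incrCounter (I# d) (Counter arr) = IO $ \s0 ->
  case fetchAddIntArray# arr 0# d s0 of
    (# s1, _ #) -> (# s1, () #)

-- Plain (non-atomic) read; only used after all writers have finished.
readCounter :: Counter -> IO Int
readCounter (Counter arr) = IO $ \s0 ->
  case readIntArray# arr 0# s0 of
    (# s1, n #) -> (# s1, I# n #)

-- Four threads each add 1 a thousand times; the MVar acts as a join.
demo :: IO Int
demo = do
  c <- newCounter 0
  done <- newEmptyMVar
  forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ do
    replicateM_ 1000 (incrCounter 1 c)
    putMVar done ()
  replicateM_ 4 (takeMVar done)
  readCounter c
```

If the adds are really atomic, demo should come back with 4000 no matter how the threads interleave. With a plain read/modify/write in place of the primop, updates could be lost when built with -threaded and run on several cores.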
-Ryan
P.S. Where is the write barrier primop? I don't see it listed in prelude/primops.txt...