Hi Simon,
That sounds like a good solution, and I'll attempt a patch. I think the fix touches only three lines. That is, replace these three macro definitions with EXTERN_INLINE C functions:
#define write_barrier() /* nothing */
#define store_load_barrier() /* nothing */
#define load_load_barrier() /* nothing */
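Roughly, the replacement might look like the sketch below. This is an untested guess at the shape, not the actual patch: the EXTERN_INLINE fallback is a stand-in I added so the snippet compiles outside the RTS tree (inside the RTS the macro is already defined by the headers), and the empty bodies simply mirror the current no-op macros.

```c
/* Stand-in for building this snippet outside the RTS headers (assumption):
 * inside the RTS, EXTERN_INLINE is already defined and this is skipped. */
#ifndef EXTERN_INLINE
#define EXTERN_INLINE static inline
#endif

/* Non-threaded case: the barriers still do nothing, but they are now real
 * functions with the same names and signatures as the threaded versions,
 * rather than macros that vanish during preprocessing. */
EXTERN_INLINE void write_barrier(void)      { /* nothing */ }
EXTERN_INLINE void store_load_barrier(void) { /* nothing */ }
EXTERN_INLINE void load_load_barrier(void)  { /* nothing */ }
```

The point of using functions is that callers see the same C-level interface in both builds; the compiler should still inline the empty bodies away.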
That would fix the disparity between the -threaded and unthreaded RTS. But I still don't see how to access these barriers properly from foreign primops in a library such that GHCi doesn't barf when trying to load the library...
-Ryan