It seems we could get some priority-based scheduling (and still be slackers) if we allowed marked green threads to be strictly associated with a specific OS thread (forkChildIO?).
I think you want the GHC-only GHC.Conc.forkOnIO.
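For anyone trying this out, here is a rough sketch of what pinning looks like. It assumes a newer base (4.4 or later), where forkOnIO is exposed as forkOn in Control.Concurrent. Note that it pins the green thread to a GHC capability rather than to a literal OS thread, so it only approximates the "strictly associated" idea above. The helper name runPinned is made up for illustration.

```haskell
import Control.Concurrent (forkOn, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical helper: fork a green thread pinned to the given
-- capability and report (capability it ran on, whether it is locked there).
runPinned :: Int -> IO (Int, Bool)
runPinned cap = do
  result <- newEmptyMVar
  _ <- forkOn cap $ do
    tid <- myThreadId
    -- threadCapability returns the capability number and a flag that is
    -- True when the thread was created with forkOn.
    threadCapability tid >>= putMVar result
  takeMVar result

main :: IO ()
main = do
  (cap, pinned) <- runPinned 0
  putStrLn ("capability " ++ show cap ++ ", pinned: " ++ show pinned)
```

The capability number passed to forkOn is interpreted modulo the number of capabilities, so to see more than one you would build with -threaded and run with +RTS -N. This gives affinity only; any priority scheme would still have to be layered on top.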
Suggestions like this are more motivation for the proposal [1] to adopt a re-engineered, Haskell-based RTS [2].