
Cheng Shao
Hi devs,
To invoke a Haskell computation from C, we need to call one of the rts_eval* functions, which enters the scheduler loop and returns only when the specified Haskell thread has finished or been killed. We'd like to enhance the scheduler and add async variants of the rts_eval* functions: they take a C callback to consume the Haskell thread's result, kick off the scheduler loop, and allow the loop to exit when the Haskell thread blocks. The sync variants of the RTS API will continue to work with unchanged behavior.
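For example, an async variant of rts_evalIO might look roughly like this (just an illustration; the callback type, the function name, and the exact signature are all placeholders, not settled API):

    // Hypothetical async counterpart of rts_evalIO: rather than blocking
    // until the thread finishes, register a callback that consumes the
    // result, and let the scheduler loop exit once all threads block.
    typedef void (*RtsEvalCallback) (void *userdata, HaskellObj result);

    void rts_evalIO_async (Capability **cap, HaskellObj p,
                           RtsEvalCallback callback, void *userdata);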
The main intended use case is async foreign calls for the WebAssembly target. When an async foreign call is made, the Haskell thread blocks on an MVar that will be fulfilled with the call's result. But the scheduler will eventually fail to find work due to an empty run queue and exit with an error! We need a way to exit the scheduler gracefully, so that the RTS API caller can process the async foreign call, fulfill that MVar, and resume the Haskell computation later.
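The caller's side would then work roughly like this (pseudocode; every helper name here is made up):

    // Kick off the evaluation; this returns once every Haskell thread
    // is blocked (e.g. on the MVars backing async foreign calls).
    rts_evalIO_async(&cap, main_closure, on_done, NULL);
    while (!evaluation_finished) {
        // Perform the async foreign calls on the host side and fulfill
        // the MVars the Haskell threads are blocked on ...
        complete_pending_foreign_calls();
        // ... then re-enter the scheduler to resume those threads.
        rts_resumeScheduler(&cap);
    }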
Question I: does the idea of adding an async RTS API sound acceptable to GHC HQ? To be honest, it's not impossible to work around the lack of an async RTS API: reuse the awaitEvent() logic in the non-threaded RTS, pretending that each async foreign call reads from a file descriptor that can be handled by the POSIX select() call in awaitEvent(). But it would surely be nice to avoid such hacks and do things the principled way.
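Concretely, the hack would amount to something like this (sketch only; start_async_call is a made-up host-side helper):

    // Back each async foreign call with a pipe, so that to awaitEvent()
    // the blocked Haskell thread looks like ordinary fd I/O.
    int fds[2];
    pipe(fds);
    start_async_call(request, fds[1]);  // host writes a byte when done
    // The Haskell thread then blocks reading fds[0]; the select() call
    // in awaitEvent() wakes it up once the result is available.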
While the idea here sounds reasonable, I'm not sure I quite understand how this will be used in Asterius's case. Specifically, I would be worried about the lack of fairness in this scheme: no progress will be made on any foreign call until all Haskell evaluation has blocked. Is this really the semantics that you want?
Question II: how should the scheduler loop be modified to implement this feature? The straightforward answer seems to be: check some RTS API non-blocking flag and, if it is set, allow an early exit when the run queue is empty.
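In other words, something along these lines (sketch; rts_api_non_blocking is a hypothetical flag set by the async entry points):

    // Inside the scheduler loop, before blocking on an empty run queue:
    if (emptyRunQueue(cap) && task->rts_api_non_blocking) {
        // Hand control back to the RTS API caller instead of sleeping.
        return cap;
    }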
`schedule` is already a very large function with loops, gotos, mutability, and quite complex control flow. I would be reluctant to add to this complexity without first carrying out some simplification. Instead of adding yet another bail-out case to the loop, I would probably rather try to extract the loop body into a new function. That is, currently `schedule` is of the form:

    // Perform work until we are asked to shut down.
    Capability *schedule (Capability *initialCapability, Task *task)
    {
        Capability *cap = initialCapability;
        while (1) {
            scheduleYield(&cap, task);
            if (emptyRunQueue(cap)) {
                continue;
            }
            if (shutting_down) {
                return cap;
            }
            StgTSO *t = popRunQueue(cap);
            if (!can_run_on_capability(t, cap)) {
                // Push back on the run queue and loop around again to
                // yield the capability to the appropriate task
                pushOnRunQueue(cap, t);
                continue;
            }
            runMutator(t);
            if (needs_gc) {
                scheduleDoGC();
            }
        }
    }

I might rather extract this into something like:

    typedef enum {
        NoWork,         // There was no work to do
        PerformedWork,  // Ran precisely one thread
        Yield,          // The next thread scheduled to run cannot run on
                        // the given capability; yield.
        ShuttingDown,   // We were asked to shut down
    } ScheduleResult;

    // Schedule at most one thread once
    ScheduleResult scheduleOnce (Capability **cap, Task *task)
    {
        if (emptyRunQueue(*cap)) {
            return NoWork;
        }
        if (shutting_down) {
            return ShuttingDown;
        }
        StgTSO *t = popRunQueue(*cap);
        if (!can_run_on_capability(t, *cap)) {
            pushOnRunQueue(*cap, t);
            return Yield;
        }
        runMutator(t);
        if (needs_gc) {
            scheduleDoGC();
        }
        return PerformedWork;
    }

This is just a sketch, but I hope it's clear that with something like this you can easily implement the existing `schedule` function, as well as your asynchronous variant.
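For instance, the two drivers might look roughly like this on top of scheduleOnce (still only pseudocode; scheduleNonBlocking is merely a placeholder name for the async entry point):

    // The existing blocking driver, expressed in terms of scheduleOnce.
    Capability *schedule (Capability *initialCapability, Task *task)
    {
        Capability *cap = initialCapability;
        while (1) {
            switch (scheduleOnce(&cap, task)) {
            case ShuttingDown:
                return cap;
            case NoWork:
            case Yield:
                // Block until work arrives or the capability is handed
                // to the appropriate task, as schedule does today.
                scheduleYield(&cap, task);
                break;
            case PerformedWork:
                break;
            }
        }
    }

    // A possible async driver: when the run queue is empty, return to
    // the RTS API caller instead of blocking, so it can service pending
    // async foreign calls and re-enter the scheduler later.
    Capability *scheduleNonBlocking (Capability *initialCapability, Task *task)
    {
        Capability *cap = initialCapability;
        while (1) {
            switch (scheduleOnce(&cap, task)) {
            case ShuttingDown:
            case NoWork:
                return cap;
            case Yield:
                scheduleYield(&cap, task);
                break;
            case PerformedWork:
                break;
            }
        }
    }

Cheers,

- Ben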