
Simon Marlow wrote:

> At the moment performGC doesn't actually run any finalizers. It
> schedules a thread to run the finalizers, and you hope the thread runs
> soon. So if you're running performGC for the purposes of finalization,
> then almost certainly (performGC >> yield) is better. I've been
> wondering whether having a more synchronous kind of finalizer would be
> a good thing.
In my language Kogut I have two kinds of finalizers:

- Some objects are implemented in C with C finalizers attached, e.g. raw files. These finalizers must not access other objects and cannot call back into Kogut; they are used to free external resources or explicitly managed memory (e.g. bignums, or the payload of arrays, which is malloced). These finalizers are run during GC.

- You can also attach GHC-style weak references with finalizers to arbitrary objects. These finalizers are Kogut functions. They may resurrect the object being finalized, and as far as I know they cannot be used to crash the runtime. They are executed in a separate thread spawned by the garbage collector. They may use synchronization primitives to wait until global variables are in a consistent state. If a finalizer throws an exception, the exception is ignored and the finalizing thread proceeds with the remaining objects.

There is a small problem with their implementation: if a finalizer blocks forever, other finalizers from the same GC round will never be executed. This implies that if you forget about a buffered file used for writing, the file descriptor will be closed, but the buffers will not be flushed.

Opening a file tries to GC once if it gets an error about too many open files. This works because files use the first kind of finalizers.

Finalizers of the first kind are run at program end, because it was easy to do. It doesn't matter anyway for the things they are currently used for, as all of them would be freed by the OS anyway. Finalizers of the second kind are not run at program end, and I don't know whether there is a sane way to do that.

You may register actions to be run at program end. I think it's possible to use them to implement temporary files which are guaranteed to be deleted at some point: either when they are no longer referenced, or at program end. Implementing this would require a global set of weak references to these files, and a single exit action would delete them all.
It's not possible to unregister these actions, and I'm not convinced that this is needed in general.

* * *

BTW, I managed to get opening of named fifos working in the presence of green threads. Opening them in non-blocking mode doesn't work; the GHC documentation describes a workaround for opening them for reading, and it's even worse for writing (the system just returns ENXIO). So I open all files in blocking mode, and switch the descriptor to non-blocking mode after it has been obtained.

The timer signal is ITIMER_REAL (SIGALRM) rather than ITIMER_VIRTUAL (SIGVTALRM), which causes the timer to tick while waiting in open() too. This causes open() to fail with EINTR, in which case I process signals or reschedule as appropriate and go back to trying open(). This also allows waiting for child processes concurrently with waiting for I/O or a timeout.

The thread waiting for a process or trying to open a named fifo wastes its time slice, though. I don't think it's possible to implement this without that problem using the Unix API, without native threads. While it would be possible to avoid waking the thread up N times per second when it's the only thread which can do some work, I'm not sure it would be worth the effort; the CPU usage is minimal.

Another downside of ITIMER_REAL is that third-party libraries might not be prepared for EINTR from various system calls. For example, my Kogut<->Python bridge has to disable the timer when entering Python and re-enable it when returning to Kogut; otherwise Python's blocking I/O will just fail. It still fails when a signal arrives, even if the signal is then ignored (because ignoring a signal is done by installing a signal handler which effectively does nothing, rather than raw SIG_IGN).

* * *

Such libraries might also not be prepared for EAGAIN resulting from non-blocking mode, in case the same descriptor is used by several runtimes or across an exec, in particular stdin/stdout/stderr.
This requires explicit programmer intervention. I have no idea how to do it fully automatically, so I require it to be done almost fully manually, to avoid confusion about when it's being done by the runtime. A program tries to restore the original mode of the std handles when it finishes, instead of blindly setting them to blocking mode, so a fork & exec of a Kogut program from another doesn't disturb the blocking state. OTOH the programmer must set them to blocking mode manually before exec'ing a program which is not prepared for non-blocking mode.

Instead of resetting them to non-blocking mode on SIGCONT, as GHC does, my default handler for SIGTSTP looks like this:

   restore blocking mode on std handles
   set SIGTSTP signal handler to SIG_DFL
   raise(SIGTSTP)   // This causes the process to stop until SIGCONT
   set SIGTSTP signal handler back to my runtime's handler
   set non-blocking mode on std handles

This means that it works on ^Z, and whatever parent process notices the stop will not be confused by the non-blocking mode of the descriptors. This is about the only case, apart from program start & exit and opening a new file, when the blocking mode is changed by the runtime.

It does not work well, though, when somebody sends a SIGSTOP signal instead of hitting ^Z. Then the shell starts using the terminal, possibly sets blocking mode, and the program is continued without restoring the mode. But since it will not work anyway in other cases where a descriptor shared with other processes changes its mode, I'm leaving it this way.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/