
Simon Marlow wrote:

> At the moment performGC doesn't actually run any finalizers. It
> schedules a thread to run the finalizers, and you hope the thread runs
> soon. So if you're running performGC for the purposes of finalization,
> then almost certainly (performGC >> yield) is better. I've been
> wondering whether having a more synchronous kind of finalizer would be
> a good thing.
In my language Kogut I have two kinds of finalizers:

- Some objects are implemented in C with C finalizers attached, e.g. raw files. These finalizers must not access other objects and cannot call back into Kogut; they are used to free external resources or explicitly managed memory (e.g. bignums, or the payload of arrays, which is malloced). These finalizers are run during GC.

- You can also attach GHC-style weak references with finalizers to arbitrary objects. These finalizers are Kogut functions. They may resurrect the object being finalized, and as far as I know they cannot be used to crash the runtime. They are executed in a separate thread spawned by the garbage collector. They may use synchronization primitives to wait until global variables are in a consistent state. If a finalizer throws an exception, the exception is ignored and the finalizing thread proceeds with the remaining objects.

There is a small problem with their implementation: if a finalizer blocks forever, other finalizers from the same GC round will never be executed. This implies that if you forget about a buffered file used for writing, the file descriptor will be closed, but the buffers will not be flushed.

Opening a file tries to GC once if it gets an error about too many open files. This works because files use the first kind of finalizers.

Finalizers of the first kind are run at program end, because it was easy to do. It doesn't matter anyway for the things they are currently used for, as all of them would be freed by the OS anyway. Finalizers of the second kind are not run at program end, and I don't know whether there is a sane way to do that.

You may register actions to be run at program end. I think it's possible to use them to implement temporary files which are guaranteed to be deleted at some point: either when they are no longer referenced, or at program end. Implementing this would require a global set of weak references to these files, and a single exit action would delete them all.
It's not possible to unregister these actions, and I'm not convinced that this is needed in general.

* * *

BTW, I managed to get opening of named fifos working in the presence of green threads. Opening them in non-blocking mode doesn't work; the GHC documentation describes a workaround for opening them for reading, and it's even worse for writing (the system just returns ENXIO). So I open all files in blocking mode, and switch the descriptor to non-blocking mode after it has been obtained.

The timer signal is ITIMER_REAL (SIGALRM) rather than ITIMER_VIRTUAL (SIGVTALRM), which causes the timer to tick while waiting in open() too. This causes open() to fail with EINTR, in which case I process signals or reschedule as appropriate and go back to trying open(). This also allows waiting for child processes concurrently with waiting for I/O or a timeout.

The thread waiting for a process or trying to open a named fifo wastes its time slice, though. I don't think it's possible to implement this without that problem using the Unix API, without native threads. While it would be possible to avoid waking the thread up N times per second when it's the only thread which can do some work, I'm not sure it would be worth the effort; the CPU usage is minimal.

Another downside of ITIMER_REAL is that third-party libraries might not be prepared for EINTR from various system calls. For example, my Kogut<->Python bridge has to disable the timer when entering Python and re-enable it when returning to Kogut; otherwise Python's blocking I/O will just fail. It still fails when a signal arrives, even if the signal is then ignored (because ignoring a signal is done by installing a signal handler which effectively does nothing, rather than raw SIG_IGN).

* * *

Such libraries might also not be prepared for EAGAIN resulting from non-blocking mode, in case the same descriptor is used by several runtimes or across an exec, in particular stdin/stdout/stderr.
This requires explicit programmer intervention. I have no idea how to do it fully automatically, so I require it to be done almost fully manually, to avoid confusion about when it's being done by the runtime. A program tries to restore the original mode of the std handles when it finishes, instead of blindly setting them to blocking mode, so a fork & exec of a Kogut program from another doesn't disturb the blocking state. OTOH the programmer must set them to blocking mode manually before exec'ing a program which is not prepared for non-blocking mode.

Instead of resetting them to non-blocking mode on SIGCONT, as GHC does, my default handler for SIGTSTP looks like this:

   restore blocking mode on std handles
   set SIGTSTP signal handler to SIG_DFL
   raise(SIGTSTP)   // This causes the process to stop until SIGCONT
   set SIGTSTP signal handler back to my runtime's handler
   set non-blocking mode on std handles

This means that it works on ^Z, and whatever parent process notices the stop will not be confused by the non-blocking mode of the descriptors. This is about the only case, apart from program start & exit and opening a new file, when the blocking mode is changed by the runtime.

It does not work well, though, when somebody sends a SIGSTOP signal instead of hitting ^Z. Then the shell starts using the terminal, possibly sets blocking mode, and the program is continued without restoring the mode. But since it will not work anyway in other cases where a descriptor shared with other processes changes its mode, I'm leaving it this way.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/