[Git][ghc/ghc][master] Fix race condition between flushEventLog and start/endEventLogging.
Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC

Commits:

3d6492ce by Wen Kokke at 2026-03-26T03:57:53-04:00

Fix race condition between flushEventLog and start/endEventLogging.

This commit changes `flushEventLog` to acquire/release the `state_change` mutex to prevent interleaving with `startEventLogging` and `endEventLogging`. In the current RTS, `flushEventLog` _does not_ acquire this mutex, which may lead to eventlog corruption on the following interleaving:

- `startEventLogging` writes the new `EventLogWriter` to `event_log_writer`.
- `flushEventLog` flushes some events to `event_log_writer`.
- `startEventLogging` writes the eventlog header to `event_log_writer`.

This causes the eventlog to be written out in an unreadable state, with one or more events preceding the eventlog header.

This commit renames the old function to `flushEventLog_` and defines `flushEventLog` simply as:

```c
void flushEventLog(Capability **cap USED_IF_THREADS)
{
    ACQUIRE_LOCK(&state_change_mutex);
    flushEventLog_(cap);
    RELEASE_LOCK(&state_change_mutex);
}
```

The old function is still needed internally within the compilation unit, where it is used in `endEventLogging` in a context where the `state_change` mutex has already been acquired. I've chosen to mark `flushEventLog_` as static and let other uses of `flushEventLog` within the RTS refer to the new version. There is one use in `hs_init_ghc` via `flushTrace`, where the new locking behaviour should be harmless, and one use in `handle_tick`, which I believe was likely vulnerable to the same race condition, so the new locking behaviour is desirable.

I have not added a test. The behaviour is highly non-deterministic and requires a program that concurrently calls `flushEventLog` and `startEventLogging`/`endEventLogging`.
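The interleaving and the locking discipline adopted by the fix can be modelled outside the RTS. The following is a minimal sketch, not RTS code: `log_writer`, `flush_log_`, `flush_log`, `start_logging` and `end_logging` are hypothetical stand-ins for `EventLogWriter`, `flushEventLog_`, `flushEventLog`, `startEventLogging` and `endEventLogging`, and `count_bad_trials` is a hypothetical stress harness. It assumes, as the commit message implies, that `startEventLogging` installs the writer and writes the header while holding `state_change_mutex`. A header is modelled as one `'H'` byte and each flush as one `'E'` byte. Build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAP 4096

/* Hypothetical stand-in for an EventLogWriter: bytes are appended in order.
   'H' marks the header, 'E' marks one flushed batch of events. */
typedef struct {
    char data[LOG_CAP];
    size_t len;
} log_writer;

static pthread_mutex_t state_change_mutex = PTHREAD_MUTEX_INITIALIZER;
static log_writer *event_log_writer = NULL; /* NULL while logging is off */

static void write_byte(log_writer *w, char c)
{
    assert(w->len < LOG_CAP);
    w->data[w->len++] = c;
}

/* Unlocked helper (cf. flushEventLog_): the caller must already hold
   state_change_mutex. Appends one 'E' if a writer is installed. */
static void flush_log_(void)
{
    if (!event_log_writer) {
        return;
    }
    write_byte(event_log_writer, 'E');
}

/* Public entry point (cf. the new flushEventLog): taking the mutex means it
   cannot run between "install writer" and "write header". */
void flush_log(void)
{
    pthread_mutex_lock(&state_change_mutex);
    flush_log_();
    pthread_mutex_unlock(&state_change_mutex);
}

/* cf. startEventLogging: installs the writer and writes the header while
   holding the mutex, so no flush can slip in between the two steps. */
void start_logging(log_writer *w)
{
    pthread_mutex_lock(&state_change_mutex);
    event_log_writer = w;
    write_byte(w, 'H');
    pthread_mutex_unlock(&state_change_mutex);
}

/* cf. endEventLogging: it already holds the default (non-recursive) mutex,
   so it calls the unlocked helper; calling flush_log() here would relock a
   mutex this thread owns (deadlock or undefined behaviour). */
void end_logging(void)
{
    pthread_mutex_lock(&state_change_mutex);
    flush_log_();
    event_log_writer = NULL;
    pthread_mutex_unlock(&state_change_mutex);
}

static log_writer trial_writer;

static void *flusher(void *arg)
{
    (void)arg;
    for (int i = 0; i < 200; i++) {
        flush_log();
    }
    return NULL;
}

/* Races a flusher thread against start/end `trials` times and returns how
   many logs did not begin with the header (-1 if a thread cannot start). */
int count_bad_trials(int trials)
{
    int bad = 0;
    for (int i = 0; i < trials; i++) {
        memset(&trial_writer, 0, sizeof trial_writer);
        pthread_t t;
        if (pthread_create(&t, NULL, flusher, NULL) != 0) {
            return -1;
        }
        start_logging(&trial_writer);
        end_logging();
        pthread_join(t, NULL);
        if (trial_writer.len == 0 || trial_writer.data[0] != 'H') {
            bad++;
        }
    }
    return bad;
}
```

A sequential run of `start_logging`, two `flush_log` calls, `end_logging` and a further `flush_log` yields the log `"HEEE"`: the final flush inside `end_logging` still lands, and the flush after it is a no-op. Dropping the lock from `flush_log` reopens the window between installing the writer and writing the header, so some logs may then begin with `'E'`; how often that happens depends on the scheduler, which is why a bug like this is hard to pin down with a deterministic test.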
I encountered the issue while developing `eventlog-socket` and within that context have verified that my patch likely addresses the issue: a test that used to fail within the first dozen or so runs has now been running on repeat for several hours.

- - - - -

1 changed file:

- rts/eventlog/EventLog.c

Changes:

=====================================
rts/eventlog/EventLog.c
=====================================

```diff
@@ -161,6 +161,8 @@ static void freeEventLoggingBuffer(void);
 static void ensureRoomForEvent(EventsBuf *eb, EventTypeNum tag);
 static int ensureRoomForVariableEvent(EventsBuf *eb, StgWord size);

+static void flushEventLog_(Capability **cap USED_IF_THREADS);
+
 static inline void postWord8(EventsBuf *eb, StgWord8 i)
 {
     *(eb->pos++) = i;
@@ -491,7 +493,7 @@ endEventLogging(void)

     eventlog_enabled = false;

-    flushEventLog(NULL);
+    flushEventLog_(NULL);

     ACQUIRE_LOCK(&eventBufMutex);

@@ -1615,6 +1617,17 @@ void flushAllCapsEventsBufs(void)
 }

 void flushEventLog(Capability **cap USED_IF_THREADS)
+{
+    ACQUIRE_LOCK(&state_change_mutex);
+    flushEventLog_(cap);
+    RELEASE_LOCK(&state_change_mutex);
+}
+
+// This is an unsafe version of flushEventLog that does not acquire/release the
+// state_change mutex. It is for internal use only and should only be used when
+// (1) you're sure that there's no chance of racing with start/endEventLogging,
+// and (2) there is an event_log_writer.
+static void flushEventLog_(Capability **cap USED_IF_THREADS)
 {
     if (!event_log_writer) {
         return;
@@ -1644,7 +1657,7 @@ void flushEventLog(Capability **cap USED_IF_THREADS)
     flushEventLogWriter();
 }

-#else
+#else /*!TRACING*/

 enum EventLogStatus eventLogStatus(void)
 {
```

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/3d6492ce311611707e80b2594103ddbe...

-- 
You're receiving this email because of your account on gitlab.haskell.org.