[Git][ghc/ghc][wip/dcoutts/issue-27105-stopTicker-2] 10 commits: hadrian: Remove old package.conf files when generating new ones
Duncan Coutts pushed to branch wip/dcoutts/issue-27105-stopTicker-2 at Glasgow Haskell Compiler / GHC Commits: 5ac9ce7d by Zubin Duggal at 2026-06-10T21:26:32+05:30 hadrian: Remove old package.conf files when generating new ones Old package.conf files might exists with different hashes, causing issues like #26661 Fixes #26661 - - - - - c9015f09 by sheaf at 2026-06-11T12:40:28-04:00 Fix AArch64 clobbering bug for MUL2 On AArch64, the code generator could clobber one of the input operands when computing the lower bits of a MUL2 operation. This rendered invalid the subsequent computation of the high bits. This commit fixes that by using a temporary register. The register allocator can remove the redundant move in the common case when the registers do not conflict. Fixes #27046 - - - - - 7ab90288 by Rodrigo Mesquita at 2026-06-11T12:41:11-04:00 fix: make T27131 less flaky It seems that T27131 fails flakily in a race where we check the flag before the capability had the chance to process the mailbox which sets the flag. This seemingly should only happen if the capability ends up being the same for setting and checking the flag. - - - - - de196632 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Promote HAVE_PREEMPTION from Timer.c to OSThreads.h We will want to know about HAVE_PREEMPTION in more places. HAVE_PREEMPTION tells us that we do have OS threads available, irrespective of whether THREADED is defined. In particular, HAVE_PREEMPTION is defined on all proper OSs, but not on WASM (and hyopthetically may not be true on some other platforms like micro-controllers, RTOSs, VM hypervisors etc). - - - - - 9cc10562 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Define ACQUIRE_LOCK_ALWAYS and friends Fix issue #27335 Like the atomic _ALWAYS variants, these lock actions are always defined, rather than being dependent on whether we are in the THREADED case. All the "normal" LOCK macros are defined to be no-ops when !THREADED. The use case for the _ALWAYS variants is where we are using OS threads even in the non-threaded RTS. This includes everything to do with the timer/ticker thread, which is used in the non-threaded RTS too. In particular, we will want to use this for eventlog things, because the timer thread performs eventlogging concurrently with the main capability, even in the non-threaded RTS. - - - - - 567ac8e7 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Use ACQUIRE/RELEASE_LOCK_ALWAYS with eventBufMutex Even in the non-threaded RTS the eventBufMutex is needed by both the main capability and the timer/ticker thread, so always use the mutex. This should fix #25165 which is about the main capability and the timer thread posting events to the eventlog buffer concurrently and thereby corrupting the buffer data. - - - - - f59329b3 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Expose eventBufMutex in the EventLog interface/header We will need it in forkProcess to ensure we don't write to the global eventlog buffer concurrently with trying to flush eventlog buffers and do the fork(). - - - - - 0f7eb782 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Split flushAllCapsEventsBufs into safe and unlocked version Following the convention that unlocked versions have a trailing _ underscore in their name. This one requires the caller to hold the eventlog global buffer mutex. We will need this in forkProcess. - - - - - c3b93527 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Remove redundant use of stopTimer in setNumCapabilities Historically, the comment here was: We must stop the interval timer while we are changing the capabilities array lest handle_tick may try to context switch an old capability. See #17289. and We must disable the timer while we do this since the tick handler may call contextSwitchAllCapabilities, which may see the capabilities array as we free it. What this refers to is that historically, when changing the number of capabilities, the array of capabilities was reallocated to a new size, allocating new ones and freeing the old ones, thus invalidating all existing capbility pointers. Strangely, for good measure the code used to call stopTimer twice (hence the two similar comments above). However, since commit a3eccf06292dd666b24606251a52da2b466a9612, the capabilities array is no longer reallocated. Instead the array is allcoated once on RTS startup to the maximum size it could ever be allowed to be, and then capabilities get enabled/disabled at runtime. So the capability pointers never become invalid anymore. At worst, they may point to capabilities that are disabled. Thus we no longer need to stop the timer (twice) while we change the number of enabled capabilities. This also partially solves issue #27105, which notes that stopTimer is being used as if it were synchronous, when it is not. At least for this case, the solution is that stopTimer is not needed at all! - - - - - 86cfb7b6 by Duncan Coutts at 2026-06-11T22:55:19+01:00 Remove redundant use of stopTimer in forkProcess but replace it with taking the eventlog buffer lock during the fork. Fixes issue #27105 The original reason to block the timer during a fork was that historically the timer was implemented using a periodic timer signal, and the signal itself would interrupt the fork system call (returning EINTR). For large processes (where fork() takes a while) this could permanently livelock: the timer always would go off before the fork could complete, which got retried in a loop forever. The timer is no longer implemented as a unix signal, but uses threads. Thus the original problem no longer exists. The only remaining reason to block the timer tick is to prevent actions taken by the tick from interfering with the delicate process involved in fork (taking a load of locks and pausing everything). The only thing we need to do is to prevent the eventlog from being written to or flushed while the fork is taking place. To achieve this all we need to do is hold the mutex for the global eventlog buffer. This removes the last use of stopTimer that expects stopTimer to work synchronously (which it was not) and thus solves issue #27105. To be clear, we solve issue #27105 not by making stopTimer synchronous, but by eliminating the use sites that expected it to be synchronous. - - - - - 15 changed files: - + changelog.d/T27046 - + changelog.d/hadrian-stale-package-confs-26661 - compiler/GHC/CmmToAsm/AArch64/CodeGen.hs - hadrian/src/Hadrian/Haskell/Cabal/Parse.hs - rts/Capability.c - rts/Schedule.c - rts/Timer.c - rts/eventlog/EventLog.c - rts/eventlog/EventLog.h - rts/include/rts/OSThreads.h - + testsuite/tests/codeGen/should_run/T27046.hs - + testsuite/tests/codeGen/should_run/T27046_cmm.cmm - testsuite/tests/codeGen/should_run/all.T - testsuite/tests/rts/T27131.hs - testsuite/tests/rts/T27131.stdout Changes: ===================================== changelog.d/T27046 ===================================== @@ -0,0 +1,9 @@ +section: compiler +issues: #27046 +mrs: !16031 +synopsis: + Avoid AArch64 register clobbering bug in MUL2 +description: + Fixes an issue in which, on AArch64, code generation for the MUL2 operation + could clobber one of the input operands when computing the lower bits, which + rendered invalid the subsequent computation of the high bits. ===================================== changelog.d/hadrian-stale-package-confs-26661 ===================================== @@ -0,0 +1,6 @@ +section: packaging +synopsis: Hadrian no longer leaves stale `.conf` files in its package databases + when rebuilding in the same build root with different settings (e.g. another + flavour, or when hashes change with +hash-unit-ids). +issues: #26661 +mrs: !15186 ===================================== compiler/GHC/CmmToAsm/AArch64/CodeGen.hs ===================================== @@ -2300,11 +2300,19 @@ genCCall target dest_regs arg_regs = do let lo = getRegisterReg platform (CmmLocal dst_lo) hi = getRegisterReg platform (CmmLocal dst_hi) nd = getRegisterReg platform (CmmLocal dst_needed) + + -- Generate a fresh virtual register for the low word computation. + -- This avoids clobbering reg_a or reg_b in the first MUL instruction, + -- which could for example happen if 'lo' and 'reg_a' are the same + -- virtual register. + tmp_lo <- getNewRegNat II64 + return $ code_x `appOL` code_y `snocOL` - MUL II64 (OpReg W64 lo) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL` + MUL II64 (OpReg W64 tmp_lo) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL` SMULH (OpReg W64 hi) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL` + MOV (OpReg W64 lo) (OpReg W64 tmp_lo) `snocOL` -- Are all high bits equal to the sign bit of the low word? -- nd = (hi == ASR(lo,width-1)) ? 1 : 0 CMP (OpReg W64 hi) (OpRegShift W64 lo SASR (widthInBits w - 1)) `snocOL` ===================================== hadrian/src/Hadrian/Haskell/Cabal/Parse.hs ===================================== @@ -15,6 +15,7 @@ module Hadrian.Haskell.Cabal.Parse ( ) where import Data.Bifunctor +import Data.Char (isDigit) import Data.List.Extra import Development.Shake import qualified Distribution.Compat.Graph as Graph @@ -55,6 +56,8 @@ import Builder import Context import Settings import Distribution.Simple.LocalBuildInfo +import Distribution.Types.LocalBuildInfo (allTargetsInBuildOrder') +import Distribution.Types.TargetInfo (TargetInfo (..)) import qualified Distribution.Simple.Register as C import System.Directory (getCurrentDirectory) import qualified Distribution.InstalledPackageInfo as CP @@ -394,35 +397,48 @@ registerPackage rs context = do -- Note: the @cPath@ is ignored. The path that's used is the 'buildDir' path -- from the local build info @lbi@. lbi <- liftIO $ C.getPersistBuildConfig Nothing (C.makeSymbolicPath cPath) - liftIO $ register db_path pid pd lbi + -- This runs `ghc --abi-hash`, so do it outside the critical section below. + installedPkgInfo <- liftIO $ generateRegistrationInfo pd lbi + + let pkg_name = pkgName (package context) + -- Is this a pkg.conf for a previous build? + -- we want to match "ghc-9.15.1-abcd.conf" but not "ghc-boot-9.15.1.conf" + isPkgConf f = case stripPrefix (pkg_name ++ "-") (takeBaseName f) of + Just (c:_) -> isDigit c + _ -> takeBaseName f == pkg_name + + -- Unlike `ghc-pkg update/register` (used to populate the inplace and stage0 + -- databases), writing the .conf file directly doesn't remove units this + -- package was previously registered under. Stale .conf files from earlier + -- builds make this package's modules ambiguous (#26661), so delete them + -- before writing the new .conf file. + withResources rs $ do + confs <- liftIO $ getDirectoryFilesIO db_path ["*.conf"] + mapM_ (removeFile . (db_path >)) + [ f | f <- confs, isPkgConf f, takeBaseName f /= pid ] + liftIO $ writeUTF8File (db_path > pid <.> "conf") + (CP.showInstalledPackageInfo installedPkgInfo) -- Then after the register, which just writes the .conf file, do the recache step. buildWithResources rs $ target context (GhcPkg Recache (stage context)) [] [] -- This is copied and simplified from Cabal, because we want to install the package -- into a different package database to the one it was configured against. -register :: FilePath - -> String -- ^ Package Identifier - -> C.PackageDescription - -> LocalBuildInfo - -> IO () -register pkg_db pid pd lbi - = withLibLBI pd lbi $ \lib clbi -> do - - when reloc $ error "register does not support reloc" - installedPkgInfo <- generateRegistrationInfo pd lbi lib clbi - writeRegistrationFile installedPkgInfo - - where - regFile = pkg_db > pid <.> "conf" - reloc = relocatable lbi - - generateRegistrationInfo pkg lbi lib clbi = do - abi_hash <- C.mkAbiHash <$> GHC.libAbiHash C.silent pkg lbi lib clbi - return (C.absoluteInstalledPackageInfo pkg abi_hash lib lbi clbi) - - writeRegistrationFile installedPkgInfo = do - writeUTF8File regFile (CP.showInstalledPackageInfo installedPkgInfo) +-- See generateRegistrationInfo in Distribution.Simple.Register. we can't use it +-- directly because it computes the abi-hash using Cabal's internal package +-- database, which hadrian never creates. +generateRegistrationInfo :: C.PackageDescription + -> LocalBuildInfo + -> IO Installed.InstalledPackageInfo +generateRegistrationInfo pd lbi = do + when (relocatable lbi) $ error "register does not support reloc" + case [ (lib, targetCLBI tgt) | tgt <- allTargetsInBuildOrder' pd lbi + , CLib lib <- [targetComponent tgt] ] of + [(lib, clbi)] -> do + abi_hash <- C.mkAbiHash <$> GHC.libAbiHash C.silent pd lbi lib clbi + return (C.absoluteInstalledPackageInfo pd abi_hash lib lbi clbi) + libs -> error $ "generateRegistrationInfo: expected exactly one library for " + ++ C.display (C.package pd) ++ ", got " ++ show (length libs) -- | Build autogenerated files @autogen/cabal_macros.h@ and @autogen/Paths_*.hs@. ===================================== rts/Capability.c ===================================== @@ -443,13 +443,6 @@ void moreCapabilities (uint32_t from USED_IF_THREADS, uint32_t to USED_IF_THREADS) { #if defined(THREADED_RTS) - // We must disable the timer while we do this since the tick handler may - // call contextSwitchAllCapabilities, which may see the capabilities array - // as we free it. The alternative would be to protect the capabilities - // array with a lock but this seems more expensive than necessary. - // See #17289. - stopTimer(); - if (to == 1) { // THREADED_RTS must work on builds that don't have a mutable // BaseReg (eg. unregisterised), so in this case @@ -470,8 +463,6 @@ moreCapabilities (uint32_t from USED_IF_THREADS, uint32_t to USED_IF_THREADS) } debugTrace(DEBUG_sched, "allocated %d more capabilities", to - from); - - startTimer(); #endif } ===================================== rts/Schedule.c ===================================== @@ -37,6 +37,7 @@ #include "win32/AsyncWinIO.h" #endif #include "Trace.h" +#include "eventlog/EventLog.h" #include "RaiseAsync.h" #include "Threads.h" #include "Timer.h" @@ -2100,24 +2101,31 @@ forkProcess(HsStablePtr *entry ACQUIRE_LOCK(&all_tasks_mutex); #endif - stopTimer(); // See #4074 - #if defined(TRACING) - flushAllCapsEventsBufs(); // so that child won't inherit dirty file buffers +#if defined(HAVE_PREEMPTION) + // We must hold the eventlog global mutex over the fork to prevent the + // timer thread from trying to post events. While holding the mutex we need + // to flush the eventlogs (global and per-cap) so that child won't inherit + // dirty eventlog buffers or file buffers. + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); +#endif + flushAllCapsEventsBufs_(); #endif pid = fork(); if (pid) { // parent - startTimer(); // #4074 - RELEASE_LOCK(&sched_mutex); RELEASE_LOCK(&sm_mutex); RELEASE_LOCK(&stable_ptr_mutex); RELEASE_LOCK(&stable_name_mutex); RELEASE_LOCK(&task->lock); +#if defined(TRACING) && defined(HAVE_PREEMPTION) + RELEASE_LOCK_ALWAYS(&eventBufMutex); +#endif + #if defined(THREADED_RTS) /* N.B. releaseCapability_ below may need to take all_tasks_mutex */ RELEASE_LOCK(&all_tasks_mutex); @@ -2303,12 +2311,6 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS) cap = rts_lock(); task = cap->running_task; - - // N.B. We must stop the interval timer while we are changing the - // capabilities array lest handle_tick may try to context switch - // an old capability. See #17289. - stopTimer(); - stopAllCapabilities(&cap, task); if (new_n_capabilities < enabled_capabilities) @@ -2364,9 +2366,7 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS) tracingAddCapabilities(n_capabilities, new_n_capabilities); #endif - // Resize the capabilities array - // NB. after this, capabilities points somewhere new. Any pointers - // of type (Capability *) are now invalid. + // Allocate and initialise the extra capabilities moreCapabilities(n_capabilities, new_n_capabilities); // Resize and update storage manager data structures @@ -2394,8 +2394,6 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS) // Notify IO manager that the number of capabilities has changed. notifyIOManagerCapabilitiesChanged(&cap); - startTimer(); - rts_unlock(cap); #endif // THREADED_RTS ===================================== rts/Timer.c ===================================== @@ -28,11 +28,6 @@ #include "RtsSignals.h" #include "rts/EventLogWriter.h" -// See Note [No timer on wasm32] -#if !defined(wasm32_HOST_ARCH) -#define HAVE_PREEMPTION -#endif - // This global counter is used to allow multiple threads to stop the // timer temporarily with a stopTimer()/startTimer() pair. If // timer_enabled == 0 timer is enabled ===================================== rts/eventlog/EventLog.c ===================================== @@ -129,8 +129,11 @@ typedef struct _EventsBuf { static EventsBuf *capEventBuf; // one EventsBuf for each Capability static EventsBuf eventBuf; // an EventsBuf not associated with any Capability -#if defined(THREADED_RTS) -static Mutex eventBufMutex; // protected by this mutex +#if defined(HAVE_PREEMPTION) +// Note that this mutex is used even in the non-threaded RTS, since the timer +// thread posts events and flushes. So _all_ uses of this mutex must use +// ACQUIRE_LOCK_ALWAYS/RELEASE_LOCK_ALWAYS. +Mutex eventBufMutex; // protects eventBuf above #endif // Event type @@ -393,8 +396,10 @@ initEventLogging(void) moreCapEventBufs(0, get_n_capabilities()); initEventsBuf(&eventBuf, EVENT_LOG_SIZE, (EventCapNo)(-1)); -#if defined(THREADED_RTS) +#if defined(HAVE_PREEMPTION) initMutex(&eventBufMutex); +#endif +#if defined(THREADED_RTS) initMutex(&state_change_mutex); #endif } @@ -416,7 +421,7 @@ startEventLogging_(void) { initEventLogWriter(); - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); postHeaderEvents(); /* @@ -425,7 +430,7 @@ startEventLogging_(void) */ printAndClearEventBuf(&eventBuf); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); return true; } @@ -495,7 +500,7 @@ endEventLogging(void) flushEventLog_(NULL); - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); // Mark end of events (data). postEventTypeNum(&eventBuf, EVENT_DATA_END); @@ -503,7 +508,7 @@ endEventLogging(void) // Flush the end of data marker. printAndClearEventBuf(&eventBuf); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); stopEventLogWriter(); event_log_writer = NULL; @@ -666,7 +671,7 @@ void postCapEvent (EventTypeNum tag, EventCapNo capno) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, tag); postEventHeader(&eventBuf, tag); @@ -685,14 +690,14 @@ postCapEvent (EventTypeNum tag, barf("postCapEvent: unknown event tag %d", tag); } - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postCapsetEvent (EventTypeNum tag, EventCapsetID capset, StgWord info) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, tag); postEventHeader(&eventBuf, tag); @@ -726,7 +731,7 @@ void postCapsetEvent (EventTypeNum tag, barf("postCapsetEvent: unknown event tag %d", tag); } - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postCapsetStrEvent (EventTypeNum tag, @@ -740,14 +745,14 @@ void postCapsetStrEvent (EventTypeNum tag, return; } - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); if (!hasRoomForVariableEvent(&eventBuf, size)){ printAndClearEventBuf(&eventBuf); if (!hasRoomForVariableEvent(&eventBuf, size)){ errorBelch("Event size exceeds buffer size, bail out"); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); return; } } @@ -758,7 +763,7 @@ void postCapsetStrEvent (EventTypeNum tag, postBuf(&eventBuf, (StgWord8*) msg, strsize); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postCapsetVecEvent (EventTypeNum tag, @@ -783,14 +788,14 @@ void postCapsetVecEvent (EventTypeNum tag, } } - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); if (!hasRoomForVariableEvent(&eventBuf, size)){ printAndClearEventBuf(&eventBuf); if(!hasRoomForVariableEvent(&eventBuf, size)){ errorBelch("Event size exceeds buffer size, bail out"); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); return; } } @@ -804,7 +809,7 @@ void postCapsetVecEvent (EventTypeNum tag, postBuf(&eventBuf, (StgWord8*) argv[i], 1 + strlen(argv[i])); } - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postWallClockTime (EventCapsetID capset) @@ -813,7 +818,7 @@ void postWallClockTime (EventCapsetID capset) StgWord64 sec; StgWord32 nsec; - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); /* The EVENT_WALL_CLOCK_TIME event is intended to allow programs reading the eventlog to match up the event timestamps with wall @@ -846,7 +851,7 @@ void postWallClockTime (EventCapsetID capset) postWord64(&eventBuf, sec); postWord32(&eventBuf, nsec); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } /* @@ -885,7 +890,7 @@ void postEventHeapInfo (EventCapsetID heap_capset, W_ mblockSize, W_ blockSize) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_HEAP_INFO_GHC); postEventHeader(&eventBuf, EVENT_HEAP_INFO_GHC); @@ -899,7 +904,7 @@ void postEventHeapInfo (EventCapsetID heap_capset, postWord64(&eventBuf, mblockSize); postWord64(&eventBuf, blockSize); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postEventGcStats (Capability *cap, @@ -952,7 +957,7 @@ void postTaskCreateEvent (EventTaskId taskId, EventCapNo capno, EventKernelThreadId tid) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_TASK_CREATE); postEventHeader(&eventBuf, EVENT_TASK_CREATE); @@ -961,14 +966,14 @@ void postTaskCreateEvent (EventTaskId taskId, postCapNo(&eventBuf, capno); postKernelThreadId(&eventBuf, tid); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postTaskMigrateEvent (EventTaskId taskId, EventCapNo capno, EventCapNo new_capno) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_TASK_MIGRATE); postEventHeader(&eventBuf, EVENT_TASK_MIGRATE); @@ -977,28 +982,28 @@ void postTaskMigrateEvent (EventTaskId taskId, postCapNo(&eventBuf, capno); postCapNo(&eventBuf, new_capno); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postTaskDeleteEvent (EventTaskId taskId) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_TASK_DELETE); postEventHeader(&eventBuf, EVENT_TASK_DELETE); /* EVENT_TASK_DELETE (taskID) */ postTaskId(&eventBuf, taskId); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postEventNoCap (EventTypeNum tag) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, tag); postEventHeader(&eventBuf, tag); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void @@ -1042,9 +1047,9 @@ void postLogMsg(EventsBuf *eb, EventTypeNum type, char *msg, va_list ap) void postMsg(char *msg, va_list ap) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); postLogMsg(&eventBuf, EVENT_LOG_MSG, msg, ap); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postCapMsg(Capability *cap, char *msg, va_list ap) @@ -1138,32 +1143,32 @@ void postConcUpdRemSetFlush(Capability *cap) void postConcMarkEnd(StgWord32 marked_obj_count) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_CONC_MARK_END); postEventHeader(&eventBuf, EVENT_CONC_MARK_END); postWord32(&eventBuf, marked_obj_count); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postNonmovingHeapCensus(uint16_t blk_size, const struct NonmovingAllocCensus *census) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); postEventHeader(&eventBuf, EVENT_NONMOVING_HEAP_CENSUS); postWord16(&eventBuf, blk_size); postWord32(&eventBuf, census->n_active_segs); postWord32(&eventBuf, census->n_filled_segs); postWord32(&eventBuf, census->n_live_blocks); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); postEventHeader(&eventBuf, EVENT_NONMOVING_PRUNED_SEGMENTS); postWord32(&eventBuf, pruned_segments); postWord32(&eventBuf, free_segments); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void closeBlockMarker (EventsBuf *ebuf) @@ -1224,7 +1229,7 @@ static HeapProfBreakdown getHeapProfBreakdown(void) void postHeapProfBegin(void) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); PROFILING_FLAGS *flags = &RtsFlags.ProfFlags; StgWord modSelector_len = flags->modSelector ? strlen(flags->modSelector) : 0; @@ -1258,42 +1263,42 @@ void postHeapProfBegin(void) postStringLen(&eventBuf, flags->ccsSelector, ccsSelector_len); postStringLen(&eventBuf, flags->retainerSelector, retainerSelector_len); postStringLen(&eventBuf, flags->bioSelector, bioSelector_len); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postHeapProfSampleBegin(StgInt era) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_HEAP_PROF_SAMPLE_BEGIN); postEventHeader(&eventBuf, EVENT_HEAP_PROF_SAMPLE_BEGIN); postWord64(&eventBuf, era); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postHeapBioProfSampleBegin(StgInt era, StgWord64 time) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_HEAP_BIO_PROF_SAMPLE_BEGIN); postEventHeader(&eventBuf, EVENT_HEAP_BIO_PROF_SAMPLE_BEGIN); postWord64(&eventBuf, era); postWord64(&eventBuf, time); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postHeapProfSampleEnd(StgInt era) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_HEAP_PROF_SAMPLE_END); postEventHeader(&eventBuf, EVENT_HEAP_PROF_SAMPLE_END); postWord64(&eventBuf, era); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postHeapProfSampleString(const char *label, StgWord64 residency) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); StgWord label_len = strlen(label); StgWord len = 1+8+label_len+1; CHECK(!ensureRoomForVariableEvent(&eventBuf, len)); @@ -1303,7 +1308,7 @@ void postHeapProfSampleString(const char *label, postWord8(&eventBuf, 0); postWord64(&eventBuf, residency); postStringLen(&eventBuf, label, label_len); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } #if defined(PROFILING) @@ -1313,7 +1318,7 @@ void postHeapProfCostCentre(StgWord32 ccID, const char *srcloc, StgBool is_caf) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); StgWord label_len = strlen(label); StgWord module_len = strlen(module); StgWord srcloc_len = strlen(srcloc); @@ -1326,13 +1331,13 @@ void postHeapProfCostCentre(StgWord32 ccID, postStringLen(&eventBuf, module, module_len); postStringLen(&eventBuf, srcloc, srcloc_len); postWord8(&eventBuf, is_caf); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void postHeapProfSampleCostCentre(CostCentreStack *stack, StgWord64 residency) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); StgWord depth = 0; CostCentreStack *ccs; for (ccs = stack; ccs != NULL && ccs != CCS_MAIN; ccs = ccs->prevStack) @@ -1351,7 +1356,7 @@ void postHeapProfSampleCostCentre(CostCentreStack *stack, depth>0 && ccs != NULL && ccs != CCS_MAIN; ccs = ccs->prevStack, depth--) postWord32(&eventBuf, ccs->cc->ccID); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } @@ -1359,7 +1364,7 @@ void postProfSampleCostCentre(Capability *cap, CostCentreStack *stack, StgWord64 tick) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); StgWord depth = 0; CostCentreStack *ccs; for (ccs = stack; ccs != NULL && ccs != CCS_MAIN; ccs = ccs->prevStack) @@ -1377,7 +1382,7 @@ void postProfSampleCostCentre(Capability *cap, depth>0 && ccs != NULL && ccs != CCS_MAIN; ccs = ccs->prevStack, depth--) postWord32(&eventBuf, ccs->cc->ccID); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } // This event is output at the start of profiling so the tick interval can @@ -1385,11 +1390,11 @@ void postProfSampleCostCentre(Capability *cap, // can be calculated from how many samples there are. void postProfBegin(void) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); postEventHeader(&eventBuf, EVENT_PROF_BEGIN); // The interval that each tick was sampled, in nanoseconds postWord64(&eventBuf, TimeToNS(RtsFlags.MiscFlags.tickInterval)); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } #endif /* PROFILING */ @@ -1415,11 +1420,11 @@ static void postTickyCounterDef(EventsBuf *eb, StgEntCounter *p) void postTickyCounterDefs(StgEntCounter *counters) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); for (StgEntCounter *p = counters; p != NULL; p = p->link) { postTickyCounterDef(&eventBuf, p); } - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } static void postTickyCounterSample(EventsBuf *eb, StgEntCounter *p) @@ -1443,13 +1448,13 @@ static void postTickyCounterSample(EventsBuf *eb, StgEntCounter *p) void postTickyCounterSamples(StgEntCounter *counters) { - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); ensureRoomForEvent(&eventBuf, EVENT_TICKY_COUNTER_SAMPLE); postEventHeader(&eventBuf, EVENT_TICKY_COUNTER_BEGIN_SAMPLE); for (StgEntCounter *p = counters; p != NULL; p = p->link) { postTickyCounterSample(&eventBuf, p); } - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } #endif /* TICKY_TICKY */ void postIPE(const InfoProvEnt *ipe) @@ -1459,7 +1464,7 @@ void postIPE(const InfoProvEnt *ipe) // See Note [Maximum event length]. const StgWord MAX_IPE_STRING_LEN = 65535; - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); StgWord table_name_len = MIN(strlen(ipe->prov.table_name), MAX_IPE_STRING_LEN); StgWord closure_desc_len = MIN(strlen(closure_desc_buf), MAX_IPE_STRING_LEN); StgWord ty_desc_len = MIN(strlen(ipe->prov.ty_desc), MAX_IPE_STRING_LEN); @@ -1489,7 +1494,7 @@ void postIPE(const InfoProvEnt *ipe) postBuf(&eventBuf, &colon, 1); postStringLen(&eventBuf, ipe->prov.src_span, src_span_len); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); } void printAndClearEventBuf (EventsBuf *ebuf) @@ -1601,14 +1606,21 @@ void flushLocalEventsBuf(Capability *cap) // Flush all capabilities' event buffers when we already hold all capabilities. // Used during forkProcess. void flushAllCapsEventsBufs(void) +{ + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); + flushAllCapsEventsBufs_(); + RELEASE_LOCK_ALWAYS(&eventBufMutex); +} + +// Unsafe version that does not acquire/release eventBufMutex. You must +// hold the eventBufMutex, which you must acquire with ACQUIRE_LOCK_ALWAYS! +void flushAllCapsEventsBufs_(void) { if (!event_log_writer) { return; } - ACQUIRE_LOCK(&eventBufMutex); printAndClearEventBuf(&eventBuf); - RELEASE_LOCK(&eventBufMutex); for (unsigned int i=0; i < getNumCapabilities(); i++) { flushLocalEventsBuf(getCapability(i)); @@ -1641,9 +1653,9 @@ static void flushEventLog_(Capability **cap USED_IF_THREADS) return; } - ACQUIRE_LOCK(&eventBufMutex); + ACQUIRE_LOCK_ALWAYS(&eventBufMutex); printAndClearEventBuf(&eventBuf); - RELEASE_LOCK(&eventBufMutex); + RELEASE_LOCK_ALWAYS(&eventBufMutex); #if defined(THREADED_RTS) Task *task = newBoundTask(); ===================================== rts/eventlog/EventLog.h ===================================== @@ -18,6 +18,13 @@ #if defined(TRACING) extern bool eventlog_enabled; +#if defined(HAVE_PREEMPTION) +// Avoid using this mutex directly if at all possible. It is needed in the +// implementation of forkProcess. +// +// All uses of this mutex must use ACQUIRE_LOCK_ALWAYS/RELEASE_LOCK_ALWAYS. +extern Mutex eventBufMutex; +#endif void initEventLogging(void); void restartEventLogging(void); @@ -27,6 +34,7 @@ void abortEventLogging(void); // #4512 - after fork child needs to abort void moreCapEventBufs (uint32_t from, uint32_t to); void flushLocalEventsBuf(Capability *cap); void flushAllCapsEventsBufs(void); +void flushAllCapsEventsBufs_(void); void flushAllEventsBufs(Capability *cap); typedef void (*EventlogInitPost)(void); ===================================== rts/include/rts/OSThreads.h ===================================== @@ -14,6 +14,46 @@ #pragma once +/* Note [Threads and preemption] + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + All full-fat OSs that GHC works on have OS threads, and we use them even in + the non-threaded RTS for a few features: + * Haskell thread preemption; + * sample-based profiling; + * idle GC; + * periodic eventlog flushing. + + We use defined(HAVE_PREEMPTION) to decide if these features are implemented + via OS threads. + + On platforms like WASM/js we do not have OS threads in any conventional + sense, and the features above are either not available or are implemented + differently. See Note [No timer on wasm32]. + + In future if GHC is ported to platforms like bare-metal micro-controllers, + RTOSs or to run directly under hypervisors then such platforms may also not + have threads available and they should not define HAVE_PREEMPTION here. Or + for some micro-controller RTOSs like Zeypher one may have a choice about + whether to use threads or not (at a size cost). Here would be the right + place to control whether the feature list above is supported. + */ +#if defined(wasm32_HOST_ARCH) + // See Note [No timer on wasm32] + // To confuse matters, WASM _does_ have pthread.h but it doesnt work. +#elif defined(HAVE_PTHREAD_H) || defined(HAVE_WINDOWS_H) +#define HAVE_PREEMPTION +#else +#error Decide if this platform has threads and pre-emption or not. +#endif +// And JS does all of this differently, without using this bit of the RTS. + +// Configuration sanity check +#if defined(THREADED_RTS) && !defined(HAVE_PREEMPTION) +//TODO we would like to be able to assert this: +// #error Configuration error: THREADED_RTS should imply HAVE_PREEMPTION +// however at the moment we cannot due to issue #27346. +#endif + #if defined(HAVE_PTHREAD_H) && !defined(mingw32_HOST_OS) #if defined(CMINUSMINUS) @@ -210,9 +250,29 @@ extern bool timedWaitCondition ( Condition* pCond, Mutex* pMut, Time timeout) // // Mutexes // +// Even in the non-threaded RTS we use threads and mutexes! In particular the +// timer/ticker is implemented using a thread. And using threads needs locks. +// In particular we need locks for the data shared between the timer/ticker +// thread and the thread running the main capability. +#if defined(HAVE_PREEMPTION) extern void initMutex ( Mutex* pMut ); extern void closeMutex ( Mutex* pMut ); +// The "always" variants do locking in the threaded and non-threaded RTS. +// The normal variants below are no-ops in the non-threaded RTS. +#define ACQUIRE_LOCK_ALWAYS(l) OS_ACQUIRE_LOCK(l) +#define TRY_ACQUIRE_LOCK_ALWAYS(l) OS_TRY_ACQUIRE_LOCK(l) +#define RELEASE_LOCK_ALWAYS(l) OS_RELEASE_LOCK(l) +#define ASSERT_LOCK_HELD_ALWAYS(l) OS_ASSERT_LOCK_HELD(l) +#else +// And just to be a bit confusing, the always variants are still no-ops when we +// do not HAVE_PREEMPTION, since then we don't have threads or mutexes at all. +#define ACQUIRE_LOCK_ALWAYS(l) +#define TRY_ACQUIRE_LOCK_ALWAYS(l) 0 +#define RELEASE_LOCK_ALWAYS(l) +#define ASSERT_LOCK_HELD_ALWAYS(l) +#endif + // Processors and affinity void setThreadAffinity (uint32_t n, uint32_t m); void setThreadNode (uint32_t node); @@ -228,6 +288,7 @@ void releaseThreadNode (void); #else +// No-ops in the non-threaded RTS. See also the _ALWAYS variants above. #define ACQUIRE_LOCK(l) #define TRY_ACQUIRE_LOCK(l) 0 #define RELEASE_LOCK(l) ===================================== testsuite/tests/codeGen/should_run/T27046.hs ===================================== @@ -0,0 +1,29 @@ +{-# LANGUAGE MagicHash #-} +{-# LANGUAGE ForeignFunctionInterface, GHCForeignImportPrim, UnliftedFFITypes #-} + +module Main where + +import Control.Monad + ( unless ) +import Data.Bits + ( shiftL ) +import GHC.Exts + ( Int64# ) +import GHC.Int + ( Int64(..) ) + +foreign import prim "test_mul2_clobber" + test_mul2_clobber :: Int64# -> Int64# -> Int64# + +main :: IO () +main = do + let + I64# x = 1 `shiftL` 32 + hi = I64# $ test_mul2_clobber x x + + unless ( hi == 1 ) $ + error $ unlines + [ "Incorrect result for Mul2 operation." + , "Expected high word: 1" + , " Actual high word: " ++ show hi + ] ===================================== testsuite/tests/codeGen/should_run/T27046_cmm.cmm ===================================== @@ -0,0 +1,13 @@ +#include "Cmm.h" + +// Test for #27046 +test_mul2_clobber (bits64 x, bits64 y) +{ + bits64 hi, nd; + + // Deliberately alias the destination 'lo' with the source 'x' + // This forces the NCG to use the same virtual register for both. + (nd, hi, x) = prim %mul2_64(x, y); + + return (hi); +} ===================================== testsuite/tests/codeGen/should_run/all.T ===================================== @@ -260,6 +260,13 @@ test('T25364', normal, compile_and_run, ['']) test('T26061', normal, compile_and_run, ['']) test('T26537', normal, compile_and_run, ['-O2 -fregs-graph']) test('T24016', normal, compile_and_run, ['-O1 -fPIC']) +test('T27046', + [ req_cmm + , when(arch('i386'), skip) # i386 does not support MO_S_Mul2 W64 + , when(arch('wasm32'), skip) + , js_skip + , when(unregisterised(), skip) # pprCallishMachOp_for_C: MO_S_Mul2 W64 not supported + ], compile_and_run, ['T27046_cmm.cmm']) # Check that GHC-generated finalizers run on Darwin. The Apple linker doesn't # support --wrap, so we can't intercept hs_spt_remove directly. Instead we ===================================== testsuite/tests/rts/T27131.hs ===================================== @@ -30,16 +30,22 @@ foreign import ccall unsafe "has_local_stop_after_return" main :: IO () main = do setNumCapabilities 2 - checkFlag - "TSO_STOP_NEXT_BREAKPOINT" - rts_enableStopNextBreakpoint - rts_disableStopNextBreakpoint - c_hasLocalStopNextBreakpoint - checkFlag - "TSO_STOP_AFTER_RETURN" - rts_enableStopAfterReturn - rts_disableStopAfterReturn - c_hasLocalStopAfterReturn + -- Bind to capability 0 so it can't float between capabilities while the + -- target thread runs on capability 1. + doneVar <- newEmptyMVar + _ <- forkOn 0 $ do + checkFlag + "TSO_STOP_NEXT_BREAKPOINT" + rts_enableStopNextBreakpoint + rts_disableStopNextBreakpoint + c_hasLocalStopNextBreakpoint + checkFlag + "TSO_STOP_AFTER_RETURN" + rts_enableStopAfterReturn + rts_disableStopAfterReturn + c_hasLocalStopAfterReturn + putMVar doneVar () + takeMVar doneVar checkFlag :: String @@ -58,6 +64,7 @@ checkFlag label enable disable isMyThreadFlagSet = do ThreadId tid# <- forkOn 1 $ do replicateM_ 2 $ do replyVar <- takeMVar targetCheckVar + yield -- make sure we reprocess the mailbox isSet <- (/= 0) <$> isMyThreadFlagSet putMVar replyVar isSet ===================================== testsuite/tests/rts/T27131.stdout ===================================== @@ -1,6 +1,6 @@ -(0,False) +(0,True) TSO_STOP_NEXT_BREAKPOINT set: ok TSO_STOP_NEXT_BREAKPOINT unset: ok -(0,False) +(0,True) TSO_STOP_AFTER_RETURN set: ok TSO_STOP_AFTER_RETURN unset: ok View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/2de7ec638ed584fffbb38612a5d2e66... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/2de7ec638ed584fffbb38612a5d2e66... You're receiving this email because of your account on gitlab.haskell.org.
participants (1)
-
Duncan Coutts (@dcoutts)