
Some more information:
The root file system, which contains the /tmp directory where the file
is being created, is of type ext4:
df -T output:

    Filesystem     Type 1K-blocks     Used Available Use% Mounted on
    /dev/root      ext4  76026616 53915404  22094828  71% /
cat /proc/mounts output:
/dev/root / ext4 rw,relatime,discard,errors=remount-ro 0 0
Also, I added directory and file existence checks before creating the
file in all the relevant places, and the problem has now stopped
occurring. Of course, the checks may only have made it less likely, and
it might surface again later.
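For reference, the check I added is roughly of this shape (a sketch
only; the paths and names here are illustrative, not the actual test
code):

    import Control.Monad (unless)
    import System.Directory (createDirectoryIfMissing, doesFileExist)
    import System.FilePath ((</>))
    import System.IO (IOMode (WriteMode), withFile)

    -- Illustrative sketch: make sure the parent directory exists and
    -- the file does not, before creating the file.
    createWatchFile :: FilePath -> FilePath -> IO ()
    createWatchFile dir name = do
        createDirectoryIfMissing True dir
        exists <- doesFileExist (dir </> name)
        unless exists $
            withFile (dir </> name) WriteMode $ \_ -> return ()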
-harendra
On Tue, 8 Oct 2024 at 18:08, Harendra Kumar wrote:
What if we closed a file and created another one, and the inode of the previous file got reused for the new one? Is it possible that there is a small window after the deletion of the old file in which GHC still keeps the lock in its hash table? If that happens, the newly created file will see that there is already a lock on the file. Could it be that the lock gets released only when the handle is cleaned up by the GC, or something like that?
I can try adding a delay and/or performMajorGC before creating the new file.
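Roughly along these lines (untested sketch; the delay length is
arbitrary):

    import Control.Concurrent (threadDelay)
    import System.IO (Handle, IOMode (WriteMode), openFile)
    import System.Mem (performMajorGC)

    -- Untested sketch: run a major GC first, so that any lingering
    -- finalizers can release RTS file locks, then wait briefly before
    -- creating the new file.
    createAfterGC :: FilePath -> IO Handle
    createAfterGC path = do
        performMajorGC
        threadDelay 100000  -- 100 ms, arbitrary
        openFile path WriteMode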
-harendra
On Tue, 8 Oct 2024 at 15:57, Viktor Dukhovni wrote:

On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni wrote:

What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on GitHub's Linux CI; I am not sure which filesystem they are using. Earlier I wondered whether something odd might be happening if they are using NFS. But NFS usually causes issues due to caching of directory entries when doing cross-node operations; here we are on a single node, and the operations are not running in parallel (or so I believe). I will remove the hspec layer from the tests to make sure the code is simpler and our understanding is correct.
I will also run the tests on CircleCI to check whether the problem occurs there. I have never seen this problem when testing on a Linux machine on AWS, even when running the tests in a loop for days.
Looking more closely at the GHC code, we see that there's an internal (RTS, not OS-level) lock on the (device, inode) pair taken as part of opening a Unix file: exclusive for writes, shared for reads.
rts/FileLock.c:

    int
    lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing)
    {
        Lock key, *lock;

        ACQUIRE_LOCK(&file_lock_mutex);

        key.device = dev;
        key.inode  = ino;

        lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);

        if (lock == NULL)
        {
            lock = stgMallocBytes(sizeof(Lock), "lockFile");
            lock->device = dev;
            lock->inode  = ino;
            lock->readers = for_writing ? -1 : 1;
            insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock);
            insertHashTable(key_hash, id, lock);
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
        else
        {
            // single-writer/multi-reader locking:
            if (for_writing || lock->readers < 0) {
                RELEASE_LOCK(&file_lock_mutex);
                return -1;
            }
            insertHashTable(key_hash, id, lock);
            lock->readers++;
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
    }
This is obtained in "libraries/base/GHC/IO/FD.hs", via:
    mkFD fd iomode mb_stat is_socket is_nonblock = do
        ...
        case fd_type of
            Directory ->
               ioException (IOError Nothing InappropriateType "openFile"
                               "is a directory" Nothing Nothing)

            -- regular files need to be locked
            RegularFile -> do
               -- On Windows we need an additional call to get a unique device id
               -- and inode, since fstat just returns 0 for both.
               -- See also Note [RTS File locking]
               (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino
               r <- lockFile (fromIntegral fd) unique_dev unique_ino
                             (fromBool write)
               when (r == -1) $
                   ioException (IOError Nothing ResourceBusy "openFile"
                                   "file is locked" Nothing Nothing)
        ...
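The RTS lock is easy to observe in isolation: a second writable handle
to the same file in the same process fails before any OS-level locking
is involved. A minimal sketch:

    import System.IO

    -- Minimal demonstration of the RTS-level lock: the second openFile
    -- on the same (device, inode) for writing throws
    -- "openFile: resource busy (file is locked)".
    main :: IO ()
    main = do
        h1 <- openFile "/tmp/lock-demo" WriteMode
        h2 <- openFile "/tmp/lock-demo" WriteMode  -- throws ResourceBusy
        hClose h2
        hClose h1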
This suggests that when the file in question is opened, there's already a read lock in place for the same dev/ino. Perhaps the GitHub CI filesystem fails to ensure uniqueness of dev+ino for open files (perhaps when open files have already been unlinked)?
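One way to probe that hypothesis on the CI filesystem would be a small
check along these lines (illustrative sketch; path made up):

    import System.Posix.Files (deviceID, fileID, getFdStatus, removeLink)
    import System.Posix.IO (closeFd, createFile)

    -- Illustrative sketch: create a file, unlink it while keeping the
    -- descriptor open, create a new file at the same path, and compare
    -- the two (dev, ino) pairs while both descriptors are open.
    main :: IO ()
    main = do
        fd1 <- createFile "/tmp/ino-demo" 0o600
        removeLink "/tmp/ino-demo"          -- fd1 still open, file unlinked
        fd2 <- createFile "/tmp/ino-demo" 0o600
        st1 <- getFdStatus fd1
        st2 <- getFdStatus fd2
        -- On a POSIX-correct filesystem these two pairs must differ
        -- while both descriptors are open.
        print ((deviceID st1, fileID st1), (deviceID st2, fileID st2))
        closeFd fd1
        closeFd fd2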
--
Viktor.