
Some more information:
The root file system, which contains the /tmp directory where the file
is being created, is of type ext4:
df -T output:

    Filesystem     Type 1K-blocks     Used Available Use% Mounted on
    /dev/root      ext4  76026616 53915404  22094828  71% /
cat /proc/mounts output:
/dev/root / ext4 rw,relatime,discard,errors=remount-ro 0 0
Also, I added directory and file existence checks before creating the
file in all the relevant places, and the problem has now stopped
occurring. Of course, the checks may only have made it less likely, and
it might surface again later.
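For reference, the check I added is roughly of this shape (a sketch
only; the paths and names here are illustrative, not the actual test
code):

    import Control.Monad (unless)
    import System.Directory (createDirectoryIfMissing, doesFileExist)
    import System.FilePath ((</>))
    import System.IO (IOMode (WriteMode), withFile)

    -- Illustrative sketch: make sure the parent directory exists and
    -- the file does not, before creating the file.
    createWatchFile :: FilePath -> FilePath -> IO ()
    createWatchFile dir name = do
        createDirectoryIfMissing True dir
        exists <- doesFileExist (dir </> name)
        unless exists $
            withFile (dir </> name) WriteMode $ \_ -> return ()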
-harendra
On Tue, 8 Oct 2024 at 18:08, Harendra Kumar wrote:
What if we closed a file and created another one, and the inode of the previous file got reused for the new one? Is it possible that there is a small window after the deletion of the old file in which GHC still keeps the lock in its hash table? If that happens, the newly created file will see that there is already a lock on the file. Could it be that the lock gets released only when the handle is cleaned up by the GC, or something like that?
I can try adding a delay and/or performMajorGC before creating the new file.
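Roughly along these lines (untested sketch; the delay length is
arbitrary):

    import Control.Concurrent (threadDelay)
    import System.IO (Handle, IOMode (WriteMode), openFile)
    import System.Mem (performMajorGC)

    -- Untested sketch: run a major GC first, so that any lingering
    -- finalizers can release RTS file locks, then wait briefly before
    -- creating the new file.
    createAfterGC :: FilePath -> IO Handle
    createAfterGC path = do
        performMajorGC
        threadDelay 100000  -- 100 ms, arbitrary
        openFile path WriteMode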
-harendra
On Tue, 8 Oct 2024 at 15:57, Viktor Dukhovni wrote:

On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni wrote:

What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on GitHub's Linux CI; I am not sure which filesystem they are using. Earlier I wondered whether something odd might be happening if they are using NFS. But NFS usually causes issues due to caching of directory entries when doing cross-node operations; here we are on a single node, and the operations are not running in parallel (or so I believe). I will remove the hspec layer from the tests to make sure the code is simpler and our understanding is correct.
I will also run the tests on CircleCI to check whether the problem occurs there. I have never seen this problem when testing on a Linux machine on AWS, even when running the tests in a loop for days.
Looking more closely at the GHC code, we see that there's an internal (RTS, not OS-level) lock on the (device, inode) pair taken as part of opening a Unix file: exclusive for writes, shared for reads.
rts/FileLock.c:

    int
    lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing)
    {
        Lock key, *lock;

        ACQUIRE_LOCK(&file_lock_mutex);

        key.device = dev;
        key.inode  = ino;

        lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);

        if (lock == NULL)
        {
            lock = stgMallocBytes(sizeof(Lock), "lockFile");
            lock->device = dev;
            lock->inode  = ino;
            lock->readers = for_writing ? -1 : 1;
            insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock);
            insertHashTable(key_hash, id, lock);
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
        else
        {
            // single-writer/multi-reader locking:
            if (for_writing || lock->readers < 0) {
                RELEASE_LOCK(&file_lock_mutex);
                return -1;
            }
            insertHashTable(key_hash, id, lock);
            lock->readers++;
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
    }
This is obtained in "libraries/base/GHC/IO/FD.hs", via:
    mkFD fd iomode mb_stat is_socket is_nonblock = do
        ...
        case fd_type of
            Directory ->
               ioException (IOError Nothing InappropriateType "openFile"
                               "is a directory" Nothing Nothing)

            -- regular files need to be locked
            RegularFile -> do
               -- On Windows we need an additional call to get a unique device id
               -- and inode, since fstat just returns 0 for both.
               -- See also Note [RTS File locking]
               (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino
               r <- lockFile (fromIntegral fd) unique_dev unique_ino
                             (fromBool write)
               when (r == -1) $
                   ioException (IOError Nothing ResourceBusy "openFile"
                                   "file is locked" Nothing Nothing)
        ...
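The RTS lock is easy to observe in isolation: a second writable handle
to the same file in the same process fails before any OS-level locking
is involved. A minimal sketch:

    import System.IO

    -- Minimal demonstration of the RTS-level lock: the second openFile
    -- on the same (device, inode) for writing throws
    -- "openFile: resource busy (file is locked)".
    main :: IO ()
    main = do
        h1 <- openFile "/tmp/lock-demo" WriteMode
        h2 <- openFile "/tmp/lock-demo" WriteMode  -- throws ResourceBusy
        hClose h2
        hClose h1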
This suggests that when the file in question is opened, there's already a read lock in place for the same dev/ino. Perhaps the GitHub CI filesystem fails to ensure uniqueness of dev+ino for open files (perhaps when open files have already been unlinked)?
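One way to probe that hypothesis on the CI filesystem would be a small
check along these lines (illustrative sketch; path made up):

    import System.Posix.Files (deviceID, fileID, getFdStatus, removeLink)
    import System.Posix.IO (closeFd, createFile)

    -- Illustrative sketch: create a file, unlink it while keeping the
    -- descriptor open, create a new file at the same path, and compare
    -- the two (dev, ino) pairs while both descriptors are open.
    main :: IO ()
    main = do
        fd1 <- createFile "/tmp/ino-demo" 0o600
        removeLink "/tmp/ino-demo"          -- fd1 still open, file unlinked
        fd2 <- createFile "/tmp/ino-demo" 0o600
        st1 <- getFdStatus fd1
        st2 <- getFdStatus fd2
        -- On a POSIX-correct filesystem these two pairs must differ
        -- while both descriptors are open.
        print ((deviceID st1, fileID st1), (deviceID st2, fileID st2))
        closeFd fd1
        closeFd fd2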
--
Viktor.