openFile gives "file is locked" error on Linux when creating a non-existing file

Hi everyone,

We have a long-standing, elusive, and intermittent test failure in streamly's file system event test suite. The code where the error emanates from looks like this:

    createFileWithParent :: FilePath -> FilePath -> IO ()
    createFileWithParent file parent = do
        let filepath = parent </> file
        let dir = takeDirectory filepath
        putStrLn $ "createFileWithParent: file [" ++ file ++ "] dir [" ++ dir ++ "]"
        putStrLn $ "Ensuring dir: " ++ dir
        createDirectoryIfMissing True dir
        r <- doesDirectoryExist dir
        if r
        then do
            putStrLn $ "Ensured dir: " ++ dir
            when (not (null file)) $ do
                exists <- doesFileExist filepath
                if not exists
                then do
                    putStrLn $ "Creating file: " ++ (parent </> file)
                    openFile (parent </> file) WriteMode >>= hClose
                    putStrLn $ "Created file: " ++ (parent </> file)
                else error $ "File exists: " ++ filepath
        else error $ "Could not create dir: " ++ dir

The important thing in the code above is that we check that the file does not already exist, and only then create it. Creating this new file intermittently fails with the following error messages; all of these come from the prints in the code above:

    createFileWithParent: file [file1] dir [/tmp/fsevent_dir-e1098325dc2b0880/watch-root]
    Ensuring dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root
    Ensured dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root
    Creating file: /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1

    uncaught exception: IOException of type ResourceBusy
    /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1: openFile: resource busy (file is locked)

How can a file that does not exist and is being created for the first time be locked during creation? We see this only on Linux; the same tests always succeed on Windows and macOS. We are seeing this with GHC 9.2.8. I am not sure if it is version specific, because in the past I selectively disabled the tests for specific GHC versions and the error still surfaced with other GHC versions.

Anything obvious that we may be missing here? Any pointers, or more debug info to generate, to debug this? I have not started looking into the openFile code yet, hoping someone who knows it well might help here.

Is this some other "resource busy" error which is being wrongly bucketed into the "file is locked" error?

Thanks,
Harendra

What do you see when you run an strace?
On Mon, Oct 7, 2024 at 10:55 AM Harendra Kumar
Hi everyone,
We have a long standing, elusive and intermittent test failure in streamly's file system event test suite. The code where the error emanates from looks like this:
    createFileWithParent :: FilePath -> FilePath -> IO ()
    createFileWithParent file parent = do
        let filepath = parent </> file
        let dir = takeDirectory filepath
        putStrLn $ "createFileWithParent: file [" ++ file ++ "] dir [" ++ dir ++ "]"
        putStrLn $ "Ensuring dir: " ++ dir
        createDirectoryIfMissing True dir
        r <- doesDirectoryExist dir
        if r
        then do
            putStrLn $ "Ensured dir: " ++ dir
            when (not (null file)) $ do
                exists <- doesFileExist filepath
                if not exists
                then do
                    putStrLn $ "Creating file: " ++ (parent </> file)
                    openFile (parent </> file) WriteMode >>= hClose
                    putStrLn $ "Created file: " ++ (parent </> file)
                else error $ "File exists: " ++ filepath
        else error $ "Could not create dir: " ++ dir
The important thing in the code above is that we check that the file does not already exist, and only then create it. Creating this new file intermittently fails with the following error messages; all of these come from the prints in the code above:
    createFileWithParent: file [file1] dir [/tmp/fsevent_dir-e1098325dc2b0880/watch-root]
    Ensuring dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root
    Ensured dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root
    Creating file: /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1

    uncaught exception: IOException of type ResourceBusy
    /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1: openFile: resource busy (file is locked)
How can a file that does not exist and is being created for the first time be locked during creation? We see this only on Linux; the same tests always succeed on Windows and macOS. We are seeing this with GHC 9.2.8. I am not sure if it is version specific, because in the past I selectively disabled the tests for specific GHC versions and the error still surfaced with other GHC versions.
Anything obvious that we may be missing here? Any pointers or more debug info to generate to debug this? I have not started looking into the openFile code yet, hoping someone else knowing it well might help here.
Is this some other "resource busy" error which is being wrongly bucketed into "file is locked" error?
Thanks, Harendra

On Mon, Oct 07, 2024 at 08:25:21AM +0530, Harendra Kumar wrote:
    exists <- doesFileExist filepath
    if not exists
    then do
        putStrLn $ "Creating file: " ++ (parent </> file)
        openFile (parent </> file) WriteMode >>= hClose
        putStrLn $ "Created file: " ++ (parent </> file)
    else error $ "File exists: " ++ filepath
This is a classic TOCTOU bug. The file can come into existence between the "doesFileExist" test and the attempt to create it. To create a file only if it does not exist, you need to use an "open" variant that creates the file if necessary, and leaves it unmodified if it already exists. "AppendMode" works for this, because you're closing the file immediately, so the fact that any writes would "append" is not material.

So replace the above with:

    openFile filepath AppendMode >>= hClose

If you need to know whether the file got created by this call, or was found to exist already, you need a lower-level API, such as (Unix C):

    /* In some cases 0600 is more appropriate */
    int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);

    if (fd >= 0) {
        /* Just created */
        (void) close(fd);
        ...
    } else if (errno == EEXIST) {
        /* Already present */
        ...
    } else {
        /* Permission or other problem */
        ...
    }

-- Viktor.
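[For reference, the same O_CREAT|O_EXCL approach is available from Haskell via the unix package. This is a minimal sketch, assuming the unix-2.7 openFd signature that ships with GHC 9.2 (unix-2.8 changed the signature); createFileExclusive is a name invented here, not an existing API.]

    import Control.Exception (try)
    import System.IO.Error (isAlreadyExistsError)
    import System.Posix.IO
        (OpenFileFlags (..), OpenMode (WriteOnly), closeFd, defaultFileFlags, openFd)

    -- Create the file with O_CREAT|O_EXCL; True means this call created it,
    -- False means it already existed.  Any other error is re-thrown.
    createFileExclusive :: FilePath -> IO Bool
    createFileExclusive path = do
        r <- try $ openFd path WriteOnly (Just 0o666) defaultFileFlags { exclusive = True }
        case r of
            Right fd -> closeFd fd >> return True
            Left e
                | isAlreadyExistsError e -> return False
                | otherwise              -> ioError e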

On Mon, Oct 07, 2024 at 09:52:06PM +1100, Viktor Dukhovni wrote:
If you need to know whether the file got created by this call, or was found to exist already, you need a lower-level API, such as (Unix C):
    /* In some cases 0600 is more appropriate */
    int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);

    if (fd >= 0) {
        /* Just created */
        (void) close(fd);
        ...
    } else if (errno == EEXIST) {
        /* Already present */
        ...
    } else {
        /* Permission or other problem */
        ...
    }
I should mention that the above assumes a "local" filesystem; with NFS a race may still be possible, and the open(2) manpage may describe work-arounds, e.g. Linux:

    On NFS, O_EXCL is supported only when using NFSv3 or later on kernel 2.6
    or later.  In NFS environments where O_EXCL support is not provided,
    programs that rely on it for performing locking tasks will contain a
    race condition.  Portable programs that want to perform atomic file
    locking using a lockfile, and need to avoid reliance on NFS support for
    O_EXCL, can create a unique file on the same filesystem (e.g.,
    incorporating hostname and PID), and use link(2) to make a link to the
    lockfile.  If link(2) returns 0, the lock is successful.  Otherwise, use
    stat(2) on the unique file to check if its link count has increased to
    2, in which case the lock is also successful.

-- Viktor.
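[For completeness, the link(2)-based lockfile technique quoted from the manpage looks roughly like this in Haskell. A minimal sketch using the unix package, assuming a PID-only unique name is good enough for illustration (a real implementation would also include the hostname, as the manpage suggests); acquireLockFile is a name made up here.]

    import Control.Exception (try)
    import System.IO (IOMode (WriteMode), hClose, openFile)
    import System.Posix.Files (createLink, getFileStatus, linkCount, removeLink)
    import System.Posix.Process (getProcessID)

    -- Returns True if we took the lock, False if someone else holds it.
    acquireLockFile :: FilePath -> IO Bool
    acquireLockFile lockPath = do
        pid <- getProcessID
        let uniq = lockPath ++ "." ++ show pid
        openFile uniq WriteMode >>= hClose          -- create the unique file
        r <- try (createLink uniq lockPath) :: IO (Either IOError ())
        ok <- case r of
            Right () -> return True                 -- link(2) succeeded
            Left _   -> do                          -- fall back to the link-count check
                st <- getFileStatus uniq
                return (linkCount st == 2)
        removeLink uniq                             -- the unique name is no longer needed
        return ok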

Hi Viktor,
Thanks for looking into this.
This cannot be a TOCTOU bug as the code to check the existence of the
file is only introduced for debugging this issue, to report in case
the file exists for some reason. Our understanding is that this file
cannot exist in the first place. We have never seen the "File exists"
message being printed, I will make that an error to make sure. The
tests create a temporary file using a random directory name in the
system temp directory, the directory is destroyed at the end of the
test. Also, tests do not run in parallel, we are using hspec to run
tests and it does not run tests in parallel unless we explicitly say
so, so there is no possibility that two tests could be trying to use
the same file. We will double check that. Also, this happens only on
Linux. We will also try the append mode as you suggested.
-harendra
On Mon, 7 Oct 2024 at 16:22, Viktor Dukhovni
On Mon, Oct 07, 2024 at 08:25:21AM +0530, Harendra Kumar wrote:
    exists <- doesFileExist filepath
    if not exists
    then do
        putStrLn $ "Creating file: " ++ (parent </> file)
        openFile (parent </> file) WriteMode >>= hClose
        putStrLn $ "Created file: " ++ (parent </> file)
    else error $ "File exists: " ++ filepath
This is a classic TOCTOU bug. The file can come into existence between the "doesFileExist" test and the attempt to create it. To create a file only if it does not exist, you need to use an "open" variant that creates the file if necessary, and leaves it unmodified if it already exists. "AppendMode" works for this, because you're closing the file immediately, so the fact that any writes would "append" is not material.
So replace the above with:
openFile filepath AppendMode >>= hClose
If you need to know whether the file got created by this call, or was found to exist already, you need a lower-level API, such as (Unix C):
    /* In some cases 0600 is more appropriate */
    int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);

    if (fd >= 0) {
        /* Just created */
        (void) close(fd);
        ...
    } else if (errno == EEXIST) {
        /* Already present */
        ...
    } else {
        /* Permission or other problem */
        ...
    }
-- Viktor.

On Tue, Oct 08, 2024 at 11:23:14AM +0530, Harendra Kumar wrote:
This cannot be a TOCTOU bug as the code to check the existence of the file is only introduced for debugging this issue, to report in case the file exists for some reason. Our understanding is that this file cannot exist in the first place. We have never seen the "File exists" message being printed, I will make that an error to make sure. The tests create a temporary file using a random directory name in the system temp directory, the directory is destroyed at the end of the test. Also, tests do not run in parallel, we are using hspec to run tests and it does not run tests in parallel unless we explicitly say so, so there is no possibility that two tests could be trying to use the same file. We will double check that. Also, this happens only on Linux. We will also try the append mode as you suggested.
What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?

Creating and closing a file in write mode from GHC:

    import System.IO

    main :: IO ()
    main = do
        putStrLn "Show time" >> hFlush stdout
        openFile "/tmp/foo.out" WriteMode >>= hClose

translates on Linux to (strace):

    write(1, "Show time\n", 10)      = 10
    openat(AT_FDCWD, "/tmp/foo.out", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 6
    newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
    ftruncate(6, 0)                  = 0
    ioctl(6, TCGETS, 0x7ffd358412a0) = -1 ENOTTY (Inappropriate ioctl for device)
    close(6)                         = 0

Nothing at all unusual happening here, so if the OS returns EBUSY, perhaps there's something interesting you can report about the state of that directory before file creation? Perhaps there's some filesystem or other kernel resource you're maxing out during the tests?

-- Viktor.

On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni
What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on github Linux CI. Not sure which filesystem they are using. Earlier I was wondering if something funny is happening in case they are using NFS. But NFS usually causes issues due to caching of directory entries if we are doing cross-node operations, here we are on a single node and operations are not running in parallel (or that's what I believe). I will remove the hspec layer from the tests to make sure that the code is simpler and our understanding is correct. I will also run the tests on circle-ci to check if the problem occurs there. I have never seen this problem in testing this on a Linux machine on AWS even if I ran the tests for days in a loop.
Creating and closing a file in write mode from GHC:
import System.IO
    main :: IO ()
    main = do
        putStrLn "Show time" >> hFlush stdout
        openFile "/tmp/foo.out" WriteMode >>= hClose
translates on Linux to (strace):
    write(1, "Show time\n", 10)      = 10
    openat(AT_FDCWD, "/tmp/foo.out", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 6
    newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
    ftruncate(6, 0)                  = 0
    ioctl(6, TCGETS, 0x7ffd358412a0) = -1 ENOTTY (Inappropriate ioctl for device)
    close(6)                         = 0
Nothing at all unusual happening here, so if the OS returns EBUSY, perhaps there's something interesting you can report about the state of that directory before file creation? Perhaps there's some filesystem or other kernel resource you're maxing out during the tests?
Is EBUSY errno getting translated to "file is locked" error here? In that case there can be other possibilities, depending on the machine or the file system. The error message should perhaps indicate other possibilities here, something like "file may be locked, or the file system is temporarily busy/unavailable". Let me check the openFile code, how it is translating system errors to user errors. -harendra

On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni wrote:
What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on github Linux CI. Not sure which filesystem they are using. Earlier I was wondering if something funny is happening in case they are using NFS. But NFS usually causes issues due to caching of directory entries if we are doing cross-node operations, here we are on a single node and operations are not running in parallel (or that's what I believe). I will remove the hspec layer from the tests to make sure that the code is simpler and our understanding is correct.
I will also run the tests on circle-ci to check if the problem occurs there. I have never seen this problem in testing this on a Linux machine on AWS even if I ran the tests for days in a loop.
Looking more closely at the GHC code, we see that there's an internal (RTS, not OS level) exclusive lock on the (device, inode) pair as part of opening a Unix file for writes, or a shared lock for reads.

rts/FileLock.c:

    int
    lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing)
    {
        Lock key, *lock;

        ACQUIRE_LOCK(&file_lock_mutex);

        key.device = dev;
        key.inode  = ino;

        lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);

        if (lock == NULL)
        {
            lock = stgMallocBytes(sizeof(Lock), "lockFile");
            lock->device = dev;
            lock->inode  = ino;
            lock->readers = for_writing ? -1 : 1;
            insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock);
            insertHashTable(key_hash, id, lock);
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
        else
        {
            // single-writer/multi-reader locking:
            if (for_writing || lock->readers < 0) {
                RELEASE_LOCK(&file_lock_mutex);
                return -1;
            }
            insertHashTable(key_hash, id, lock);
            lock->readers++;
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
    }

This is obtained in "libraries/base/GHC/IO/FD.hs", via:

    mkFD fd iomode mb_stat is_socket is_nonblock = do
        ...
        case fd_type of
            Directory ->
               ioException (IOError Nothing InappropriateType "openFile"
                                "is a directory" Nothing Nothing)

            -- regular files need to be locked
            RegularFile -> do
               -- On Windows we need an additional call to get a unique device id
               -- and inode, since fstat just returns 0 for both.
               -- See also Note [RTS File locking]
               (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino
               r <- lockFile (fromIntegral fd) unique_dev unique_ino (fromBool write)
               when (r == -1)  $
                    ioException (IOError Nothing ResourceBusy "openFile"
                                     "file is locked" Nothing Nothing)
        ...

This suggests that when the file in question is opened there's already a lock in place for the same dev/ino. Perhaps the Github filesystem fails to ensure uniqueness of dev+ino of open files (perhaps when open files are already unlinked)?

-- Viktor.
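[To see this single-writer RTS lock in action outside the test suite, here is a small, hedged illustration; the path is made up, and on Linux the second openFile is expected to fail with the same "resource busy (file is locked)" IOException while the first Handle is still open.]

    import Control.Exception (IOException, try)
    import System.IO

    main :: IO ()
    main = do
        h1 <- openFile "/tmp/lock-demo.txt" WriteMode
        -- Same (dev, ino) is still open for writing above, so the RTS lock
        -- taken in lockFile() refuses a second writer:
        r <- try (openFile "/tmp/lock-demo.txt" WriteMode)
                 :: IO (Either IOException Handle)
        case r of
            Left e  -> putStrLn $ "second open failed: " ++ show e
            Right h -> putStrLn "second open unexpectedly succeeded" >> hClose h
        hClose h1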

What if we closed a file and created another one and the inode of the
previous file got reused for the new one? Is it possible that there is
a small window after the deletion of the old one in which GHC keeps
the lock in its hash table? If that happens then the newly created
file will see that there is already a lock on the file. Could it be
that the lock gets released when the handle is cleaned by GC or
something like that?
I can try adding a delay and/or performMajorGC before creating the new file.
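[A sketch of that experiment; the delay length is arbitrary, and the major GC is only there to give any pending Handle finalizers a chance to run before the file is created.]

    import Control.Concurrent (threadDelay)
    import System.IO (IOMode (WriteMode), hClose, openFile)
    import System.Mem (performMajorGC)

    createAfterGC :: FilePath -> IO ()
    createAfterGC path = do
        performMajorGC        -- let any pending Handle finalizers get scheduled
        threadDelay 100000    -- 100 ms, arbitrary
        openFile path WriteMode >>= hClose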
-harendra
On Tue, 8 Oct 2024 at 15:57, Viktor Dukhovni
On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni wrote:
What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on github Linux CI. Not sure which filesystem they are using. Earlier I was wondering if something funny is happening in case they are using NFS. But NFS usually causes issues due to caching of directory entries if we are doing cross-node operations, here we are on a single node and operations are not running in parallel (or that's what I believe). I will remove the hspec layer from the tests to make sure that the code is simpler and our understanding is correct.
I will also run the tests on circle-ci to check if the problem occurs there. I have never seen this problem in testing this on a Linux machine on AWS even if I ran the tests for days in a loop.
Looking more closely at the GHC code, we see that there's an internal (RTS not OS level) exclusive lock on the (device, inode) pair as part of opening a Unix file for writes, or shared lock for reads.
rts/FileLock.c: int lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing) { Lock key, *lock;
ACQUIRE_LOCK(&file_lock_mutex);
key.device = dev; key.inode = ino;
lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);
if (lock == NULL) { lock = stgMallocBytes(sizeof(Lock), "lockFile"); lock->device = dev; lock->inode = ino; lock->readers = for_writing ? -1 : 1; insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock); insertHashTable(key_hash, id, lock); RELEASE_LOCK(&file_lock_mutex); return 0; } else { // single-writer/multi-reader locking: if (for_writing || lock->readers < 0) { RELEASE_LOCK(&file_lock_mutex); return -1; } insertHashTable(key_hash, id, lock); lock->readers++; RELEASE_LOCK(&file_lock_mutex); return 0; } }
This is obtained in "libraries/base/GHC/IO/FD.hs", via:
mkFD fd iomode mb_stat is_socket is_nonblock = do ... case fd_type of Directory -> ioException (IOError Nothing InappropriateType "openFile" "is a directory" Nothing Nothing)
-- regular files need to be locked RegularFile -> do -- On Windows we need an additional call to get a unique device id -- and inode, since fstat just returns 0 for both. -- See also Note [RTS File locking] (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino r <- lockFile (fromIntegral fd) unique_dev unique_ino (fromBool write) when (r == -1) $ ioException (IOError Nothing ResourceBusy "openFile" "file is locked" Nothing Nothing) ...
This suggests that when the file in question is opened there's already a read lock in for the same dev/ino. Perhaps the Github filesystem fails to ensure uniqueness of dev+ino of open files (perhaps when open files are already unlinked)?
-- Viktor.

Some more information:
The root file system which contains the /tmp directory, where the file
is being created, is of type ext4:
df -T output:
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/root ext4 76026616 53915404 22094828 71% /
cat /proc/mounts output:
/dev/root / ext4 rw,relatime,discard,errors=remount-ro 0 0
Also, I added the directory and file existence checks before the
creation of the file at all places, and now the problem stopped
happening. Maybe it became less likely and might surface some time
later.
-harendra
On Tue, 8 Oct 2024 at 18:08, Harendra Kumar
What if we closed a file and created another one and the inode of the previous file got reused for the new one? Is it possible that there is a small window after the deletion of the old one in which GHC keeps the lock in its hash table? If that happens then the newly created file will see that there is already a lock on the file. Could it be that the lock gets released when the handle is cleaned by GC or something like that?
I can try adding a delay and/or performMajorGC before creating the new file.
-harendra
On Tue, 8 Oct 2024 at 15:57, Viktor Dukhovni wrote:
On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni wrote:
What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
This happens on github Linux CI. Not sure which filesystem they are using. Earlier I was wondering if something funny is happening in case they are using NFS. But NFS usually causes issues due to caching of directory entries if we are doing cross-node operations, here we are on a single node and operations are not running in parallel (or that's what I believe). I will remove the hspec layer from the tests to make sure that the code is simpler and our understanding is correct.
I will also run the tests on circle-ci to check if the problem occurs there. I have never seen this problem in testing this on a Linux machine on AWS even if I ran the tests for days in a loop.
Looking more closely at the GHC code, we see that there's an internal (RTS not OS level) exclusive lock on the (device, inode) pair as part of opening a Unix file for writes, or shared lock for reads.
rts/FileLock.c: int lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing) { Lock key, *lock;
ACQUIRE_LOCK(&file_lock_mutex);
key.device = dev; key.inode = ino;
lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);
if (lock == NULL) { lock = stgMallocBytes(sizeof(Lock), "lockFile"); lock->device = dev; lock->inode = ino; lock->readers = for_writing ? -1 : 1; insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock); insertHashTable(key_hash, id, lock); RELEASE_LOCK(&file_lock_mutex); return 0; } else { // single-writer/multi-reader locking: if (for_writing || lock->readers < 0) { RELEASE_LOCK(&file_lock_mutex); return -1; } insertHashTable(key_hash, id, lock); lock->readers++; RELEASE_LOCK(&file_lock_mutex); return 0; } }
This is obtained in "libraries/base/GHC/IO/FD.hs", via:
mkFD fd iomode mb_stat is_socket is_nonblock = do ... case fd_type of Directory -> ioException (IOError Nothing InappropriateType "openFile" "is a directory" Nothing Nothing)
-- regular files need to be locked RegularFile -> do -- On Windows we need an additional call to get a unique device id -- and inode, since fstat just returns 0 for both. -- See also Note [RTS File locking] (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino r <- lockFile (fromIntegral fd) unique_dev unique_ino (fromBool write) when (r == -1) $ ioException (IOError Nothing ResourceBusy "openFile" "file is locked" Nothing Nothing) ...
This suggests that when the file in question is opened there's already a read lock in for the same dev/ino. Perhaps the Github filesystem fails to ensure uniqueness of dev+ino of open files (perhaps when open files are already unlinked)?
-- Viktor.

On Tue, Oct 08, 2024 at 06:08:52PM +0530, Harendra Kumar wrote:
What if we closed a file and created another one and the inode of the previous file got reused for the new one? Is it possible that there is a small window after the deletion of the old one in which GHC keeps the lock in its hash table?
That SHOULD NOT happen: GHC releases the (internal hash table entry) lock before closing the file descriptor:

    close :: FD -> IO ()
    close fd =
      do let closer realFd =
               throwErrnoIfMinus1Retry_ "GHC.IO.FD.close" $
    #if defined(mingw32_HOST_OS)
               if fdIsSocket fd then
                 c_closesocket (fromIntegral realFd)
               else
    #endif
                 c_close (fromIntegral realFd)

         -- release the lock *first*, because otherwise if we're preempted
         -- after closing but before releasing, the FD may have been reused.
         -- (#7646)
         release fd

         closeFdWith closer (fromIntegral (fdFD fd))

    release :: FD -> IO ()
    release fd = do _ <- unlockFile (fromIntegral $ fdFD fd)
                    return ()

Solved in GHC 7.8, 11 years ago:

    https://gitlab.haskell.org/ghc/ghc/-/issues/7646#note_68902

This assumes that the application is not closing the file descriptor "behind GHC's back". That is, you're not using the POSIX package to directly close file descriptors underlying Haskell file Handles (which would then orphan the associated "lock").

-- Viktor.

I just noticed that cabal seems to be running test suites in parallel.
We have two test suites. Even though each test suite generates the
temp names randomly, they use the same prefix; if the generated names
can collide due to the PRNG, that may cause a problem.
That is perhaps the more likely cause rather than hunting this in GHC.
cabal running tests in parallel without explicitly saying so came as a
surprise to me. In fact I found an issue in cabal repo asking for a
"feature" to run them sequentially, the issue is still open -
https://github.com/haskell/cabal/issues/6751 . Hopefully this is it.
-harendra
On Wed, 9 Oct 2024 at 08:34, Viktor Dukhovni
On Tue, Oct 08, 2024 at 06:08:52PM +0530, Harendra Kumar wrote:
What if we closed a file and created another one and the inode of the previous file got reused for the new one? Is it possible that there is a small window after the deletion of the old one in which GHC keeps the lock in its hash table?
That SHOULD NOT happen, GHC releases the (internal hash table entry) lock before closing the file descriptor:
close :: FD -> IO () close fd = do let closer realFd = throwErrnoIfMinus1Retry_ "GHC.IO.FD.close" $ #if defined(mingw32_HOST_OS) if fdIsSocket fd then c_closesocket (fromIntegral realFd) else #endif c_close (fromIntegral realFd)
-- release the lock *first*, because otherwise if we're preempted -- after closing but before releasing, the FD may have been reused. -- (#7646) release fd
closeFdWith closer (fromIntegral (fdFD fd))
release :: FD -> IO () release fd = do _ <- unlockFile (fromIntegral $ fdFD fd) return ()
Solved in GHC 7.8 11 years ago:
https://gitlab.haskell.org/ghc/ghc/-/issues/7646#note_68902
This assumes that the application is not closing the file descriptor "behind GHC's back". That is, you're not using the POSIX package to directly close file descriptors underlying Haskell file Handles (which would then orphan the associated "lock").
-- Viktor.

On Wed, Oct 09, 2024 at 10:24:30AM +0530, Harendra Kumar wrote:
I just noticed that cabal seems to be running test suites in parallel. We have two test suites. Even though each test suite generates the temp names randomly, they use the same prefix, if the generated names have a possibility of conflict due to PRNG it may cause a problem. That is perhaps the more likely cause rather than hunting this in GHC. cabal running tests in parallel without explicitly saying so came as a surprise to me. In fact I found an issue in cabal repo asking for a "feature" to run them sequentially, the issue is still open - https://github.com/haskell/cabal/issues/6751 . Hopefully this is it.
Just parallel execution is not sufficient to explain the observed problem, you still need to have the same inode/dev already open in the same process, or bookkeeping of which dev/ino pairs are in use to be incorrect. So either the Github filesystem is reusing inodes of already deleted, but still open files (a deviation from expected Unix behaviour), or somehow GHC fails to correctly track the dev/ino pairs of open handles. My best guess is that something is manipulating file descriptors directly, bypassing the Handle layer, and *then* parallel execution could exacerbate the resulting inconsistent state. -- Viktor.

You are right I guess. I changed the temp prefix for both the test
suites and the issue is still happening. Let me try creating a minimal
test. The problem is - changing the code affects the reproducibility.
-harendra
On Wed, 9 Oct 2024 at 10:31, Viktor Dukhovni
On Wed, Oct 09, 2024 at 10:24:30AM +0530, Harendra Kumar wrote:
I just noticed that cabal seems to be running test suites in parallel. We have two test suites. Even though each test suite generates the temp names randomly, they use the same prefix, if the generated names have a possibility of conflict due to PRNG it may cause a problem. That is perhaps the more likely cause rather than hunting this in GHC. cabal running tests in parallel without explicitly saying so came as a surprise to me. In fact I found an issue in cabal repo asking for a "feature" to run them sequentially, the issue is still open - https://github.com/haskell/cabal/issues/6751 . Hopefully this is it.
Just parallel execution is not sufficient to explain the observed problem, you still need to have the same inode/dev already open in the same process, or bookkeeping of which dev/ino pairs are in use to be incorrect.
So either the Github filesystem is reusing inodes of already deleted, but still open files (a deviation from expected Unix behaviour), or somehow GHC fails to correctly track the dev/ino pairs of open handles.
My best guess is that something is manipulating file descriptors directly, bypassing the Handle layer, and *then* parallel execution could exacerbate the resulting inconsistent state.
-- Viktor.

We do use low level C APIs and GHC APIs to create a Handle in the
event watching module. But that is for the watch-root and not for the
file that is experiencing this problem. So here is how it works. We
have a top level directory which is watched for events using inotify.
We first create this directory, this directory is opened using
inotify_init which returns a C file descriptor. We then create a
Handle from this fd, this Handle is used for watching inotify events.
We are then creating a file inside this directory which is being
watched while we are reading events from the parent directory. The
resource-busy issue occurs when creating a file inside this directory.
So we are not creating the Handle for the file in question in a
non-standard manner, but the parent directory Handle is being created
in that manner. I do not know if that somehow affects anything. Or if
the fact that the directory is being watched using inotify makes any
difference?
The code for creating the watch Handle is here:
https://github.com/composewell/streamly/blob/bbac52d9e09fa5ad760ab6ee5572c70...
. Viktor, you may want to take a quick look at this to see if it can
make any difference to the issue at hand.
-harendra
On Wed, 9 Oct 2024 at 10:31, Viktor Dukhovni
On Wed, Oct 09, 2024 at 10:24:30AM +0530, Harendra Kumar wrote:
I just noticed that cabal seems to be running test suites in parallel. We have two test suites. Even though each test suite generates the temp names randomly, they use the same prefix, if the generated names have a possibility of conflict due to PRNG it may cause a problem. That is perhaps the more likely cause rather than hunting this in GHC. cabal running tests in parallel without explicitly saying so came as a surprise to me. In fact I found an issue in cabal repo asking for a "feature" to run them sequentially, the issue is still open - https://github.com/haskell/cabal/issues/6751 . Hopefully this is it.
Just parallel execution is not sufficient to explain the observed problem, you still need to have the same inode/dev already open in the same process, or bookkeeping of which dev/ino pairs are in use to be incorrect.
So either the Github filesystem is reusing inodes of already deleted, but still open files (a deviation from expected Unix behaviour), or somehow GHC fails to correctly track the dev/ino pairs of open handles.
My best guess is that something is manipulating file descriptors directly, bypassing the Handle layer, and *then* parallel execution could exacerbate the resulting inconsistent state.
-- Viktor.

On Wed, Oct 09, 2024 at 12:15:32PM +0530, Harendra Kumar wrote:
We do use low level C APIs and GHC APIs to create a Handle in the event watching module. But that is for the watch-root and not for the file that is experiencing this problem. So here is how it works. We have a top level directory which is watched for events using inotify. We first create this directory, this directory is opened using inotify_init which returns a C file descriptor. We then create a Handle from this fd, this Handle is used for watching inotify events. We are then creating a file inside this directory which is being watched while we are reading events from the parent directory. The resource-busy issue occurs when creating a file inside this directory. So we are not creating the Handle for the file in question in a non-standard manner, but the parent directory Handle is being created in that manner. I do not know if that somehow affects anything. Or if the fact that the directory is being watched using inotify makes any difference?
The code for creating the watch Handle is here: https://github.com/composewell/streamly/blob/bbac52d9e09fa5ad760ab6ee5572c70... . Viktor, you may want to take a quick look at this to see if it can make any difference to the issue at hand.
I don't have the cycles to isolate the problem. I still suspect that your code is somehow directly closing file descriptors associated with a Handle. This then orphans the associated logical reader/writer lock, which then gets inherited by the next incarnation of the same (dev, ino) pair.

However, if the filesystem underlying "/tmp" were actually "tmpfs", inode reuse would be quite unlikely, because tmpfs inodes are assigned from a strictly incrementing counter:

    $ for i in {1..10}; do touch /tmp/foobar; ls -i /tmp/foobar; rm /tmp/foobar; done
    3830 /tmp/foobar
    3831 /tmp/foobar
    3832 /tmp/foobar
    3833 /tmp/foobar
    3834 /tmp/foobar
    3835 /tmp/foobar
    3836 /tmp/foobar
    3837 /tmp/foobar
    3838 /tmp/foobar
    3839 /tmp/foobar

but IIRC you mentioned that on Github "/tmp" is ext4, not "tmpfs" (perhaps RAM-backed storage is a more scarce resource), in which case indeed inode reuse is quite likely:

    $ for i in {1..10}; do touch /var/tmp/foobar; ls -i /var/tmp/foobar; rm /var/tmp/foobar; done
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar

But since normal open/close of Handles acquires the lock after open, and releases it before close, the evidence points to a bypass of the normal open file lifecycle.

Your codebase contains a bunch of custom file management logic, which could be the source of the problem. To find the problem code path, you'd probably need to instrument the RTS lock/unlock code to log its activity: (mode, descriptor, dev, ino) tuples being added and removed. And strace execution to be able to identify descriptor open and close events. Ideally the problem will be reproducible even with strace.

Good luck.

-- Viktor.

Coming back to this issue after a break. I reviewed the code carefully
and I cannot find anything where we are doing something in the
application code that affects the RTS locking mechanism. Let me walk
through the steps of the test up to failure and what we are doing in
the code. The test output is like this:
Ensuring dir: /tmp/fsevent_common-3725879775a96dfe/watch-root
createFile: Creating file: /tmp/fsevent_common-3725879775a96dfe/watch-root/file1
FileSystem.Event:
/tmp/fsevent_common-3725879775a96dfe/watch-root/file1: openFile:
resource busy (file is locked)
These steps are done before we start the file watching tests. We first
create a "watch-root" directory using createDirectory and then use
openFile to create a file named "file1" in that directory which we
just created, this openFile operation fails with "file is locked"
error. These steps are executed serially. The directory path is
unique, this path is never used before in this test or in any other
tests. This is the first time we are creating this file.
Now let me explain a bit about the custom file opening/closing logic
in the program. In each test, first we create a watch fd using
inotify_init, and create a Handle from it:
watch_fd <- c_inotify_init
Note that the watch_fd does not belong to an actual file on the file
system, so locking is irrelevant for this fd. FWIW, we avoid the
lockFile code path in this case when creating the Handle from fd.
To watch a file or directory we add that file to the watch using something like:
watch_descriptor <- c_inotify_add_watch watch_fd path flags
Note that we never create a GHC Handle from watch_descriptor. To
remove a file from the watch we do something like this:
rm = c_inotify_rm_watch watch_fd watch_descriptor
This closes the watch_descriptor returned by inotify_add_watch.
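[For readers unfamiliar with the setup, a self-contained sketch of roughly what those bindings look like is below. The names (c_inotify_init and friends) follow the prose above, but the flags, types, and error handling are simplified guesses for illustration, not the actual streamly code.]

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Foreign.C.String (CString, withCString)
    import Foreign.C.Types (CInt (..), CUInt (..))

    foreign import ccall unsafe "sys/inotify.h inotify_init"
        c_inotify_init :: IO CInt

    foreign import ccall unsafe "sys/inotify.h inotify_add_watch"
        c_inotify_add_watch :: CInt -> CString -> CUInt -> IO CInt

    foreign import ccall unsafe "sys/inotify.h inotify_rm_watch"
        c_inotify_rm_watch :: CInt -> CInt -> IO CInt

    -- Add a path to the watch; 0xFFF is IN_ALL_EVENTS on Linux, a real
    -- binding would use the proper constants from the header.
    addWatch :: CInt -> FilePath -> IO CInt
    addWatch watchFd path =
        withCString path $ \cpath -> c_inotify_add_watch watchFd cpath 0xFFF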
In any of these operations we have not inserted anything in the GHC
RTS locking table. In fact we are not even creating a Handle in any
case other than the watch_fd which does not even represent a file on
the file system and we skip the lockFile code when creating this
Handle.
After adding the file to the watch, we open the file, perform
read/write operations on it using the regular GHC file operations, and
then close the file. I am not sure if closing the watch_descriptor
interferes in any way with GHC's file descriptor which is open for
normal reading/writing. I assume the watch_descriptor returned by
inotify_add_watch operation on the file and the regular fd in the GHC
Handle returned by the "openFile" operation on the file are different.
So our closing the watch_descriptor should not affect the removal of
the lock entry by RTS when the file Handle is closed.
Regarding, how to debug this - I have not been able to reproduce this
on a local Linux machine, even when using ext4 as temp directory's
file system. On the local system I can see that inodes are being
reused all the time on ext4, but still the problem is not reproduced.
I ran the tests for several days and did not get a single instance of
the issue. The only way forward seems to be to create a GHC build
which logs the lock table entry and removal, and then run the CI using
that GHC build. Even in CIs I can reproduce the problem only on lucky
days.
-harendra
On Thu, 10 Oct 2024 at 04:32, Viktor Dukhovni
On Wed, Oct 09, 2024 at 12:15:32PM +0530, Harendra Kumar wrote:
We do use low level C APIs and GHC APIs to create a Handle in the event watching module. But that is for the watch-root and not for the file that is experiencing this problem. So here is how it works. We have a top level directory which is watched for events using inotify. We first create this directory, this directory is opened using inotify_init which returns a C file descriptor. We then create a Handle from this fd, this Handle is used for watching inotify events. We are then creating a file inside this directory which is being watched while we are reading events from the parent directory. The resource-busy issue occurs when creating a file inside this directory. So we are not creating the Handle for the file in question in a non-standard manner, but the parent directory Handle is being created in that manner. I do not know if that somehow affects anything. Or if the fact that the directory is being watched using inotify makes any difference?
The code for creating the watch Handle is here: https://github.com/composewell/streamly/blob/bbac52d9e09fa5ad760ab6ee5572c70... . Viktor, you may want to take a quick look at this to see if it can make any difference to the issue at hand.
I don't have the cycles to isolate the problem. I still suspect that your code is somehow directly closing file descriptors associated with a Handle. This then orphans the associated logical reader/writer lock, which then gets inherited by the next incarnation of the same (dev, ino) pair. However, if the filesystem underlying "/tmp" were actually "tmpfs", inode reuse would be quite unlikely, because tmpfs inodes are assigned from a strictly incrementing counter:
$ for i in {1..10}; do touch /tmp/foobar; ls -i /tmp/foobar; rm /tmp/foobar; done 3830 /tmp/foobar 3831 /tmp/foobar 3832 /tmp/foobar 3833 /tmp/foobar 3834 /tmp/foobar 3835 /tmp/foobar 3836 /tmp/foobar 3837 /tmp/foobar 3838 /tmp/foobar 3839 /tmp/foobar
but IIRC you mentioned that on Github "/tmp" is ext4, not "tmpfs" (perhaps RAM-backed storage is a more scarce resource), in which case indeed inode reuse is quite likely:
$ for i in {1..10}; do touch /var/tmp/foobar; ls -i /var/tmp/foobar; rm /var/tmp/foobar; done 25854141 /var/tmp/foobar 25854142 /var/tmp/foobar 25854141 /var/tmp/foobar 25854142 /var/tmp/foobar 25854141 /var/tmp/foobar 25854142 /var/tmp/foobar 25854141 /var/tmp/foobar 25854142 /var/tmp/foobar 25854141 /var/tmp/foobar 25854142 /var/tmp/foobar
But since normal open/close of Handles acquires the lock after open, and releases it before close, the evidence points to a bypass of the normal open file lifecycle.
Your codebase contains a bunch of custom file management logic, which could be the source the of problem. To find the problem code path, you'd probably need to instrument the RTS lock/unlock code to log its activity: (mode, descriptor, dev, ino) tuples being added and removed. And strace execution to be able to identify descriptor open and close events. Ideally the problem will be reproducible even with strace.
Good luck.
-- Viktor.

On Fri, Nov 15, 2024 at 06:45:40PM +0530, Harendra Kumar wrote:
Coming back to this issue after a break. I reviewed the code carefully and I cannot find anything where we are doing something in the application code that affects the RTS locking mechanism. Let me walk through the steps of the test up to failure and what we are doing in the code. The test output is like this:
It is indeed not immediately clear where in your code or in some dependency (including base, GHC, ...) a descriptor that contributes to the RTS file reader/writer count (indexed by (dev, ino)) might be closed without adjusting the count by calling the RTS `unlockFile()` function (via GHC.IO.FD.release).

It may be worth noting that GHC does not *reliably* prevent simultaneous open handles for the same underlying file, because handles returned by hDuplicate do not contribute to the count:

demo.hs:

    import GHC.IO.Handle (hDuplicate)
    import System.IO

    main :: IO ()
    main = do
        fh1 <- dupOpen "/tmp/foo"
        fh2 <- dupOpen "/tmp/foo"
        writeNow fh1 "abc\n"
        writeNow fh2 "def\n"
        readFile "/tmp/foo" >>= putStr
        hClose fh1
        hClose fh2
      where
        -- Look Mom, no lock!
        dupOpen path = do
            fh <- openFile path WriteMode
            hDuplicate fh <* hClose fh
        writeNow fh s = hPutStr fh s >> hFlush fh

    $ ghc -O demo.hs
    [1 of 2] Compiling Main             ( demo.hs, demo.o )
    [2 of 2] Linking demo
    $ ./demo
    def

I am not sure that Haskell really should be holding the application's hand in this area; corrupting output files by concurrent writers can just as easily happen by running two independent processes. But letting go of this guardrail would IIRC be a deviation from the Haskell report, and there are likely applications that depend on this (and don't use hDuplicate or equivalent to break the reader/writer tracking).

-- Viktor.

Inspired by Viktor's latest email in the thread:
Any pointers or more debug info to generate to debug this?
Given that this happens on Linux, it would be nice to have `strace` output on the program. If you know precisely which executable does this, `strace -o log.txt <the normal command to start that program>` suffices; additionally passing `-f` to strace instructs it to also trace child processes, at the cost of (usually) significantly blowing up the size of the trace.

Then we can see exactly which system call fails, and what happened right before.

- Tom

On 07/10/2024 04:55, Harendra Kumar wrote:
Hi everyone,
We have a long standing, elusive and intermittent test failure in streamly's file system event test suite. The code where the error emanates from looks like this:
createFileWithParent :: FilePath -> FilePath -> IO () createFileWithParent file parent = do let filepath = parent > file let dir = takeDirectory filepath putStrLn $ "createFileWithParent: file [" ++ file ++ "] dir [" ++ dir ++ "]" putStrLn $ "Ensuring dir: " ++ dir createDirectoryIfMissing True dir r <- doesDirectoryExist dir if r then do putStrLn $ "Ensured dir: " ++ dir when (not (null file)) $ do exists <- doesFileExist filepath if not exists then do putStrLn $ "Creating file: " ++ (parent > file) openFile (parent > file) WriteMode >>= hClose putStrLn $ "Created file: " ++ (parent > file) else error $ "File exists: " ++ filepath else error $ "Could not create dir: " ++ dir
The important thing in the code above is that we check that the file does not exist already, and we are creating it. Creating this new file intermittently fails with the following error messages, all of these are coming from prints in the above:
createFileWithParent: file [file1] dir [/tmp/fsevent_dir-e1098325dc2b0880/watch-root] Ensuring dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root Ensured dir: /tmp/fsevent_dir-e1098325dc2b0880/watch-root Creating file: /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1
uncaught exception: IOException of type ResourceBusy /tmp/fsevent_dir-e1098325dc2b0880/watch-root/file1: openFile: resource busy (file is locked)
How can a file that does not exist and being created be locked during creation? We see this only on Linux, the same tests always succeed on Windows and macOS. We are facing this in GHC 9.2.8, I am not sure if it is version specific because in the past I selectively disabled the tests for specific GHC versions and the error still surfaced in other GHC versions.
Anything obvious that we may be missing here? Any pointers or more debug info to generate to debug this? I have not started looking into the openFile code yet, hoping someone else knowing it well might help here.
Is this some other "resource busy" error which is being wrongly bucketed into "file is locked" error?
Thanks, Harendra
participants (4):
- Harendra Kumar
- Kim-Ee Yeoh
- Tom Smeding
- Viktor Dukhovni