
On Wed, Oct 09, 2024 at 12:15:32PM +0530, Harendra Kumar wrote:
> We do use low-level C APIs and GHC APIs to create a Handle in the event-watching module, but that is for the watch root, not for the file that is experiencing this problem. Here is how it works. We have a top-level directory which is watched for events using inotify. We first create this directory, then set up an inotify instance (inotify_init), which returns a C file descriptor. From that fd we create a Handle, and this Handle is used for watching inotify events. While we are reading events from the watched directory, we create a file inside it; the resource-busy issue occurs when creating that file. So the Handle for the file in question is not created in a non-standard manner; it is the parent directory's watch Handle that is. I do not know whether that somehow affects anything, or whether the fact that the directory is being watched using inotify makes any difference.
>
> The code for creating the watch Handle is here: https://github.com/composewell/streamly/blob/bbac52d9e09fa5ad760ab6ee5572c70... . Viktor, you may want to take a quick look at this to see if it can make any difference to the issue at hand.
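If I am reading that correctly, the construction amounts to something like the following (a simplified, untested sketch; the FFI import and the helper name are illustrative, and the actual code, which may work at a lower level, is at the link above):

    {-# LANGUAGE ForeignFunctionInterface #-}

    -- Simplified sketch of wrapping a raw inotify descriptor in a GHC
    -- Handle.  Illustrative only; see the streamly source linked above
    -- for the real construction.
    import Foreign.C.Types (CInt (..))
    import GHC.IO.Handle.FD (fdToHandle)
    import System.IO (Handle)

    foreign import ccall unsafe "inotify_init"
        c_inotify_init :: IO CInt

    mkWatchHandle :: IO Handle
    mkWatchHandle = do
        fd <- c_inotify_init -- raw C descriptor for the inotify instance
        fdToHandle fd        -- wrap it in a Handle for reading events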
I don't have the cycles to isolate the problem. I still suspect that your code is somehow directly closing file descriptors associated with a Handle. This orphans the associated logical reader/writer lock, which is then inherited by the next incarnation of the same (dev, ino) pair.

However, if the filesystem underlying "/tmp" were actually "tmpfs", inode reuse would be quite unlikely, because tmpfs inodes are assigned from a strictly incrementing counter:

    $ for i in {1..10}; do touch /tmp/foobar; ls -i /tmp/foobar; rm /tmp/foobar; done
    3830 /tmp/foobar
    3831 /tmp/foobar
    3832 /tmp/foobar
    3833 /tmp/foobar
    3834 /tmp/foobar
    3835 /tmp/foobar
    3836 /tmp/foobar
    3837 /tmp/foobar
    3838 /tmp/foobar
    3839 /tmp/foobar

But IIRC you mentioned that on GitHub "/tmp" is ext4, not "tmpfs" (perhaps RAM-backed storage is a scarcer resource there), in which case inode reuse is indeed quite likely:

    $ for i in {1..10}; do touch /var/tmp/foobar; ls -i /var/tmp/foobar; rm /var/tmp/foobar; done
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar
    25854141 /var/tmp/foobar
    25854142 /var/tmp/foobar

Since a normal open/close of a Handle acquires the lock after open and releases it before close, the evidence points to a bypass of the normal open-file lifecycle. Your codebase contains a fair amount of custom file-management logic, which could be the source of the problem.

To find the problem code path, you'd probably need to instrument the RTS lock/unlock code to log its activity: the (mode, descriptor, dev, ino) tuples being added and removed. You'd also want to strace the execution so that you can identify descriptor open and close events. Ideally the problem will be reproducible even under strace.

Good luck.

-- 
    Viktor.
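P.S. For concreteness, here is an untested sketch of the kind of lifecycle bypass I have in mind (the file name and the direct close() import are illustrative, not taken from your code):

    {-# LANGUAGE ForeignFunctionInterface #-}

    -- Untested sketch.  openFile registers a writer lock keyed on the
    -- file's (dev, ino) pair in the RTS, and hClose releases it.  A
    -- direct close(2) behind the Handle's back skips the release, so
    -- the lock entry is orphaned.
    import Foreign.C.Types (CInt (..))
    import GHC.IO.FD (FD (..))
    import GHC.IO.Handle.FD (handleToFd)
    import System.IO

    foreign import ccall unsafe "close"
        c_close :: CInt -> IO CInt

    main :: IO ()
    main = do
        h <- openFile "/tmp/victim" WriteMode -- RTS writer lock acquired
        fd <- handleToFd h                    -- the Handle's raw descriptor
        _ <- c_close (fdFD fd)                -- closed directly: lock never released
        -- If the (dev, ino) pair of "/tmp/victim" is later reused by a
        -- new file (likely on ext4, per the experiment above), a
        -- subsequent openFile of that file can fail with
        -- "resource busy (file is locked)".
        return ()

The specific code path in your tree is presumably different; the point is that any path on which close(2) runs without the matching unlock leaves such an orphaned lock behind.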