fcntl locks, executeFile and threaded runtime

Hi,

I'm having problems with executeFile, as it seems to clear the advisory locks when using the threaded runtime. Consider the following snippet (a simplification of what I'm doing):

    import System.IO
    import Control.Monad
    import System.Posix.IO
    import Control.Concurrent
    import System.Posix.Files
    import System.Posix.Process

    main = do
      let lock = (WriteLock, AbsoluteSeek, 0, 0)
      fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
      pid <- forkProcess $ do
        setLock fd lock >> putStrLn "child: ok"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      threadDelay $ 1 * 1000 * 1000
      setLock fd lock >> putStrLn "parent: fail!"
      void $ getProcessStatus True False pid

Then I consistently get these results:

    $ ghc -threaded --make test.hs; ./test
    child: ok
    parent: fail!

    $ ghc -rtsopts --make test.hs; ./test
    child: ok
    test: setLock: resource exhausted (Resource temporarily unavailable)

Any pointers? At first I thought it might be an issue with the unix package, but that doesn't seem to be the case.

    $ ghc-pkg list | grep unix
        unix-2.6.0.1

    $ ./test +RTS --info
     [("GHC RTS", "YES")
     ,("GHC version", "7.6.3")
     ,("RTS way", "rts_thr")
     ,("Build platform", "x86_64-unknown-linux")
     ,("Build architecture", "x86_64")
     ,("Build OS", "linux")
     ,("Build vendor", "unknown")
     ,("Host platform", "x86_64-unknown-linux")
     ,("Host architecture", "x86_64")
     ,("Host OS", "linux")
     ,("Host vendor", "unknown")
     ,("Target platform", "x86_64-unknown-linux")
     ,("Target architecture", "x86_64")
     ,("Target OS", "linux")
     ,("Target vendor", "unknown")
     ,("Word size", "64")
     ,("Compiler unregisterised", "NO")
     ,("Tables next to code", "YES")
     ]

Thanks!
~dsouza
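Editorial sketch: in the snippet above, a contended setLock surfaces as an IOError ("resource exhausted"), which aborts the parent. A small wrapper can turn that into a Bool so the two outcomes are easier to compare between runtimes. The names tryWriteLock and openForLocking are hypothetical helpers, not part of the unix package, and the file is opened through openFile plus handleToFd only to avoid depending on a particular openFd signature.

```haskell
import Control.Exception (IOException, tryJust)
import System.IO (IOMode (ReadWriteMode), SeekMode (AbsoluteSeek), openFile)
import System.IO.Error (isFullError, isPermissionError)
import System.Posix.IO (LockRequest (WriteLock), handleToFd, setLock)
import System.Posix.Types (Fd)

-- | Hypothetical helper: open (or create) a file and hand back a raw
-- descriptor. The Handle is given up by 'handleToFd'; the fd stays open.
openForLocking :: FilePath -> IO Fd
openForLocking path = openFile path ReadWriteMode >>= handleToFd

-- | Hypothetical helper: try to take a whole-file write lock without
-- blocking. Returns False only for the "someone else holds it" errors
-- (EAGAIN shows up as "resource exhausted", EACCES as a permission
-- error); any other IOError is rethrown.
tryWriteLock :: Fd -> IO Bool
tryWriteLock fd = do
  r <- tryJust contended (setLock fd (WriteLock, AbsoluteSeek, 0, 0))
  return (either (const False) (const True) r)
  where
    contended :: IOException -> Maybe ()
    contended e
      | isFullError e || isPermissionError e = Just ()
      | otherwise = Nothing

main :: IO ()
main = do
  fd <- openForLocking "/tmp/foobar"
  ok <- tryWriteLock fd
  putStrLn (if ok then "lock acquired" else "lock held by another process")
```

In the original test, the parent would call tryWriteLock after the delay and report "fail" when it returns True.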

diego souza
I'm having problems with executeFile as it seems to clear the advisory locks using the threaded runtime.
I'm stumped, and unfortunately can't duplicate it here (no surprise as I have a different platform and GHC version.)

But in case it helps ... your fcntl(2) file lock will be lost if your process closes any fd open on that file. So if the threaded runtime for some reason were to dup random fds and then close them, around a fork, that would do it. You might be able to pick that up in an strace (or whatever your platform utility for system call tracing.)

But I don't see how executeFile could make any difference, in that scenario.

Donn
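Editorial sketch of the rule Donn describes: a process loses its fcntl lock on a file as soon as it closes any descriptor that refers to that file. The sketch takes a lock, asks a forked child whether it can see a conflicting lock (via getLock), then dups and closes an extra descriptor and asks again. The helper names (lockedByParent, demoLockLostOnClose) and the path are hypothetical; on a POSIX system the pair should come out as (True, False).

```haskell
import System.Exit (ExitCode (..))
import System.IO (IOMode (ReadWriteMode), SeekMode (AbsoluteSeek), openFile)
import System.Posix.IO
  (FileLock, LockRequest (WriteLock), closeFd, dup, getLock, handleToFd, setLock)
import System.Posix.Process
  (ProcessStatus (Exited), exitImmediately, forkProcess, getProcessStatus)
import System.Posix.Types (Fd)

-- | A write lock covering the whole file, as in the original test.
wholeFile :: FileLock
wholeFile = (WriteLock, AbsoluteSeek, 0, 0)

-- | Fork a child that asks the kernel whether another process (here,
-- the parent) holds a conflicting lock. The answer travels back in the
-- child's exit status.
lockedByParent :: Fd -> IO Bool
lockedByParent fd = do
  pid <- forkProcess $ do
    holder <- getLock fd wholeFile
    exitImmediately (maybe (ExitFailure 1) (const ExitSuccess) holder)
  status <- getProcessStatus True False pid
  return $ case status of
    Just (Exited ExitSuccess) -> True
    _                         -> False

-- | Lock the file, then dup and close an unrelated descriptor for it.
-- Returns (lock visible before the close, lock visible after it).
demoLockLostOnClose :: FilePath -> IO (Bool, Bool)
demoLockLostOnClose path = do
  fd <- openFile path ReadWriteMode >>= handleToFd
  setLock fd wholeFile
  before <- lockedByParent fd
  extra <- dup fd
  closeFd extra          -- closing any fd for the file releases the lock
  after <- lockedByParent fd
  return (before, after)

main :: IO ()
main = demoLockLostOnClose "/tmp/foobar" >>= print
```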
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Oct 25, 2013 at 11:52 AM, Donn Cave
But I don't see how executeFile could make any difference, in that scenario.
Look for fcntl(fd, FD_CLOEXEC, 1) calls?

--
brandon s allbery kf8nh                          sine nomine associates
allbery.b@gmail.com                             ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad   http://sinenomine.net

On Fri, Oct 25, 2013 at 1:20 PM, Donn Cave
On Fri, Oct 25, 2013 at 11:52 AM, Donn Cave
wrote: But I don't see how executeFile could make any difference, in that scenario.
Look for fcntl(fd, FD_CLOEXEC, 1) calls?
Oh, that would be heinous!
It would be, because I got that completely wrong. It should be fcntl(fd, F_SETFD, FD_CLOEXEC). sigh.
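Editorial sketch: the corrected call, fcntl(fd, F_SETFD, FD_CLOEXEC), has a counterpart in the unix package, so the close-on-exec state of the lock descriptor can also be checked from inside the Haskell program rather than only in strace. If the flag were set, execve would close the descriptor and the lock would go with it. The helper names and the path are hypothetical; queryFdOption and setFdOption are the System.Posix.IO functions.

```haskell
import System.IO (IOMode (ReadWriteMode), openFile)
import System.Posix.IO
  (FdOption (CloseOnExec), handleToFd, queryFdOption, setFdOption)
import System.Posix.Types (Fd)

-- | Hypothetical helper: open (or create) a file and return a raw fd.
openFdFor :: FilePath -> IO Fd
openFdFor path = openFile path ReadWriteMode >>= handleToFd

-- | True when the descriptor would be closed by execve(2).
closesOnExec :: Fd -> IO Bool
closesOnExec fd = queryFdOption fd CloseOnExec

-- | Clear FD_CLOEXEC so the descriptor stays open in the exec'd program.
keepAcrossExec :: Fd -> IO ()
keepAcrossExec fd = setFdOption fd CloseOnExec False

main :: IO ()
main = do
  fd <- openFdFor "/tmp/foobar"
  before <- closesOnExec fd
  putStrLn ("close-on-exec initially set: " ++ show before)
  keepAcrossExec fd
  after <- closesOnExec fd
  putStrLn ("close-on-exec after clearing: " ++ show after)
```

Calling closesOnExec on the lock fd just before executeFile would confirm or rule out this explanation for a given build.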

Yeah, it was my first thought too, but I didn't see anything like this in the strace output.

What I do see, though, are two additional forks when using -threaded that seem to die early. This could very well explain why I'm losing the lock. But then, why does this only happen when using executeFile?

Thanks!
~dsouza

ghc -rtsopts --make test.hs; strace -f -e trace=fork,fcntl,dup,dup2,close -e signal=\!SIGVTALRM ./test >/dev/null
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
Process 11591 attached
[pid 11591] fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11591] close(4) = 0
[pid 11591] close(4) = 0
[pid 11591] close(4) = 0
[pid 11590] fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
test: setLock: resource exhausted (Resource temporarily unavailable)
[pid 11590] +++ exited with 1 +++
close(1) = 0
close(2) = 0
+++ exited with 0 +++

ghc -threaded -rtsopts --make test.hs; strace -f -e trace=fork,fcntl,dup,dup2,close -e signal=\!SIGVTALRM ./tes[23/96461] ll
Linking test ...
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
close(3) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
fcntl(5, F_GETFL) = 0x1 (flags O_WRONLY)
fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
Process 11610 attached
[pid 11609] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
[pid 11609] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 11609] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
Process 11611 attached
Process 11612 attached
[pid 11612] close(3) = 0
[pid 11612] close(4) = 0
[pid 11612] close(5) = 0
[pid 11612] close(6) = 0
[pid 11612] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(5, F_GETFL) = 0x1 (flags O_WRONLY)
[pid 11612] fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
[pid 11612] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
Process 11613 attached
[pid 11612] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
[pid 11612] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 11612] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11613] +++ exited with 0 +++
[pid 11612] close(3) = 0
[pid 11612] close(3) = 0
[pid 11612] close(3) = 0
[pid 11609] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11612] close(1) = 0
[pid 11612] close(2) = 0
[pid 11612] +++ exited with 0 +++
[pid 11609] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11612, si_status=0, si_utime=0, si_stime=0} ---
[pid 11610] close(3) = 0
[pid 11610] close(4) = 0
[pid 11610] close(5) = 0
[pid 11610] close(6) = 0
[pid 11610] +++ exited with 0 +++
[pid 11611] +++ exited with 0 +++
+++ exited with 0 +++

On Fri, Oct 25, 2013 at 7:49 PM, diego souza
Yeah, it was my first thought too, but I didn't see anything like this in the strace output.
What I do see, though, are two additional forks when using -threaded that seem to die early. This could very well explain why I'm losing the lock.
If this is Linux then you also want to track clone() calls. It's possible, depending on Linux kernel and/or glibc version, that you are seeing threads.

Good catch! Tomorrow (I'm too sleepy to do this right now) I'm going to try it out with different ghc versions as well. I'll let you know about my findings. Thanks! ~dsouza

Howdy,

I've tested the previous program with all versions down to 6.12.3 and I got the same results. Then I tried something different:

    main = do
      let lock = (WriteLock, AbsoluteSeek, 0, 0)
      fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
      setLock fd lock >> putStrLn "parent: locked!"
      pid <- forkProcess $ do
        setLock fd lock >> putStrLn "child: locked!"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      void $ getProcessStatus True False pid

This always works as it is supposed to: the child process always fails to acquire the lock.

The following one is quite interesting, though. The moment I insert the threadDelay call (like in the previous example), it fails sometimes (it seems to have something to do with CPU idleness). I guess this explains why the previous version didn't work properly for me:

    main = do
      let lock = (WriteLock, AbsoluteSeek, 0, 0)
      fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
      pid0 <- forkProcess $ do
        setLock fd lock >> putStrLn "child0: locked!"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      pid1 <- forkProcess $ do
        setLock fd lock >> putStrLn "child1: locked!"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      threadDelay $ 1 * 1000 * 1000 -- take out this line and everything works
      mapM_ (getProcessStatus True False) [pid1, pid2]

    $ ghc -threaded -fforce-recomp --make -O2 ~/test; for _ in `seq 1 10`; do ~/test; echo; done;
    [1 of 1] Compiling Main ( /home/dsouza/test.hs, /home/dsouza/test.o )
    Linking /home/dsouza/test ...
    parent: locked!
    test: setLock: resource exhausted (Resource temporarily unavailable)
    parent: locked!
    child: locked!
    child: locked!
    test: setLock: resource exhausted (Resource temporarily unavailable)
    child: locked!
    test: setLock: resource exhausted (Resource temporarily unavailable)
    parent: locked!
    test: setLock: resource exhausted (Resource temporarily unavailable)
    parent: locked!
    child: locked!
    child: locked!
    test: setLock: resource exhausted (Resource temporarily unavailable)
    parent: locked!
    child: locked!
    parent: locked!
    child: locked!
    child: locked!
    parent: locked!

Am I doing something wrong?

Thanks!
~dsouza

Sorry, I've sent the wrong snippet. This is the correct one:

    main = do
      let lock = (WriteLock, AbsoluteSeek, 0, 0)
      fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
      pid1 <- forkProcess $ do
        setLock fd lock >> putStrLn "child: locked!"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      pid2 <- forkProcess $ do
        setLock fd lock >> putStrLn "parent: locked!"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      threadDelay $ 1 * 1000 * 1000
      mapM_ (getProcessStatus True False) [pid1, pid2]

~dsouza

Quoth diego souza, ...
The following one is quite interesting, though. The moment I insert the threadDelay function (like in the previous example), it fails sometimes (it seems to have something to do with cpu idleness). I guess this explains why the previous version didn't work properly for me:
For me, this example is kind of ambiguous. I thought it seemed clear enough from your earlier results that executeFile played some role in the problem, but in this example the two locking forks are parallel, and it's entirely possible that both lock syscalls will complete before either executeFile has finished or even begun, so ... unless I'm missing something (again!) I guess I would say this calls for a lot more tests to verify that you have this problem only with executeFile, and not with, say, a Haskell fork that does the same thing (sleep and exit.)

By the way, I haven't been able to duplicate the problem with 6.12.3 on MacOS.

Donn

For me, this example is kind of ambiguous. I thought it seemed clear enough from your earlier results that executeFile played some role in the problem, but in this example the two locking forks are parallel, and it's entirely possible that both lock syscalls will complete before either executeFile has finished or even begun, so ... unless I'm missing something (again!) I guess I would say this calls for a lot more tests to verify that you have this problem only with executeFile, and not with, say, a Haskell fork that does the same thing (sleep and exit.)
By the way, I haven't been able to duplicate the problem with 6.12.3 on MacOS.
I ruled out executeFile because creating the lock prior to forking makes the problem vanish. So I thought it must be something else, which led me to the second example. I don't think it is related to threadDelay; actually, I'm thinking it must have something to do with the threaded runtime.

I guess what you are missing is that fcntl locks are atomic and per process, so that:

  * forkIO or forkOS are no replacement for this (they don't create new processes, just threads);
  * setLock before executeFile is fine (fcntl locks are atomic), as long as the process does not terminate; that is, sleep must still be running for the lock to persist.

I found no evidence that sleep is terminating early (but I'll double check), and I'm certain that the two locks took place. But I can make these tests better.

I don't know why you can't reproduce this on MacOS. I have tried it on another Linux machine (same architecture, different kernel and libc) and got pretty much the same results.

At any rate, if you have a better way to reproduce or rule out the problem, let me know. But I'll keep digging on this.

Posix locks are hard to work with. Flock is much better, which is what I'm using now.

Thanks!
~dsouza
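Editorial sketch of the difference being alluded to: fcntl locks belong to the process, whereas flock(2) locks belong to the open file, so two independent opens of the same file conflict even inside a single process. The unix package of that era has no flock binding, so this sketch calls the C function through the FFI; the binding c_flock, the helper names and the numeric constants (LOCK_EX = 2, LOCK_NB = 4, the Linux and BSD values) are assumptions of the sketch, not something taken from the thread.

```haskell
import Data.Bits ((.|.))
import Foreign.C.Types (CInt (..))
import System.IO (IOMode (ReadWriteMode), openFile)
import System.Posix.IO (handleToFd)
import System.Posix.Types (Fd (..))

-- Assumed binding to flock(2); returns 0 on success, -1 on failure.
foreign import ccall unsafe "flock"
  c_flock :: CInt -> CInt -> IO CInt

-- Assumed values of LOCK_EX and LOCK_NB from <sys/file.h>.
lockEx, lockNb :: CInt
lockEx = 2
lockNb = 4

-- | Try to take an exclusive flock without blocking.
tryFlock :: Fd -> IO Bool
tryFlock (Fd fd) = do
  rc <- c_flock fd (lockEx .|. lockNb)
  return (rc == 0)

-- | Open the same file twice and try to flock both descriptors.
-- With per-open-file semantics the second attempt is refused.
demoFlock :: FilePath -> IO (Bool, Bool)
demoFlock path = do
  fd1 <- openFile path ReadWriteMode >>= handleToFd
  fd2 <- openFile path ReadWriteMode >>= handleToFd
  first  <- tryFlock fd1
  second <- tryFlock fd2
  return (first, second)

main :: IO ()
main = demoFlock "/tmp/foobar" >>= print
```

On a local Linux filesystem this should give (True, False). The same experiment with setLock would succeed twice, because both descriptors belong to one process.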

quoth Diego Souza, ...
I ruled out executeFile as creating the lock prior forking makes the problem vanish.
Well, yes, but you'd expect this if there's a problem with executeFile, wouldn't you? Because here both locks are attempted prior to executeFile, so it's kind of out of the picture. It might be interesting to use a `sleeplock' as below that accepts a FD parameter and attempts to lock it, and exec that from your Haskell main program. Then you can verify (I think) that if you use that in your initial configuration (where the parent locks second), it will always work when the exec'd program does the lock, but maybe fail when it's done prior to the exec (same program with no FD parameter.)
Posix locks are hard to work with. Flock is much better, which is what I'm using now
For sure, I'll go along with that. The operation of posix file locks should be down to the kernel/filesystem/etc., though, so it seems to me, if it's GHC's fault, the runtime must be doing something different to this fd at the syscall level. That's a fairly narrow set of possibilities. If we rule that out, and it really is something about thread scheduling etc., then it would have to be a Linux bug, wouldn't it?

Donn
------------
    import System.IO
    import System.Posix.IO
    import System.Posix.Files
    import System.Environment (getArgs)
    import System.Posix.Unistd (sleep)

    possiblyLock [] = return ()
    possiblyLock (a:_) = do
      setLock (read a) (WriteLock, AbsoluteSeek, 0, 0)
      putStrLn "exec lock OK"

    main = do
      args <- getArgs
      possiblyLock (tail args)
      sleep (read (head args))
      putStrLn "child waking up!"

Well, yes, but you'd expect this if there's a problem with executeFile, wouldn't you? Because here both locks are attempted prior to executeFile, so it's kind of out of the picture.
Right, it does make a lot of sense. :-)
It might be interesting to use a `sleeplock' as below that accepts a FD parameter and attempts to lock it, and exec that from your Haskell main program. Then you can verify (I think) that if you use that in your initial configuration (where the parent locks second), it will always work when the exec'd program does the lock, but maybe fail when it's done prior to the exec (same program with no FD parameter.)
Locking from the child *always* works, so I hope you guys will trust me on that; I'm not including any code.

I've written two versions of the same, simplified program: one in C and another in Haskell. They just 'forkProcess', 'setLock' and 'execFile'. I have tested the Haskell version on three different Linux systems and one Mac OS X box:

  * ubuntu (kernel 3.11/ghc 7.6.3);
  * archlinux (kernel 3.11/ghc 7.6.3);
  * debian (kernel 3.9/ghc 7.4.1);
  * macos mavericks/ghc 7.6.3;

I can't reproduce the bug on macos. It never fails.

The C version (attached file c_fcntl.c), run with test_fcntl.sh (also attached), never fails, as expected:

    $ gcc -o c_fcntl c_fcntl.c
    $ sh test_fcntl.sh ./c_fcntl 2>/dev/null
    ./c_fcntl: ok: 1; fail: 0
    ./c_fcntl: ok: 2; fail: 0
    ./c_fcntl: ok: 3; fail: 0
    ./c_fcntl: ok: 4; fail: 0
    ./c_fcntl: ok: 5; fail: 0
    ./c_fcntl: ok: 6; fail: 0
    ./c_fcntl: ok: 7; fail: 0
    ./c_fcntl: ok: 8; fail: 0
    ./c_fcntl: ok: 9; fail: 0
    ./c_fcntl: ok: 10; fail: 0
    ./c_fcntl: ok: 11; fail: 0

Now the Haskell version (attached file hs_fcntl.hs). I'm including only one output, but it is pretty much the same on all the Linux machines:

    $ ghc -threaded hs_fcntl.hs
    $ sh test_fcntl.sh ./hs_fcntl 2>/dev/null
    ./hs_fcntl: ok: 0; fail: 1
    ./hs_fcntl: ok: 0; fail: 2
    ./hs_fcntl: ok: 0; fail: 3
    ./hs_fcntl: ok: 0; fail: 4
    ./hs_fcntl: ok: 0; fail: 5
    ./hs_fcntl: ok: 0; fail: 6
    ./hs_fcntl: ok: 0; fail: 7
    ./hs_fcntl: ok: 0; fail: 8
    ./hs_fcntl: ok: 0; fail: 9
    ./hs_fcntl: ok: 0; fail: 10
    ./hs_fcntl: ok: 0; fail: 11

This outcome seems pretty easy to reproduce on Linux systems, I guess. If you guys could try it on some Linux machine I would appreciate it [or let me know if you don't think this is a valid test].

Regarding the previous program, especially the one that did two 'forkProcess' calls, I could see something in the strace output, but I can't relate it to anything documented or that I'm aware of.

On my machine, it seems that if the 'execve' happens before the second 'fcntl', the lock fails (as must be happening in hs_fcntl.hs). I have tried a number of times and it really seems to be the case. It only works when the two 'fcntl' calls happen before any 'execve'. But I can't say for sure this is the case on other systems.

The only thing that happens during an 'execve' that comes to mind is that it kills all threads but the current one. And I do see this happening in the trace output. But I guess this should make no difference (although it seems it does).

Well, right now I'm pretty much without pointers. :-)

Thanks!
~dsouza

And for the record, the non-threaded runtime never fails (at least I have never once seen a failure):

    $ ghc hs_fcntl.hs
    $ sh test_fcntl.sh ./hs_fcntl 2>/dev/null
    ./hs_fcntl: ok: 1; fail: 0
    ./hs_fcntl: ok: 2; fail: 0
    ./hs_fcntl: ok: 3; fail: 0
    ./hs_fcntl: ok: 4; fail: 0
    ./hs_fcntl: ok: 5; fail: 0
    ./hs_fcntl: ok: 6; fail: 0
    ./hs_fcntl: ok: 7; fail: 0
    ./hs_fcntl: ok: 8; fail: 0
    ./hs_fcntl: ok: 9; fail: 0
    ./hs_fcntl: ok: 10; fail: 0
    ./hs_fcntl: ok: 11; fail: 0

Thanks!
~dsouza

Quoth diego souza, ...
The only thing that happens during an 'execve' that comes to mind is that it kills all threads but the current one. And I do see this happening on the trace output.
But I guess this should make no difference (unless it seems it does).
It could, if 1) with Linux "clone" threads, the file lock is a property of only the thread that acquired the lock, and 2) the thread that survives execve is not that one.

Donn

While I'm grasping at straws, might as well mention that the threaded runtime uses signals, lots of signals, and that can break things that are interruptible and haven't been adequately signal-proofed. For example, earlier in this exchange I included a short program that uses System.Posix.Unistd.sleep, and on MacOS anyway, that breaks with -threaded -- the sleep doesn't sleep for any appreciable time before it gets interrupted by the flood of runtime ALRM signals.

I can't account for any obvious reason why signal interrupts could cause the present problem, but it's easy enough to test if you're curious - just compile with -rtsopts, and pass the -V0 flag to the runtime. (E.g., ./a.out +RTS -V0 -RTS, GHCRTS=-V0 ./a.out, ...)

Donn

Question about the syscall trace -- in the second, threaded version,
[pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
...
[pid 11612] +++ exited with 0 +++

?? In the non-threaded version, I don't see the child process exit - and wasn't expecting to, since it's supposed to have exec'd to /usr/bin/sleep.

Donn

On Sat, Oct 26, 2013 at 11:38 AM, Donn Cave
Question about the syscall trace -- in the second, threaded version,
[pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 ... [pid 11612] +++ exited with 0 +++
?? In the not threaded version, I don't see the child process exit - and wasn't expecting to, since it's supposed to have exec'd to /usr/bin/sleep.
You don't see it in the non-threaded one because the parent throws an exception while setting the lock and exits first. The sleep is only 5 seconds, so the second one reaches waitForProcess and collects it (note that the "parent: fail!" is a putStrLn, not an error) before exiting.

Hi Donn,

Thanks for your response. Here is more info; first, what I'm trying to accomplish.

The program I'm writing is a kind of supervisor (like daemontools), something to daemonize a process and then report its state to zookeeper, which is why I need the threaded runtime. That said, I'm aware of the limitations. But for this particular case I believe it is going to work.

I was using the posix locks only to test whether or not the program I'm supervising is still alive, as they have the nice feature of returning the pid that actually holds the lock.

Before writing to haskell-cafe@ I had been "stracing" this program trying to make some sense of it, with no luck. I could attach the results, but there is nothing there that could explain this behavior. Well, at least I could not figure it out.

The reason that makes me believe that executeFile is the culprit is that without that line it works. If you take that out and use 'threadDelay', for instance, the problem is gone.

Ah, and in the meantime I'm using flock. This has no problems, which makes sense as the lock is per file handle, not per process like the posix ones.

~dsouza
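Editorial sketch of the liveness check described in this message: getLock reports the pid of a process holding a conflicting lock, so a supervisor can ask who holds the lock without taking it. The helper names and the path are hypothetical. One caveat shown by the demo is that a process never sees its own locks as conflicting, so the query returns Nothing when asked from the lock holder itself.

```haskell
import System.IO (IOMode (ReadWriteMode), SeekMode (AbsoluteSeek), openFile)
import System.Posix.IO (LockRequest (WriteLock), getLock, handleToFd, setLock)
import System.Posix.Types (Fd, ProcessID)

-- | Pid of some other process holding a lock that would block a
-- whole-file write lock, or Nothing when no such process exists.
lockHolder :: Fd -> IO (Maybe ProcessID)
lockHolder fd = fmap (fmap fst) (getLock fd (WriteLock, AbsoluteSeek, 0, 0))

-- | Take the lock ourselves and then query it: the kernel does not
-- report a process's own locks, so this yields Nothing.
demoLockHolder :: FilePath -> IO (Maybe ProcessID)
demoLockHolder path = do
  fd <- openFile path ReadWriteMode >>= handleToFd
  setLock fd (WriteLock, AbsoluteSeek, 0, 0)
  lockHolder fd

main :: IO ()
main = do
  holder <- demoLockHolder "/tmp/foobar"
  putStrLn (maybe "no other process holds the lock" (("held by pid " ++) . show) holder)
```

A supervisor would call lockHolder on its own descriptor while the supervised program holds the lock, and treat Nothing as a sign that the program is gone.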
participants (4)
- Brandon Allbery
- diego souza
- Diego Souza
- Donn Cave