
2009/4/22 Neil Mitchell
I've got a multi-threaded application which occasionally generates failures in openFile. I haven't been able to reproduce the errors reliably, the code is way too large to send over, and any small attempts at invoking the same problem don't seem to work. Despite the uselessness of the bug report, I thought I'd share what I've seen and how I fixed it.
I have many threads, which read and write files. Every so often one thread will write a file, then another thread will read the same file - but fail during the open call. There are locks to ensure that the write call finishes before the read call begins. I modified the code to give:
do print ("READ START",x) ; res <- readFile x ; print ("READ STOP",x) ; return res
do print ("WRITE START",x); writeFile x src ; print ("WRITE STOP",x)
I then get on the console:
WRITE START foo
WRITE STOP foo
READ START foo
openFile doesn't have permission to open foo.
The writeFile/readFile are happening in different threads, and they usually succeed - but not always. The bug seems to go away when I add performGC just after writeFile. My guess is that something in the openFile/hClose pair isn't really closed until a garbage collection happens. All this is using GHC 6.10.2 on XP through Cygwin.
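For anyone wanting to try the workaround, here is a minimal sketch of the pattern described above: one thread writes, a lock hands over to a second thread that reads, and performGC runs straight after writeFile. The helper name writeFileThenGC, the MVar-based handoff and the file contents are assumptions made for illustration, not the original application's code, and the sketch is not expected to reproduce the failure on its own.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Mem (performGC)

-- Hypothetical helper: write the file, then force a garbage collection
-- before reporting that the write has finished.
writeFileThenGC :: FilePath -> String -> IO ()
writeFileThenGC path src = do
    print ("WRITE START", path)
    writeFile path src
    performGC                 -- the workaround: collect right after the write
    print ("WRITE STOP", path)

main :: IO ()
main = do
    written <- newEmptyMVar   -- stands in for the lock: signals the write is done
    result  <- newEmptyMVar   -- carries the number of characters read back to main
    let path = "foo"
    -- Writer thread
    _ <- forkIO $ do
        writeFileThenGC path "some contents"
        putMVar written ()
    -- Reader thread: only starts reading once the writer has signalled
    _ <- forkIO $ do
        takeMVar written
        print ("READ START", path)
        res <- readFile path
        -- readFile is lazy, so force the whole contents before reporting
        let n = length res
        n `seq` print ("READ STOP", path)
        putMVar result n
    n <- takeMVar result
    print n
```

This only masks the symptom: it shows where the performGC call sits relative to writeFile and the lock, and says nothing about why the collection helps.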
The hClose really does close the file descriptor. The only thing left is the finalizer, but it is just a no-op on an already-closed Handle.
I can't think of anything we're doing that could possibly cause this, but I have seen rogue "permission denied" errors on Windows from time to time; they're quite annoying. Here's a possibly-related ticket:
http://hackage.haskell.org/trac/ghc/ticket/2924
You might want to run the process under ProcMon and see if you can figure out what's going on (if you can bear to use ProcMon; it's a very poor replacement for strace IMO).
Cheers,
Simon