writeFile/readFile on multiple threads leads to error

Hi,
I've got a multi-threaded application which occasionally generates failures in openFile. I haven't been able to reproduce the errors reliably, the code is way too large to send over, and any small attempts at invoking the same problem don't seem to work. Despite the uselessness of the bug report, I thought I'd share what I've seen and how I fixed it.
I have many threads, which read and write files. Every so often one thread will write a file, then another thread will read the same file - but fail during the open call. There are locks to ensure that the write call finishes before the read call begins. I modified the code to give:
do print ("READ START",x) ; res <- readFile x ; print ("READ STOP",x) ; return res
do print ("WRITE START",x); writeFile x src ; print ("WRITE STOP",x)
I then get on the console:
WRITE START foo
WRITE STOP foo
READ START foo
openFile doesn't have permission to open foo.
The writeFile/readFile are happening in different threads, and they usually succeed - but not always. The bug seems to go away when I add performGC just after writeFile. My guess is that something in the openFile/hClose pair isn't really closed until a garbage collection happens. All this is using GHC 6.10.2 on XP through Cygwin.
I'm happy to supply more details if you can think of anything that will help,
Thanks
Neil
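The workaround described in this message can be sketched roughly as follows. This is only an illustration, not the application's actual code: the names writeFileGC and writeThenRead are hypothetical, and the MVar stands in for whatever locks the real application uses, which the message does not show. The message reports that adding performGC made the failure go away; it does not establish why.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Mem (performGC)

-- | Hypothetical helper: write the file, then force a garbage collection,
-- mirroring the "performGC just after writeFile" workaround.
writeFileGC :: FilePath -> String -> IO ()
writeFileGC path src = do
  writeFile path src
  performGC

-- | Write on one thread and read on another. The 'written' MVar is an
-- assumed stand-in for the application's locks: the reader cannot start
-- until the writer has finished.
writeThenRead :: FilePath -> String -> IO String
writeThenRead path src = do
  written <- newEmptyMVar
  result  <- newEmptyMVar
  _ <- forkIO $ writeFileGC path src >> putMVar written ()
  _ <- forkIO $ do
    takeMVar written                 -- wait for the write to complete
    s <- readFile path
    length s `seq` putMVar result s  -- force the lazy read before handing over
  takeMVar result
```

For example, writeThenRead "foo" "some text" should return the text that was written, provided neither thread fails.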

Is it too difficult to try this on Linux or Mac, just to see if it shows up there as well? -- Jason Dusek

Hello Neil,
Wednesday, April 22, 2009, 4:22:01 PM, you wrote:
I've got a multi-threaded application which occasionally generates failures in openFile.
You can try to use POSIX open/read/write/close calls.
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

do print ("READ START",x) ; res <- readFile x ; print ("READ STOP",x) ; return res
Unless you've defined your own version of 'readFile', to mean read entire file now, the first 'print' is optimistic and the second 'print' is a lie. Claus

Hi Claus,
do print ("READ START",x) ; res <- readFile x ; print ("READ STOP",x) ; return res
Unless you've defined your own version of 'readFile', to mean read entire file now, the first 'print' is optimistic and the second 'print' is a lie.
readFile calls openFile >>= hGetContents. It's the openFile that causes the problem, so READ START happens before openFile and READ STOP happens after openFile. The lazy semantics of the actual reading don't seem to have an effect.
I did try changing to the strict bytestring file read, and that gave exactly the same error - apart from openBinaryFile was crashing rather than openFile.
Thanks
Neil
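To make the point about where the failure can occur more concrete, here is a minimal sketch that spells out readFile as openFile followed by hGetContents, with the trace statements placed around each step. The name readFileTraced and the extra "OPEN DONE" message are hypothetical additions for illustration; the decomposition itself is the one described in the message.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, openFile)

-- | Hypothetical traced variant of readFile, split into its two steps.
readFileTraced :: FilePath -> IO String
readFileTraced x = do
  print ("READ START", x)
  h <- openFile x ReadMode   -- a "permission denied" error would be raised here
  print ("OPEN DONE", x)
  res <- hGetContents h      -- returns at once; the file is read lazily as res is consumed
  print ("READ STOP", x)
  return res
```

With this layout, "READ STOP" only shows that the handle was opened and the lazy read was set up, not that the contents have been read, which is consistent with both Claus's remark and Neil's reply.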

do print ("READ START",x) ; res <- readFile x ; print ("READ STOP",x) ; return res
Unless you've defined your own version of 'readFile', to mean read entire file now, the first 'print' is optimistic and the second 'print' is a lie.
readFile calls openFile >>= hGetContents. It's the openFile that causes the problem, so READ START happens before openFile and READ STOP happens after openFile. The lazy semantics of the actual reading don't seem to have an effect.
Just want to make sure that it isn't the latent hClose from readFile. Once upon a time, I spent a lot of time figuring out why nhc98 wouldn't work on Windows, until I noticed that its code used to rely on unixy file system tricks like readFile followed by removeFile, followed by looking at the contents.
main = do
  writeFile "data" "file contents"
  r <- readFile "data"
  writeFile "data" "different contents"
  print r
*Main> main
*** Exception: data: openFile: permission denied (Permission denied)
I did try changing to the strict bytestring file read, and that gave exactly the same error - apart from openBinaryFile was crashing rather than openFile.
Weren't there versions of bytestring that didn't close the file early enough? Just checking, Claus
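For comparison, the example in this message fails because the lazily read handle is still open when the second writeFile runs. A minimal sketch of a variant that reads strictly, so that the handle is closed before the file is rewritten, might look like this. The names readFileStrict and rewriteAfterRead are hypothetical, and this illustrates the latent-hClose point only; it is not claimed to address the multi-threaded failure reported at the start of the thread.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- | Hypothetical strict variant of readFile: the whole file is forced
-- while the handle is open, and withFile closes the handle on exit.
readFileStrict :: FilePath -> IO String
readFileStrict path = withFile path ReadMode $ \h -> do
  s <- hGetContents h
  _ <- evaluate (length s)   -- force the entire contents now
  return s

-- | Same steps as the example above, but with the strict read, so the
-- second writeFile no longer races a half-read handle.
rewriteAfterRead :: FilePath -> IO String
rewriteAfterRead path = do
  writeFile path "file contents"
  r <- readFileStrict path
  writeFile path "different contents"
  return r
```

Calling rewriteAfterRead "data" should return the original contents, leaving the file holding the new contents.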

Bulat: I haven't tried moving to POSIX calls; I'll try that next, although I was hoping the application wouldn't be POSIX-dependent.
readFile calls openFile >>= hGetContents. It's the openFile that causes the problem, so READ START happens before openFile and READ STOP happens after openFile. The lazy semantics of the actual reading don't seem to have an effect.
Just want to make sure that it isn't the latent hClose from readFile.
Yeah, I've done that before! It seems to be the writeFile that isn't closing, not the readFile - hence my surprise.
I did try changing to the strict bytestring file read, and that gave exactly the same error - apart from openBinaryFile was crashing rather than openFile.
Weren't there versions of bytestring that didn't close the file early enough?
I only used bytestring for reading, so it shouldn't have made any difference. Thanks Neil

2009/4/22 Neil Mitchell
The writeFile/readFile are happening in different threads, and they usually succeed - but not always. The bug seems to go away when I add performGC just after writeFile. My guess is that something in the openFile/hClose pair isn't really closed until a garbage collection happens. All this is using GHC 6.10.2 on XP through Cygwin.
The hClose really does close the file descriptor. The only thing left is the finalizer, but it is just a no-op on an already-closed Handle.
I can't think of anything we're doing that could possibly cause this, but I have seen rogue "permission denied" errors on Windows from time to time; they're quite annoying. Here's a possibly-related ticket:
http://hackage.haskell.org/trac/ghc/ticket/2924
You might want to run the process under ProcMon and see if you can figure out what's going on (if you can bear to use ProcMon, it's a very poor replacement for strace IMO).
Cheers,
Simon

The hClose really does close the file descriptor. The only thing left is the finalizer, but it is just a no-op on an already-closed Handle.
I can't think of anything we're doing that could possibly cause this, but I have seen rogue "permission denied" errors on Windows from time to time, they're quite annoying. Here's a possibly-related ticket:
I've added information from this thread to that ticket. It looks suspiciously similar, and I have a feeling that some combination of writeFile/readFile in a similar pattern might invoke the same issue - it's certainly quite close to the behaviour of my application.
You might want to run the process under ProcMon and see if you can figure out what's going on (if you can bear to use ProcMon, it's a very poor replacement for strace IMO).
I tried, and the bug goes away, plus the computer grinds to a crunching halt. The bug is very sensitive to timing, and the slowdown caused by ProcMon is probably enough on its own to make it disappear. No useful information can be obtained there.
Thanks
Neil

2009/4/22 Felix Martini
2009/4/22 Simon Marlow:
You might want to run the process under ProcMon and see if you can figure out what's going on (if you can bear to use ProcMon, it's a very poor replacement for strace IMO).
You could try StraceNT instead (not as good as strace though).
I like it! Thanks for the tip. Simon
participants (6)
- Bulat Ziganshin
- Claus Reinke
- Felix Martini
- Jason Dusek
- Neil Mitchell
- Simon Marlow