Re: [Haskell-cafe] Lazy IO and closing of file handles

Ketil Malde wrote:
Bertram Felgenhauer wrote:
    type Subject = String
    data Email = Email {from :: From, subject :: Subject} deriving Show

    data Email = Email {from :: !From, subject :: !Subject} deriving Show

...except that From and Subject are Strings, and thus the strictness annotation only forces WHNF. I.e., you also need to modify parseEmail to force these.
-k
You're right. Actually, the program will be strict enough if the From header always precedes the Subject header (which was the case in my tests), but that's not immediately obvious. Modifying getHeader to force its result is the clean solution, say:

    getHeader = forceString . fromMaybe "N/A" . flip lookup headers
    forceString s = length s `seq` s

Having to rely on GC to close the fds quickly enough is another problem; can this be solved on the library side, maybe by performing GCs when running out of FDs?

Bertram
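Ketil's point about WHNF can be checked directly. The following is a minimal sketch, not code from the thread: `partial` and `throws` are hypothetical helpers introduced only for illustration, and the Email type is reduced to the two fields quoted above. It assumes a GHC with Control.Exception from base.

```haskell
import Control.Exception (SomeException, catch, evaluate)

type From = String
type Subject = String

-- Strict fields: building an Email evaluates each field to WHNF only,
-- i.e. up to the first cons cell of the String.
data Email = Email { from :: !From, subject :: !Subject } deriving Show

-- Walk the whole list spine before returning the string, so that all of
-- the lazily produced input behind it has been demanded.
forceString :: String -> String
forceString s = length s `seq` s

-- Hypothetical test value: the first character is available, but
-- demanding the rest of the string fails.
partial :: String
partial = 'x' : error "rest of the input not read yet"

-- Hypothetical helper: True if evaluating the argument to WHNF throws.
throws :: a -> IO Bool
throws x = (evaluate x >> return False) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = return True
```

In GHCi, `throws (Email partial "s")` should yield False, because the strict field stops at the first cons cell, while `throws (Email (forceString partial) "s")` should yield True, because forceString walks the entire spine. Note that `length` forces the spine only, not the individual characters.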

Bertram Felgenhauer wrote:
Having to rely on GC to close the fds quickly enough is another problem; can this be solved on the library side, maybe by performing GCs when running out of FDs?
Claus Reinke wrote:
in good old Hugs, for instance, we find in function newHandle in src/iomonad.c

    [...snip...]
    /* Search for unused handle*/
    /* If at first we don't */
    /* succeed, garbage collect*/
    /* and try again ... */
    /* ... before we give up */

so, instead of documenting limitations and workarounds, this issue should be fixed in GHC as well.
This may help in some cases but it cannot be relied upon. Finalizers are always run in a separate thread (must be, see http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if you force a GC when handles are exhausted, as hugs seems to do, there is no guarantee that by the time the GC is done the finalizers have freed any handles (assuming that the GC run really detects any handles to be garbage).

Cheers
Ben
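For concreteness, the "collect and try again" idea can be sketched at the library level as follows. This is an illustration under assumptions, not GHC's or Hugs's implementation: `openFileRetry` is a hypothetical name, and the sketch assumes that descriptor exhaustion surfaces as an IOException for which `isFullError` holds. As argued above, nothing here guarantees that finalizers have actually run before the second attempt.

```haskell
import Control.Concurrent (yield)
import Control.Exception (IOException, catch, throwIO)
import System.IO
import System.IO.Error (isFullError)
import System.Mem (performGC)

-- Hypothetical wrapper: try to open the file; if the failure looks like
-- resource exhaustion, request a major GC and try exactly once more.
openFileRetry :: FilePath -> IOMode -> IO Handle
openFileRetry path mode = openFile path mode `catch` retry
  where
    retry :: IOException -> IO Handle
    retry e
      | isFullError e = do
          performGC            -- may discover unreachable handles
          yield                -- lets other threads run; finalizers may or may not have finished
          openFile path mode   -- second attempt; can fail again
      | otherwise = throwIO e  -- unrelated errors are passed on unchanged
```

A single retry keeps the sketch simple; repeating the GC and retry until nothing changes, as suggested elsewhere in this thread, would need a bound to avoid looping forever.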

[trigger garbage collection when open runs out of free file descriptors, then try again]
so, instead of documenting limitations and workarounds, this issue should be fixed in GHC as well.
This may help in some cases but it cannot be relied upon. Finalizers are always run in a separate thread (must be, see http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if you force a GC when handles are exhausted, as hugs seems to do, there is no guarantee that by the time the GC is done the finalizers have freed any handles (assuming that the GC run really detects any handles to be garbage).
useful reference to collect!-) but even that mentions giving back os resources such as file descriptors as one of the simpler cases. running the GC/finalizers sequence repeatedly until nothing more changes might be worth thinking about, as are possible race conditions.

here is the thread the paper is referring to as one of its origins:

    http://gcc.gnu.org/ml/java/2001-12/msg00113.html
    http://gcc.gnu.org/ml/java/2001-12/msg00390.html

i also like the idea mentioned as one of the alternatives in 3.1, where the finalizer does not notify the object that is to become garbage, but a different manager object. in this case, one might notify the i/o handler, and that could take care of avoiding trouble.

in my opinion, if my code or my finalizers hold on to resources i'd like to see freed, then i'm responsible, even if i might need language help to remedy the situation. but if i take care to avoid such references, and the system still runs out of resources just because it can't be bothered to check right now whether it has some left to free, there is nothing i can do about it (apart from complaining, that is!-).

of course, this isn't new. see, for instance, this thread view:

    http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=st&q=hugs+openFile+file+descriptor+garbage+collection&rnum=1#74d32070dbcc92fc

where Remi Turk points out System.Mem.performGC, and Simon Marlow agrees that GHC should do more to free file descriptors, but also mentions that performGC doesn't run finalizers.

actually, if i have readFile-based code that immediately processes the file contents before the next readFile, as in Matthew's test code, my ghci (on windows) doesn't seem to run out of file descriptors easily, but if i force a descriptor leak by leaving unreferenced contents unprocessed, then performGC does seem to help (not that this is ideal in general, as discussed in the thread above):

    import System.Environment
    import System.Mem
    import System.IO

    main = do
      n:f:_ <- getArgs
      (sequence (repeat (openFile f ReadMode)) >> return ()) `catch` (\_->return ())
      test1 (take (read n) $ repeat f)

    test1 files = mapM_ doStuff files
      where doStuff f = {- performGC >> -} readFile f >>= print.map length.take 10.lines

interestingly, if i do that, even Hugs seems to need the performGC?

claus

ps. one could even try to go further, and have virtual file descriptors, like virtual memory. but that is something for the os, i guess.
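An alternative to calling performGC in doStuff is to avoid depending on the collector at all. The sketch below swaps in a different technique from the one in the test program: it opens the file with `withFile` and forces the result before the handle is closed. `lineLengths` is a hypothetical name, and the sketch assumes a current GHC base library.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Lengths of the first ten lines of a file. The handle is closed when
-- withFile returns, so no descriptor is left for the GC to reclaim.
lineLengths :: FilePath -> IO [Int]
lineLengths f = withFile f ReadMode $ \h -> do
  contents <- hGetContents h
  let ls = map length (take 10 (lines contents))
  -- Summing forces the list spine and every length, so all the input
  -- that is needed has been read before the handle goes away.
  _ <- evaluate (sum ls)
  return ls
```

With this variant, something like `mapM_ (\f -> lineLengths f >>= print) files` should hold at most one descriptor open at a time, at the cost of giving up the incremental processing that lazy readFile allows.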

Benjamin Franksen wrote:
Bertram Felgenhauer wrote:
Having to rely on GC to close the fds quickly enough is another problem; can this be solved on the library side, maybe by performing GCs when running out of FDs?
Claus Reinke wrote:
in good old Hugs, for instance, we find in function newHandle in src/iomonad.c

    [...snip...]
    /* Search for unused handle*/
    /* If at first we don't */
    /* succeed, garbage collect*/
    /* and try again ... */
    /* ... before we give up */

so, instead of documenting limitations and workarounds, this issue should be fixed in GHC as well.
This may help in some cases but it cannot be relied upon. Finalizers are always run in a separate thread (must be, see http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if you force a GC when handles are exhausted, as hugs seems to do, there is no guarantee that by the time the GC is done the finalizers have freed any handles (assuming that the GC run really detects any handles to be garbage).
Sorry for replying to myself, but I just realized that the argument brought forth by Boehm applies only to general purpose finalizing facilities, and not necessarily to each and every special case.

I think one could make up an argument that file handles in Haskell are indeed a special kind of object and that the language runtime /can/ run finalizers for file handles in a more 'synchronous' way (i.e. GC could call them directly as soon as it determines they are garbage). The main point here is that a file descriptor does not contain references to other language objects. The same would apply to all sorts of OS resource handles.

However, the whole argument is a priori valid only for raw system handles, such as file descriptors. No idea what issues come up if one considers e.g. buffering, or more generally, any additional data structure that gets associated with the handle.

Cheers
Ben