On Tue, Aug 10, 2010 at 5:54 PM, Brandon Simmons <brandon.m.simmons@gmail.com> wrote:
On Tue, Aug 10, 2010 at 4:34 PM, Jason Dagit <dagit@codersbase.com> wrote:
>
>
> On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons
> <brandon.m.simmons@gmail.com> wrote:
>>
>> Greetings Haskellers!
>>
>> directory-tree is a module providing a directory-tree-like datatype
>> along with Foldable and Traversable instances, along with a simple,
>> high-level IO interface. You can see the package along with some
>> examples here (apologies if the haddock docs haven't been generated
>> yet) :
>>
>>    http://hackage.haskell.org/package/directory-tree
>
> If I understand what you're saying, then your library is very similar to an
> abstraction that darcs had for years known as "Slurpy".  The experience in
> the darcs project was that it led to performance issues and correctness
> issues that were hard to find/fix.
>>
>> The primary change in this release is the addition of two
>> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
>> These functions use `unsafePerformIO` behind the scenes to traverse
>> the filesystem as required by pure computations consuming the returned
>> DirTree data structure. I believe I am doing this safely and sanely
>> but would love if some more experienced folks could comment on the
>> code.
>
> unsafePerformIO or unsafeInterleaveIO?
> Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
>  If the directory structure is large will I run out of file handles?  How
> will IO errors be handled?  Will I receive the exceptions in pure code or
> inside my IO actions?  Will I run into space leaks if something holds on to
> 1 file and then references it "after" the directory traversal?  I might have
> my history wrong, but as I recall darcs started with lazy slurpies and moved
> to doing things strictly due to space leaks, running out of file
> descriptors, file descriptor leaks (not running out, but having the file be
> locked long after darcs should have been 'done' with it), and exception
> delivery.

IO Errors are caught in a pure constructor called "Failed". In
practice I think my unsafe version is better in many of those respects
than the original, for example with regard to running out of file
handles. Are you referring to lazy IO in general, which those problems
you mention seem to apply to, or the use of unsafePerformIO?

It boils down to the same thing, right?
 

I certainly want this module to be as useful and problem-free as
possible, but I will be content if it is no more problematic than lazy
IO already is.

Could you elaborate on

   > "Will I run into space leaks if something holds on to 1 file and
   > then references it "after" the directory traversal"?


Let me give you an example.  Prelude's readFile is lazy.  That is, it returns immediately and then only fetches from the file as you demand the contents of the file.  This makes it possible to stream the file.  If you process it in chunks, say 1 line at a time, then you can do so in constant space.

If you then let the contents of the file escape, meaning somewhere else in the processing references it, then you'll stop streaming it and start holding on to the whole thing at once.  Something like this, untested:

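-- Streams: each line can be collected as soon as it has been printed.
notleaky1 = do
  xs <- readFile "foo"
  mapM_ print (lines xs)

-- Streams: length consumes xs once and nothing else refers to it.
notleaky2 = do
  xs <- readFile "foo"
  print (length xs)

-- Leaks space: xs is still needed by length, so the first pass retains the whole file.
leaky = do
  xs <- readFile "foo"
  mapM_ print (lines xs)
  print (length xs)

-- Leaks the handle: at most 10 characters are ever demanded, so the file is never read to the end.
handleleak = do
  xs <- readFile "foo"
  return (take 10 xs)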

Now, in leaky, if you calculated the length and printed the lines in the same iteration, the leak would go away.  In the handleleak example the file handle stays open even after handleleak produces all 10 elements, because the rest of the file is never demanded.
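
Here is a rough sketch of what those two fixes might look like.  The names printAndCount and firstTen are made up for illustration, and the character count assumes the file ends with a newline (otherwise it is one more than length xs).  The first function prints and counts in a single pass, so nothing keeps the whole file alive; the second forces the 10 characters while the handle is still open and lets withFile close it.

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical replacement for `leaky`: print each line and accumulate
-- the character count in the same traversal, so the contents can be
-- garbage collected as they are consumed.
printAndCount :: FilePath -> IO Int
printAndCount path = do
  xs <- readFile path
  foldM step 0 (lines xs)
  where
    step n l = do
      print l
      -- +1 for the newline that `lines` stripped; force the running
      -- total so no chain of thunks builds up.
      return $! n + length l + 1

-- Hypothetical replacement for `handleleak`: demand the 10 characters
-- before leaving withFile, which then closes the handle for us.
firstTen :: FilePath -> IO String
firstTen path = withFile path ReadMode $ \h -> do
  s <- hGetContents h
  let prefix = take 10 s
  _ <- evaluate (length prefix) -- forces the reads while h is open
  return prefix
```

The trade-off is the usual one: the consumer has to decide up front what it needs from the file, which is exactly the decision lazy IO lets you postpone.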

Now imagine those examples in terms of directory traversals instead of reads from a file.

This would still be a problem even if you replace readFile with readFile':
readFile' f = unsafePerformIO (readFile f)

I hope that helps,
Jason