New subject: [darcs-users] Iteratees, streams, and mmap

12 Dec 2009

      Hi Heinrich,

On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus <
apfelmus@quantentunnel.de> wrote:
...
...
My next experiment will be to find ways to express "take this operation
and
apply it to a stream without letting the stream leak". One implication is
that gzReadFilePS should not be used outside of a core set of modules
which
have been auideted to be resource concious.  Another implication is that
we
need to be really careful about wether or not we allow returning of
sequences of patches.  Possibly, we need several foldl-like functions
Jason Dagit wrote:
that
...
open the stream internally.  For example, to process the pending maybe we
should have:
withPending :: (a -> Patch -> a) ->  IO a
And withPending would start the streaming and make sure that the stream
cannot be visible as a data dependency outside of withPending.
Just a small comment on a potential flaw in this scheme and the
observation that even the rank-2 type trick from the  ST s  monad
wouldn't help.
I would say it does help, but it doesn't make it perfect.
...
Namely,  withPending  does not guarantee that the stream does not leak,
it only makes it more natural/convenient to formulate one's code so that
it doesn't leak. In particular, using  (:)  as argument pretty much
defeats the whole purpose:
Right.  And the iteratee library points out that your iteratees have to be
well-behaved (I think there they say "bounded").  I'm well aware of this
issue and thanks for pointing it out for others who are reading along.
...
withPending (flip (:))
Fortunately, the type system can ensure that the patches don't leak.
withPending :: (forall s. a -> Patch s -> a) -> IO a
Now,  a  may not mention  s  and the type checker will reject  flip (:)
 as argument. See also
Oleg Kiselyov, Chung-chieh Shan.
 Lightweight Monadic Regions.
 http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
for an elaboration of this technique.
I'm still on the fence as to whether this style of writing it will add value
greater than the complexity it brings.  I am certainly considering it :)
The darcs source does other things that are also fairly complex.
...
However, the line between leaking and not leaking is very thin here. As
soon as we are given for example a function
name :: Patch s -> String
that discards the  s , its results can "leak", in the sense that we
could now build a list of names
foo :: IO [String]
  foo = withPending . flip $ (:) . name
Even worse, any type  a  that doesn't have O(1) space usage will "leak"
bar :: IO [()]
  bar = withPending . flip $ const (() :)
In other words, exporting only a  foldl' -like interface does not really
prevent us from writing functions that have O(n) instead of O(1) space
usage. But trying to rectify that with the  forall s  trick is a doomed
idea, too.
I realize it's not perfect, but the problem we have now is that it's too
easy to write things that have dismal space usage.  If we can't force proper
space usage, how can we make it more natural to have bounded space?  Or at
least a good approximation.

It seems that:
 * foldl'-style helps
 * rank-n can help
 * no approach I've seen *forces* the behavior we want
 * existing code and bug reports demonstrate we need to improve the
situation

I'm open to suggestions on how to ensure the code has the space behavior I
want.  Lazy IO* and streams of patches is more compositional and natural to
Haskell programmers, but it seems that it's too hard to ensure the code has
reasonable space usage.  At least where the darcs source is concerned.
Therefore, I think the status quo demonstrates that in the darcs source it's
worth experimenting with alternatives to lazy io and streams.  In other
words, the human effort to make the code behave how we want is currently too
high and that's the issue I want to address.  I don't know how we could make
it impossible to have space leaks, although that would be interesting.

(*) Note: Lazy IO itself is used in very few places in darcs these days
because it has lead to serious bugs.  These days me point is more about big
streams getting retained.  Finding where and why has proven difficult.

Thanks,
Jason

Re: [darcs-users] Iteratees, streams, and mmap

Jason Dagit

Johann Höchtl

Jason Dagit

Heinrich Apfelmus

Heinrich Apfelmus

Jason Dagit

Heinrich Apfelmus

tags

participants (3)