
Hi Heinrich, On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus < apfelmus@quantentunnel.de> wrote:
My next experiment will be to find ways to express "take this operation and apply it to a stream without letting the stream leak". One implication is that gzReadFilePS should not be used outside of a core set of modules which have been auideted to be resource concious. Another implication is that we need to be really careful about wether or not we allow returning of sequences of patches. Possibly, we need several foldl-like functions
Jason Dagit wrote: that
open the stream internally. For example, to process the pending maybe we should have: withPending :: (a -> Patch -> a) -> IO a
And withPending would start the streaming and make sure that the stream cannot be visible as a data dependency outside of withPending.
Just a small comment on a potential flaw in this scheme and the observation that even the rank-2 type trick from the ST s monad wouldn't help.
I would say it does help, but it doesn't make it perfect.
Namely, withPending does not guarantee that the stream does not leak, it only makes it more natural/convenient to formulate one's code so that it doesn't leak. In particular, using (:) as argument pretty much defeats the whole purpose:
Right. And the iteratee library points out that your iteratees have to be well-behaved (I think there they say "bounded"). I'm well aware of this issue and thanks for pointing it out for others who are reading along.
withPending (flip (:))
Fortunately, the type system can ensure that the patches don't leak.
withPending :: (forall s. a -> Patch s -> a) -> IO a
Now, a may not mention s and the type checker will reject flip (:) as argument. See also
Oleg Kiselyov, Chung-chieh Shan. Lightweight Monadic Regions. http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
for an elaboration of this technique.
I'm still on the fence as to whether this style of writing it will add value greater than the complexity it brings. I am certainly considering it :) The darcs source does other things that are also fairly complex.
However, the line between leaking and not leaking is very thin here. As soon as we are given for example a function
name :: Patch s -> String
that discards the s , its results can "leak", in the sense that we could now build a list of names
foo :: IO [String] foo = withPending . flip $ (:) . name
Even worse, any type a that doesn't have O(1) space usage will "leak"
bar :: IO [()] bar = withPending . flip $ const (() :)
In other words, exporting only a foldl' -like interface does not really prevent us from writing functions that have O(n) instead of O(1) space usage. But trying to rectify that with the forall s trick is a doomed idea, too.
I realize it's not perfect, but the problem we have now is that it's too easy to write things that have dismal space usage. If we can't force proper space usage, how can we make it more natural to have bounded space? Or at least a good approximation. It seems that: * foldl'-style helps * rank-n can help * no approach I've seen *forces* the behavior we want * existing code and bug reports demonstrate we need to improve the situation I'm open to suggestions on how to ensure the code has the space behavior I want. Lazy IO* and streams of patches is more compositional and natural to Haskell programmers, but it seems that it's too hard to ensure the code has reasonable space usage. At least where the darcs source is concerned. Therefore, I think the status quo demonstrates that in the darcs source it's worth experimenting with alternatives to lazy io and streams. In other words, the human effort to make the code behave how we want is currently too high and that's the issue I want to address. I don't know how we could make it impossible to have space leaks, although that would be interesting. (*) Note: Lazy IO itself is used in very few places in darcs these days because it has lead to serious bugs. These days me point is more about big streams getting retained. Finding where and why has proven difficult. Thanks, Jason

On Dec 12, 7:13 pm, Jason Dagit
Hi Heinrich,
On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus <
apfel...@quantentunnel.de> wrote:
My next experiment will be to find ways to express "take this operation and apply it to a stream without letting the stream leak". One implication is that gzReadFilePS should not be used outside of a core set of modules which have been auideted to be resource concious. Another implication is that we need to be really careful about wether or not we allow returning of sequences of patches. Possibly, we need several foldl-like functions
Jason Dagit wrote: that
open the stream internally. For example, to process the pending maybe we should have: withPending :: (a -> Patch -> a) -> IO a
And withPending would start the streaming and make sure that the stream cannot be visible as a data dependency outside of withPending.
Just a small comment on a potential flaw in this scheme and the observation that even the rank-2 type trick from the ST s monad wouldn't help.
I would say it does help, but it doesn't make it perfect.
Namely, withPending does not guarantee that the stream does not leak, it only makes it more natural/convenient to formulate one's code so that it doesn't leak. In particular, using (:) as argument pretty much defeats the whole purpose:
Right. And the iteratee library points out that your iteratees have to be well-behaved (I think there they say "bounded"). I'm well aware of this issue and thanks for pointing it out for others who are reading along.
withPending (flip (:))
Fortunately, the type system can ensure that the patches don't leak.
withPending :: (forall s. a -> Patch s -> a) -> IO a
Now, a may not mention s and the type checker will reject flip (:) as argument. See also
Oleg Kiselyov, Chung-chieh Shan. Lightweight Monadic Regions. http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
for an elaboration of this technique.
I'm still on the fence as to whether this style of writing it will add value greater than the complexity it brings. I am certainly considering it :) The darcs source does other things that are also fairly complex.
However, the line between leaking and not leaking is very thin here. As soon as we are given for example a function
name :: Patch s -> String
that discards the s , its results can "leak", in the sense that we could now build a list of names
foo :: IO [String] foo = withPending . flip $ (:) . name
Even worse, any type a that doesn't have O(1) space usage will "leak"
bar :: IO [()] bar = withPending . flip $ const (() :)
In other words, exporting only a foldl' -like interface does not really prevent us from writing functions that have O(n) instead of O(1) space usage. But trying to rectify that with the forall s trick is a doomed idea, too.
I realize it's not perfect, but the problem we have now is that it's too easy to write things that have dismal space usage. If we can't force proper space usage, how can we make it more natural to have bounded space? Or at least a good approximation.
It seems that: * foldl'-style helps * rank-n can help * no approach I've seen *forces* the behavior we want * existing code and bug reports demonstrate we need to improve the situation
I'm open to suggestions on how to ensure the code has the space behavior I want. Lazy IO* and streams of patches is more compositional and natural to Haskell programmers, but it seems that it's too hard to ensure the code has reasonable space usage. At least where the darcs source is concerned. Therefore, I think the status quo demonstrates that in the darcs source it's worth experimenting with alternatives to lazy io and streams. In other words, the human effort to make the code behave how we want is currently too high and that's the issue I want to address. I don't know how we could make it impossible to have space leaks, although that would be interesting.
As a beginner to Haskell, I am only 1/3 through RWH, those lines scare me in a sense to question my effort. I simply can not distinguish if this discussion is somewhat pathological in a sense that every access to the outside world imposes dangers and an additional exception handler here and there and an additional if-statement to handle error return codes will suffice. Or lazy evaluation, IO monads and the whole story behind unsafePerformIO was an additional layer of self-deception and unpredictable effects from the outside world and lazy evaluation can NEVER be satisfactory handled.
(*) Note: Lazy IO itself is used in very few places in darcs these days because it has lead to serious bugs. These days me point is more about big streams getting retained. Finding where and why has proven difficult.
Thanks, Jason
_______________________________________________ Haskell-Cafe mailing list Haskell-C...@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe

On Sat, Dec 12, 2009 at 1:37 PM, Johann Höchtl
Hi Heinrich,
On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus <
apfel...@quantentunnel.de> wrote:
Jason Dagit wrote:
My next experiment will be to find ways to express "take this operation and apply it to a stream without letting the stream leak". One implication is that gzReadFilePS should not be used outside of a core set of modules which have been auideted to be resource concious. Another implication is
need to be really careful about wether or not we allow returning of sequences of patches. Possibly, we need several foldl-like functions
we that
open the stream internally. For example, to process the pending maybe we should have: withPending :: (a -> Patch -> a) -> IO a
And withPending would start the streaming and make sure that the stream cannot be visible as a data dependency outside of withPending.
Just a small comment on a potential flaw in this scheme and the observation that even the rank-2 type trick from the ST s monad wouldn't help.
I would say it does help, but it doesn't make it perfect.
Namely, withPending does not guarantee that the stream does not leak, it only makes it more natural/convenient to formulate one's code so
it doesn't leak. In particular, using (:) as argument pretty much defeats the whole purpose:
Right. And the iteratee library points out that your iteratees have to be well-behaved (I think there they say "bounded"). I'm well aware of this issue and thanks for pointing it out for others who are reading along.
withPending (flip (:))
Fortunately, the type system can ensure that the patches don't leak.
withPending :: (forall s. a -> Patch s -> a) -> IO a
Now, a may not mention s and the type checker will reject flip (:) as argument. See also
Oleg Kiselyov, Chung-chieh Shan. Lightweight Monadic Regions. http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf http://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
for an elaboration of this technique.
I'm still on the fence as to whether this style of writing it will add value greater than the complexity it brings. I am certainly considering it :) The darcs source does other things that are also fairly complex.
However, the line between leaking and not leaking is very thin here. As soon as we are given for example a function
name :: Patch s -> String
that discards the s , its results can "leak", in the sense that we could now build a list of names
foo :: IO [String] foo = withPending . flip $ (:) . name
Even worse, any type a that doesn't have O(1) space usage will "leak"
bar :: IO [()] bar = withPending . flip $ const (() :)
In other words, exporting only a foldl' -like interface does not really prevent us from writing functions that have O(n) instead of O(1) space usage. But trying to rectify that with the forall s trick is a doomed idea, too.
I realize it's not perfect, but the problem we have now is that it's too easy to write things that have dismal space usage. If we can't force
On Dec 12, 7:13 pm, Jason Dagit
wrote: that that proper space usage, how can we make it more natural to have bounded space? Or at least a good approximation.
It seems that: * foldl'-style helps * rank-n can help * no approach I've seen *forces* the behavior we want * existing code and bug reports demonstrate we need to improve the situation
I'm open to suggestions on how to ensure the code has the space behavior I want. Lazy IO* and streams of patches is more compositional and natural to Haskell programmers, but it seems that it's too hard to ensure the code has reasonable space usage. At least where the darcs source is concerned. Therefore, I think the status quo demonstrates that in the darcs source it's worth experimenting with alternatives to lazy io and streams. In other words, the human effort to make the code behave how we want is currently too high and that's the issue I want to address. I don't know how we could make it impossible to have space leaks, although that would be interesting.
As a beginner to Haskell, I am only 1/3 through RWH, those lines scare me in a sense to question my effort. I simply can not distinguish if this discussion is somewhat pathological in a sense that every access to the outside world imposes dangers and an additional exception handler here and there and an additional if-statement to handle error return codes will suffice.
We're not talking about exception handling :) And yes, Heinrich is talking about pathological cases.
Or lazy evaluation, IO monads and the whole story behind unsafePerformIO was an additional layer of self-deception and unpredictable effects from the outside world and lazy evaluation can NEVER be satisfactory handled.
We're not talking about unsafePerformIO either. The discussion at hand is much simpler. Given a large stream of data, how should you style your code so that it's easy to reason about the space usage? This is an important question in every programming language I've ever used. Also, IO need not enter the discussion except that in darcs the streams of data come from the disk. I think moving the discussion to haskell-cafe was a mistake. I said what I said in a very specific context (that of the darcs developers mailing list), which is likely no longer clear. Sorry for any distress this has caused you. Jason

Jason Dagit wrote:
Johann Höchtl wrote:
As a beginner to Haskell, I am only 1/3 through RWH, those lines scare me in a sense to question my effort. I simply can not distinguish if this discussion is somewhat pathological in a sense that every access to the outside world imposes dangers and an additional exception handler here and there and an additional if-statement to handle error return codes will suffice.
We're not talking about exception handling :) And yes, Heinrich is talking about pathological cases.
Or lazy evaluation, IO monads and the whole story behind unsafePerformIO was an additional layer of self-deception and unpredictable effects from the outside world and lazy evaluation can NEVER be satisfactory handled.
We're not talking about unsafePerformIO either.
The discussion at hand is much simpler. Given a large stream of data, how should you style your code so that it's easy to reason about the space usage? This is an important question in every programming language I've ever used. Also, IO need not enter the discussion except that in darcs the streams of data come from the disk. I think moving the discussion to haskell-cafe was a mistake. I said what I said in a very specific context (that of the darcs developers mailing list), which is likely no longer clear.
Sorry for any distress this has caused you.
Ah, I didn't mean to cause distress to anyone by cross-posting to the café. I just thought that a discussion "Is there a style of writing Haskell that makes it easy to ensure bounded space usage?" could be of general interest. Johann, don't worry, lazy evaluation is awesome. :) See for example http://apfelmus.nfshost.com/quicksearch.html It's just that laziness not a free lunch; predicting memory ("space") usage does become more difficult. Which is not really surprising since deferring the evaluation of thunks until they are demanded also means that they will store different expressions of possibly different sizes before and after they are demanded. Classic examples are ones = 1:ones taking O(1) memory instead of an infinite amount and foldl (+) 0 [1..n] taking O(n) memory as opposed to foldl' (+) 0 [1..n] which only takes O(1) memory. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Jason Dagit wrote:
withPending :: (a -> Patch -> a) -> IO a
And withPending would start the streaming and make sure that the stream cannot be visible as a data dependency outside of withPending.
[...]
Heinrich Apfelmus wrote:
In other words, exporting only a foldl' -like interface does not really prevent us from writing functions that have O(n) instead of O(1) space usage. But trying to rectify that with the forall s trick is a doomed idea, too.
I realize it's not perfect, but the problem we have now is that it's too easy to write things that have dismal space usage. If we can't force proper space usage, how can we make it more natural to have bounded space? Or at least a good approximation.
It seems that: * foldl'-style helps * rank-n can help * no approach I've seen *forces* the behavior we want * existing code and bug reports demonstrate we need to improve the situation
Yep, I agree completely; except for the rank-n (rank-2) trick where I'm more pessimistic. I think rank-2 works fine for things like ensuring that file handles are closed properly. But for the purpose of ensuring O(1) space usage, I think that withPending . flip $ const (() :) demonstrates that rank-2 is not worth the complexity they bring. Still, the idea of tracking resource usage in the type system is attractive.
I'm open to suggestions on how to ensure the code has the space behavior I want.
I've got another, unfinished idea:
Namely, withPending does not guarantee that the stream does not leak, it only makes it more natural/convenient to formulate one's code so that it doesn't leak. In particular, using (:) as argument pretty much defeats the whole purpose:
Right. And the iteratee library points out that your iteratees have to be well-behaved (I think there they say "bounded"). I'm well aware of this issue and thanks for pointing it out for others who are reading along.
How about tracking the requirement of "bounded" in the type system? In particular, I'm thinking of a type class class NFData a => Small a where the idea is that all types that can be stored in constant space are members of this class. For example, we have instance Small () instance Small Int instance Small Char instance (Small a, Small b) => Small (a,b) instance (Small a, Small b) => Small (Either a b) but recursive types like [a] or String are not allowed to be members. Then, withPending :: Small a => (a -> Patch -> a) -> IO a which is based on the function foldlSmall :: Small b => (b -> a -> b) -> b -> [a] -> b foldlSmall f b [] = foldlSmall f b (x:xs) = foldlSmall f b' xs where b' = rnf (f b x) is pretty much guaranteed to run in O(1) space. Well, that depends on f , but the point is it's not foldlSmall who introduces the space leak, it's the argument that takes all the blame.
In other words, the human effort to make the code behave how we want is currently too high and that's the issue I want to address. I don't know how we could make it impossible to have space leaks, although that would be interesting.
I opine that aiming for the latter (making space leaks impossible) while experimenting brings you closer to the actual goal (making them easy to avoid) when it comes to putting these experiments into practice. But that's just me. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Sun, Dec 13, 2009 at 3:47 AM, Heinrich Apfelmus < apfelmus@quantentunnel.de> wrote:
How about tracking the requirement of "bounded" in the type system? In particular, I'm thinking of a type class
class NFData a => Small a
where the idea is that all types that can be stored in constant space are members of this class. For example, we have
instance Small () instance Small Int instance Small Char instance (Small a, Small b) => Small (a,b) instance (Small a, Small b) => Small (Either a b)
but recursive types like [a] or String are not allowed to be members.
I like this. It's easy to audit for weird instances of Small. It seems like we also need: instance Small (IO ()) Which is not necessarily bounded due to things like IORefs. See below for why I think it would be a useful instance.
Then,
withPending :: Small a => (a -> Patch -> a) -> IO a
which is based on the function
foldlSmall :: Small b => (b -> a -> b) -> b -> [a] -> b foldlSmall f b [] = foldlSmall f b (x:xs) = foldlSmall f b' xs where b' = rnf (f b x)
is pretty much guaranteed to run in O(1) space. Well, that depends on f , but the point is it's not foldlSmall who introduces the space leak, it's the argument that takes all the blame.
I could also see, us needing this: bracketSmall :: (Small c) => IO a -> (a -> IO b) -> (a -> IO c) -> IO c I'm not sure if b needs to be Small, since it's just supposed to be the return value of the deallocation step. Actually, since we didn't (yet) make functions an instance of Small, we may need to say this type: bracketFoldSmall :: (Small c) => IO a -> (a -> IO b) -> ((c -> a -> c) -> c -> a -> c) -> IO c Here I'm imagining 'a', being something like a list of patches. It's a big stream we want to process. Furthermore, it seems like we probably want c = IO (). This would be useful if we wanted to do some output for each patch in the sequence. But, as I think about it, it seems like if we allow ST or IO in these "small" functions or as instances of Small then we lose our guarantees. Yet, it seems that having functions like bracketSmall are necessary if we want to hide the stream itself from being exposed outside of the traversal function. For example, your foldlSmall doesn't leak, but something at the same level of scope as it could leak the list. Jason

Jason Dagit wrote:
Heinrich Apfelmus wrote:
How about tracking the requirement of "bounded" in the type system? In particular, I'm thinking of a type class
class NFData a => Small a
where the idea is that all types that can be stored in constant space are members of this class. For example, we have
instance Small () instance Small Int instance Small Char instance (Small a, Small b) => Small (a,b) instance (Small a, Small b) => Small (Either a b)
but recursive types like [a] or String are not allowed to be members.
On second (and late) thought, this idea is not as good as it sounds and needs more thought. The thing is if A is an instance of Small , we are not guaranteed that a value x :: A will take little memory, we are only guaranteed that its *normal form* takes little memory. Now, due to pervasive lazy evaluation, it's quite impossible to track whether a value is in normal form in the type system.
It seems like we also need: instance Small (IO ())
Which is not necessarily bounded due to things like IORefs. [...]
In the light of the above, I think this is related to the issue that we can't evaluate IO () to normal form at all, i.e. there is no deepSeq for values of type IO () .
I could also see us needing this:
bracketSmall :: (Small c) => IO a -> (a -> IO b) -> (a -> IO c) -> IO c
I'm not sure if b needs to be Small, since it's just supposed to be the return value of the deallocation step.
Thanks to parametricity, we know that b is thrown away, it can never appear in the result type c .
It seems that having functions like bracketSmall are necessary if we want to hide the stream itself from being exposed outside of the traversal function. For example, your foldlSmall doesn't leak, but something at the same level of scope as it could leak the list.
Yep, though this does have the unfortunate (?) consequence that we cannot use bracketSmall to return any large values anymore, even if they do not leak the stream. (A notion which literally gets "muddier" in the course of the program because it might be that c does not leak the vanilla stream, but for example every character of it.) Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
participants (3)
-
Heinrich Apfelmus
-
Jason Dagit
-
Johann Höchtl