
Dear cafe, how would you approach this task: find (enumerate) those x from a list xs for which some computation is not successful within some resource bound (it weeds out uninteresting data, leaving just the hard cases, which will be treated later by other means) Input and result could be lazy lists, computations are pure. If the bounded resource is time, then I can use System.Timeout. This puts me into IO, but OK. Perhaps I want to do some logging output anyways. The problem is with bounding space. Assume that "computation on x" (sometimes) allocates a lot. Then the whole program will just die with "heap exhausted", while in fact I want to terminate just the computation on this x, garbage-collect, and continue. I could make the space usage explicit: each step of the computation could additionally compute a number that approximates memory usage. (Assume that this usage varies wildly with each step.) Then I can stop iterating when this reaches some bound. Or, I just compile the computation into a separate executable, and I call it (for each x) via the operating system, because there I can bound space (with ulimit) Is there some way to achieve this in Haskell land? - J.W.

The problem is that the semantics of pure languages approximate the behavior of ideal machines (turing machines, lambda machines, etc.). Resource constraints are not first-class features of the language because analyzing resource (time and memory) usage of pure functions is inherently impure. The ideas you've outlined are pretty much what I would do as well. You can use IO to keep track of time and either IO or a custom pure monad that keeps an approximate memory usage count to keep track of memory. I'm not sure if you can place memory constraints on forkIO threads, but that would be better than using a separate process. Will
On Jan 20, 2016, at 08:53, Johannes Waldmann
wrote: Dear cafe, how would you approach this task:
find (enumerate) those x from a list xs for which some computation is not successful within some resource bound
(it weeds out uninteresting data, leaving just the hard cases, which will be treated later by other means) Input and result could be lazy lists, computations are pure.
If the bounded resource is time, then I can use System.Timeout. This puts me into IO, but OK. Perhaps I want to do some logging output anyways.
The problem is with bounding space. Assume that "computation on x" (sometimes) allocates a lot. Then the whole program will just die with "heap exhausted", while in fact I want to terminate just the computation on this x, garbage-collect, and continue.
I could make the space usage explicit: each step of the computation could additionally compute a number that approximates memory usage. (Assume that this usage varies wildly with each step.) Then I can stop iterating when this reaches some bound.
Or, I just compile the computation into a separate executable, and I call it (for each x) via the operating system, because there I can bound space (with ulimit)
Is there some way to achieve this in Haskell land?
- J.W. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

This could help with memory https://git.haskell.org/ghc.git/commitdiff/b0534f78a73f972e279eed4447a5687bd... I think that's the result of a Facebook contribution, they needed it to avoid a single unlikely request taking too much memory and slowing down everything else.

One thing you can try is access current garbage collector stats by using the GHC.Stats library. I don't know how that mingles with multiple threads and I don't think it's very precise, but it sounds simple enough. You just have to start GHC with stats enabled. Afterwards, you can check if they are enabled with getGCStatsEnabled and get a record of stats with getGCStats. Then experiment away with the entries of that record. Hope that helps. Am 20.01.2016 15:53 schrieb "Johannes Waldmann" < johannes.waldmann@htwk-leipzig.de>:
Dear cafe, how would you approach this task:
find (enumerate) those x from a list xs for which some computation is not successful within some resource bound
(it weeds out uninteresting data, leaving just the hard cases, which will be treated later by other means) Input and result could be lazy lists, computations are pure.
If the bounded resource is time, then I can use System.Timeout. This puts me into IO, but OK. Perhaps I want to do some logging output anyways.
The problem is with bounding space. Assume that "computation on x" (sometimes) allocates a lot. Then the whole program will just die with "heap exhausted", while in fact I want to terminate just the computation on this x, garbage-collect, and continue.
I could make the space usage explicit: each step of the computation could additionally compute a number that approximates memory usage. (Assume that this usage varies wildly with each step.) Then I can stop iterating when this reaches some bound.
Or, I just compile the computation into a separate executable, and I call it (for each x) via the operating system, because there I can bound space (with ulimit)
Is there some way to achieve this in Haskell land?
- J.W. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

On 21/01/2016, at 3:53 am, Johannes Waldmann
Or, I just compile the computation into a separate executable, and I call it (for each x) via the operating system, because there I can bound space (with ulimit)
Just how big are the time and space limits you have in mind? You'd clearly rather *not* create separate processes, but if they are taking time enough and space enough, the overheads might not be worth worrying about, and there would be fewer problems to worry about. A typical problem: what if your estimate of allocation is wrong, and some task is cheerfully grinding away safely within its estimate but really far outside it, and takes your whole program down? With a separate process, that won't happen. Also, there are two different things. There is the amount of space the task has ever ALLOCATED (which is what's not *too* horrible to estimate) and the amount of space the task NEEDS right now (which is what ulimit will constrain). Estimating allocation may (will!) be pessimistic. Estimating need requires you to predict what the garbage collector is going to do and I don't see that as easy. With a separate process you can also stop worrying about estimating the space needs of code you didn't write.

Hello Johannes, We wrote a PLDI paper on precisely this topic! http://ezyang.com/rlimits.html The feature never got merged to GHC proper, however, because it required some changes to GHC's generated code which were a slight pessimization for people who were not planning on using resource limits (the loss was characterized in the paper), and it didn't seem right to add an entirely new way (ala profiling/dynamic/etc) to GHC just to support resource limits. Edward Excerpts from Johannes Waldmann's message of 2016-01-20 06:53:39 -0800:
Dear cafe, how would you approach this task:
find (enumerate) those x from a list xs for which some computation is not successful within some resource bound
(it weeds out uninteresting data, leaving just the hard cases, which will be treated later by other means) Input and result could be lazy lists, computations are pure.
If the bounded resource is time, then I can use System.Timeout. This puts me into IO, but OK. Perhaps I want to do some logging output anyways.
The problem is with bounding space. Assume that "computation on x" (sometimes) allocates a lot. Then the whole program will just die with "heap exhausted", while in fact I want to terminate just the computation on this x, garbage-collect, and continue.
I could make the space usage explicit: each step of the computation could additionally compute a number that approximates memory usage. (Assume that this usage varies wildly with each step.) Then I can stop iterating when this reaches some bound.
Or, I just compile the computation into a separate executable, and I call it (for each x) via the operating system, because there I can bound space (with ulimit)
Is there some way to achieve this in Haskell land?
- J.W.
participants (6)
-
Edward Z. Yang
-
insanemole .
-
Johannes Waldmann
-
Richard A. O'Keefe
-
Sebastián Galkin
-
Will Yager