What is the functional way of implementing a function that takes a long time to execute?

Hi Folks,

Suppose a function takes a long time to do its work. Perhaps it takes minutes or even hours to complete. While it is crunching along, it would be nice to have some insight into its status, such as (1) how close is it to completing? (2) what part of the task is it currently working on? It might even be nice to be notified when it is finished. What is the functional way of implementing the function?

/Roger

"Costello, Roger L."
Suppose a function takes a long time to do its work.
Perhaps it takes minutes or even hours to complete.
While it is crunching along, it would be nice to have some insight into its status such as (1) how close is it to completing? (2) what part of the task is it currently working on?
It might even be nice to be notified when it is finished.
What is the functional way of implementing the function?
The traditional way in Haskell to see intermediate results is not to produce only the final result, but to produce a list of intermediate results, the last of which is the final result. The trick is to apply what we call corecursion, which basically means: wrap the recursion in an (ideally non-strict) data constructor cell. The function

sqrtApprox :: Rational -> Rational

then becomes:

sqrtApprox :: Rational -> [Rational]

You know that you have done it right if you used proper corecursion, which looks similar to this:

loop x y = (x, y) : loop x' y'
  where
    x' = {- ... -}
    y' = {- ... -}

The important part is that the recursion is the right argument of the constructor (:) and that the constructor application is the last thing that happens. You can make sure that you have done it properly, and even get some nice deforestation optimizations, by using one of the predefined corecursion operators like 'unfoldr', 'iterate', etc.

Sometimes you will even want to encode an algorithm as a composition of predefined corecursive formulas like 'map', 'filter', 'tails', etc. For example, you may have a list [1,2,3,4,5,6,7,8,9,10] to begin with and you want to encode the sum of the products of successive groups of three values, 1*2*3 + 4*5*6 + 7*8*9 + 10 (the last group being just [10]). The grouping part can be written as:

takeWhile (not . null) . map (take 3) . iterate (drop 3)

Mapping 'product' over the groups and summing them is then left to the consumer.

Now how do we produce online statistics while this algorithm calculates its result? This is the easiest part, but there is a catch. You have a corecursively produced list, which ensures that the list is lazy enough. All you have to do now is to consume the list as part of an IO action. The foldM combinator is most helpful here:

sumStats :: (Show a, Num a) => [a] -> IO a
sumStats = foldM f 0
  where
    f s x = do
      putStrLn ("Sum so far: " ++ show s)
      return (s + x)

As said, there is a catch. In your recursive consumer you have to make sure that the intermediate result is actually calculated. This is important, because otherwise your consumer doesn't actually force the calculation, but really just builds up a large unevaluated expression, which is only evaluated at the very end.

The easiest way to ensure this is to do what I did in sumStats: print the intermediate result (you may call it the 'state') or some value derived from it (make sure that the value depends on the entire state). If you don't want to perform an IO action with the state you can also just be strict. There are many ways to be strict, my favorite being

f s x = do
  {- ... -}
  return $! s + x

but you can also use the BangPatterns extension.
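Putting these pieces together, a minimal self-contained program along these lines might look as follows. The 'main' wrapper, the helper name 'chunksOf3', and the 'map product' step are illustrative additions rather than anything from the thread; 'sumStats' is the function from above, with 'show' applied to the running sum so it compiles:

import Control.Monad (foldM)

-- Corecursive producer: split the input into successive groups of three.
chunksOf3 :: [a] -> [[a]]
chunksOf3 = takeWhile (not . null) . map (take 3) . iterate (drop 3)

-- Consumer in IO: sum a list while reporting the running total,
-- forcing each intermediate sum so no thunks pile up.
sumStats :: (Show a, Num a) => [a] -> IO a
sumStats = foldM f 0
  where
    f s x = do
      putStrLn ("Sum so far: " ++ show s)
      return $! s + x

main :: IO ()
main = do
  total <- sumStats (map product (chunksOf3 [1 .. 10 :: Integer]))
  putStrLn ("Total: " ++ show total)

Each step reports the sum accumulated so far; the final line prints Total: 640.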
The basic idea of all this is that you turn an opaque monolithic formula into a stream-processing formula. This not only makes it more flexible, but may also help you understand the original problem better and split it into independent modules. Remember that you can always compose stream processors using regular function composition, as done above.

Once you are comfortable with using lists for stream processing you can also use an actual stream-processing abstraction. My personal favorite right now is the 'pipes' library, but there are other useful libraries, including the other modern library 'conduit' as well as the traditional 'enumerator' library.

This of course does not end with streams. Streams are for algorithms with a linear execution path, but not every algorithm follows that. If your formula naturally follows multiple paths, there is nothing wrong with corecursively producing and recursively consuming trees or graphs, for example. You just need unfoldTree+foldTreeM or unfoldGraph+foldGraphM for that. This works with every algebraic data structure.

As a final remark, don't worry about the performance. Haskell is a lazy language. Done properly, at least the intermediate lists will be optimized away by the compiler, and the resulting machine code is close to what a C compiler would have produced for the original monolithic formula randomly interspersed with statistics-printing commands.

Greets,
Ertugrul

--
Not to be or to be and (not to be or to be and (not to be or to be and (not to be or to be and ... that is the list monad.

Hello Ertugrul,
Thank you for your detailed explanation. I am not sure that I understand all of its subtleties.
I am wondering if you would be willing to provide us with a simple, complete example that we can try out on our own machines?
/Roger

"Costello, Roger L."
Thank you for your detailed explanation. I am not sure that I understand all of its subtleties.
I am wondering if you would be willing to provide us with a simple, complete example that we can try out on our own machines?
I have written the Lucas-Lehmer primality test in several variants in this paste: http://hpaste.org/86449

The gist of it is that for a prime p you generate a sequence, pick a particular element from it and compare it to zero. If it is zero, then 2^p - 1 is a prime number. In the paste the llSeq function is shared by both the pure algorithm (llTest), which only computes the result, and the stats variant (llTestStats), which is an IO action and tells you how many iterations are left to go. For comparison I have also written a monolithic variant (llTestMono) that does not use the stream-processing style and shows how you would implement the same algorithm in an imperative language; it offers none of the flexibility of the streaming version. They are all about equal in speed.

Just compile and run with the argument 10007:

./lucas-lehmer 10007

This will test whether the number 2^10007 - 1 is prime. It should finish in less than a second and conclude with False. The argument 11213 will conclude with True.

Greets,
Ertugrul
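In case the paste above is no longer reachable, here is a rough sketch of what the streaming variants it describes could look like. The names llSeq, llTest and llTestStats follow the message above, but the bodies are reconstructed from the description under the assumption that p is an odd prime, not copied from the original code, and llTestMono is omitted:

import Control.Monad (foldM)
import System.Environment (getArgs)

-- Corecursive producer: the Lucas-Lehmer sequence modulo 2^p - 1,
-- starting at 4, with each step s' = (s*s - 2) `mod` (2^p - 1).
llSeq :: Integer -> [Integer]
llSeq p = iterate (\s -> (s * s - 2) `mod` m) 4
  where m = 2 ^ p - 1

-- Pure variant: for an odd prime p, 2^p - 1 is prime iff the
-- (p-2)-th element of the sequence is zero.
llTest :: Integer -> Bool
llTest p = llSeq p !! fromIntegral (p - 2) == 0

-- Stats variant: the same traversal as an IO action, reporting how many
-- iterations are left while forcing each intermediate element.
-- (In practice you would probably only print every few hundred steps.)
llTestStats :: Integer -> IO Bool
llTestStats p = do
    let steps = zip [n, n - 1 .. 1] (take (fromIntegral n) (tail (llSeq p)))
    final <- foldM step 4 steps
    return (final == 0)
  where
    n = p - 2
    step _ (left, s) = do
      putStrLn ("Iterations left: " ++ show left)
      return $! s

main :: IO ()
main = do
  [arg] <- getArgs          -- expects one argument, the exponent p
  print =<< llTestStats (read arg)

Compiled as lucas-lehmer and run with an exponent such as 11213, this prints the remaining iteration count as it goes and ends with the primality verdict for 2^p - 1.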

Thank you Ertugrul for this awesome explanation. It's emails like this that are the reason why I love this list.

Bryce
"Costello, Roger L."
wrote: Suppose a function takes a long time to do its work.
Perhaps it takes minutes or even hours to complete.
While it is crunching along, it would be nice to have some insight into its status such as (1) how close is it to completing? (2) what part of the task is it currently working on?
It might even be nice to be notified when it is finished.
What is the functional way of implementing the function? The traditional way in Haskell to see intermediate results is not to produce the final result only, but to produce a list of intermediate results, the last of which is the final result. The trick is to apply what we call corecursion, which basically means: Wrap the recursion in an (ideally nonstrict) data constructor cell. The function
sqrtApprox :: Rational -> Rational
then becomes:
sqrtApprox :: Rational -> [Rational]
You know that you have done it properly if you used proper corecursion, which looks similar to this:
loop x y = (x, y) : loop x' y' where x' = {- ... -} y' = {- ... -}
The important part is that the recursion is the right argument of the constructor (:) and that the constructor application is the last thing that happens. You can make sure that you have done it properly and even get some nice deforestation optimizations by using one of the predefined corecursion operators like 'unfoldr', 'iterate', etc.
Sometimes you'll want to encode an algorithm even as a composition of predefined corecursive formulas like 'map', 'filter', 'tails', etc. For example you may have a list [1,2,3,4,5,6,7,8,9] to begin with and you want to encode the sum of the products of three consecutive values, 1*2*3 + 4*5*6 + 7*8*9 + 10:
takeWhile (not . null) . map (take 3) . iterate (drop 3)
Now how do we produce online statistics while this algorithm calculates its result? This is the easiest part, but there is a catch. You have a corecursively produced list, which ensures that the list is lazy enough. All you have to do now is to consume the list as part of an IO action. The foldM combinator is most helpful here:
sumStats :: (Num a) => [a] -> IO a sumStats = foldM f 0 where f s x = do putStrLn ("Sum so far: " ++ s) return (s + x)
As said there is a catch. In your recursive consumer you actually have to make sure that the intermediate result is actually calculated. This is important, because otherwise your consumer doesn't actually force the calculation, but really just builds up a large unevaluated expression, which is only evaluated at the very end.
The easiest way to ensure this is to just do what I did in sumStats: Print the intermediate result (you may call it 'state') or some value derived from it (make sure that the value depends on the entire state). If you don't want to perform some IO action with the state you can also just be strict. There are many ways to be strict, my favorite being
f s x = do {- ... -} return $! s + x
but you can also use the BangPatterns extension.
The basic idea of all this is that you turn an opaque monolithic formula into a stream processing formula. This not only makes it more flexible, but may also help you understand the original problem better and split it into independent modules. Remember that you can always compose stream processors using regular function composition as done above.
Once you are comfortable with using lists for stream processing you can also use an actual stream processing abstraction. My personal favorite right now is the 'pipes' library, but there are other useful libraries including the other modern library 'conduit' as well as the traditional 'enumerator' library.
This of course does not end with streams. Streams are for algorithms with a linear execution paths, but not every algorithm follows that. If your formula naturally follows multiple paths, there is nothing wrong with corecursively producing and recursively consuming trees or graphs, for example. You just need unfoldTree+foldTreeM or unfoldGraph+foldGraphM for that. This works with every algebraic data structure.
As a final remark, don't worry about the performance. Haskell is a lazy language. Done properly at least the intermediate lists will be optimized away by the compiler and the resulting machine code is close to what a C compiler would have produced for the original monolithic formula randomly interspersed with statistics printing commands.
Greets Ertugrul
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
participants (3)
- Bryce Verdier
- Costello, Roger L.
- Ertugrul Söylemez