
On Tue, May 06, 2003 at 07:22:20AM -0400, David Roundy wrote:
On Tue, May 06, 2003 at 05:02:56AM -0500, Galen Menzel wrote:
makeList :: IO [Int] makeList = do s <- getLine case read s of 0 -> return [] x -> do xs <- makeList return (x:xs)
I changed the if to a case so that (read s) doesn't get called twice.
Does the compiler not optimize away identical calls? Or is it not allowed to, for some reason?
This is certainly possible, particularly with a referentially transparent language. There's a technique called memoization that basically keeps a table of values that a function has already computed. When a memoized function is called on a value, a lookup is performed to see if the function has already been called on this value before. If so, the value recorded in the table is returned. If not, the function is called and its returned value is stored in the table. This is not the default way of handling functions in Haskell, I guess to keep function-call overhead low. However, GHC and hugs both provide memo functions that do this for you. Unfortunately, they're not always useful. From the GHC memo documentation: memo :: (a -> b) -> a -> b So, for example, memo f is a version of f that caches the results of previous calls. The searching is very fast, being based on pointer equality. One consequence of this is that the caching will only be effective if exactly the same argument is passed again to the memoised function. This means not just a copy of a previous argument, but the same instance. It's not useful to memoise integer functions using this interface, because integers are generally copied a lot and two instances of '27' are unlikely to refer to the same object. Thus they don't work for the prototypical memoized function, fib: memo-fib = memo fib fib 0 = 0 fib 1 = 1 fib x = memo-fib (x-1) + memo-fib (x-2) Comparing pointers is a reasonable thing to do, since it can be inefficient and impractical to check parameter equality for functions of more complex data structures. But not being able to memoize functions of integers seems kind of silly.
I agree that the case looks nicer, and would do it in general, but have often wondered how hard the compilers try (or are allowed to try... as it might mess up space usage) to optimize away multiple identical calls.
Basically, if I write:
thrice_length s = length s + length s + length s
(A very stupid example...)
Is my program really going to run length s three times, or will it just call it once (assuming I have full optimizations on)? I would have thought that one of the major advantages of a purely functional language would be that the compiler could safely eliminate common subexpressions, since it knows that they will always be identical.
As far as I know, this actually calls length s three times (although this is a case where memo would work).
On the other hand, if I write a function like:
long_list = [1..100000] ++ [1..100000] ++ [1..100000] ++ [1..100000]
And then run a
putStr $ unlines $ map show long_list
I would expect this to take very little memory, which would mean that the [1..100000] would need to be calculated four separate times rather than being calculated once and stored...
After I ran this my ghci process had an 82MB footprint. I don't know how many times [1..100000] gets calculated. My guess is four, but don't quote me on that. galen