Re to `optimization for list'

Joshua Haberman
Hello, I'm quite new to Haskell, but experienced in other languages (C, Python, Ruby, SQL, etc). I am interested in Haskell because I've heard that the language is capable of lots of optimizations based on laziness, and I want to learn more about that.
I dug in with Project Euler problem #1, and wrote:
main = print (show (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0]))
So far so good, but I want to have some way of observing what optimizations GHC has performed. Most notably, in this example I want to know if the list was ever actually constructed in memory. The "sum" function only needs the elements one at a time, in order, so if Haskell and GHC are everything I've heard about them, I would fully expect the list construction to be optimized out. :) [..]
Your example is rather complex to start with, because `sum' has rather a complex definition in the Haskell library. Consider first a more pure example: main = putStr (shows (or' ys) "\n") where n = 10^8 ys = replicate n False or' :: [Bool] -> Bool or' [] = False or' (x: xs) = x || (or' xs) (the functions `||' (disjunction) and `replicate' are of the library). Compile it with ghc --make -O (for any occasion), and then, it will be evident that it takes a constant in n space (and stack) when running (run it like this: `./T +RTS -M1m -RTS' ). I think, this is not due to optimization but due to the very definition of evaluation in Haskell -- Head Normal Form. Compute this by hand: or' (replicate 9 False) = or' (False: (replicate 8 False)) = False || (or' (replicate 8 False)) = or' (replicate 8 False) = -- by definition of || ... or' [] = False. This is just the definition of evaluation in the Haskell language. In this example, the list under or' never expands further than for 2 members. Consider a more complex example (and more close to your example): main = putStr (shows (sum2 ys) "\n") -- sum' where n = 10^7 ys = [1 .. n] sum2 :: [Int] -> Int sum2 xs = sm xs 0 where sm [] s = s sm (x: xs) s = sm xs (s+x) sum' :: [Int] -> Int sum' [] = 0 sum' (x: xs) = x + (sum' xs) (the function [n .. m] is of the library). Compile it with ghc --make -O, and then, it will be evident that sum2 takes a constant in n space (and stack) when running (run it like this: `./T +RTS -M1m -RTS' ). Let us investigate this. Furst, compute this by hand with treating (+) as not of the library: sm [1 .. 4] 0 = sm (1:[2 .. 4]) 0 = sm [2 .. 4] (1+0) = sm (2: [3 .. 4]) (1+0) = sm [3 .. 4] (2+(1+0)) = ... It must accumulate an intermediate stack of size O(n). Why ghc-6.10.4 computes it in a constant space (and stack) ? The following is my guess. And let the Haskell experts correct me, please. In the Haskell-98 library it is defined that (+) is strict on the type of Int (right?). Therefore, the compiler optimizes (1+0) by replacing it immediately with 1. Taking in account that all other parts are computed lazily, the evaluation of the compiled program is like this: sm [1 .. 4] 0 = sm (1:[2 .. 4]) 0 = sm [2 .. 4] (1+0) = sm [2 .. 4] 1 = sm (2: [3 .. 4]) 1 = sm [3 .. 4] (2+1) = sm [3 .. 4] 3 = ... So, it is the result of combining of the evaluation definition with a certain compiler's optimization related to the strictness info for Int.(+). Right? As to sum', it still takes O(n) space -- in ghc-6.10.4. I think, this is because it is difficult for a compiler to guess how strictness of Int.(+) may correlate to the definition of sum'. Probably, it needs some kind of inductive reasoning about sum' to optimize it (for example) into sum2. Right ? Regards, ----------------- Serge Mechveliani mechvel@botik.ru
participants (1)
-
Serge D. Mechveliani