
Hello! I've run into a strange effect that I cannot explain. I have a simple expression that can be written in two equivalent ways, yet one way performs much better than the other. Here is an example:

-- apply a function many times (tail-recursive)
many n f x = if n == 0 then x else many (n-1) f $! (f x)

-- first adder function
adder1 = let r = many 5000000 sin 1.0 in \x -> x + r

-- second adder function
adder2 = \x -> x + many 5000000 sin 1.0

main = do
  putStrLn $ show $ adder1 1
  putStrLn $ show $ adder1 2
  putStrLn $ show $ adder1 3
  putStrLn $ show $ adder1 4
  putStrLn $ show $ adder2 1
  putStrLn $ show $ adder2 2
  putStrLn $ show $ adder2 3
  putStrLn $ show $ adder2 4

If you run the program, it thinks for some seconds performing the math and then prints 4 results immediately. But with the adder2 function it performs the calculation on every call, which can be seen visually.

It seems that the compiler is able to "cache" the long computation in some way in the first case, but not in the second.

I always thought that let a = b in x + a is just syntactic sugar for x + b. Is that wrong?

Of course, this is just a toy example. In reality I was writing run-time compilation. Suppose you have a simple arithmetic expression with variables:

data Expr = Var String | Add Expr Expr | Mul Expr Expr | ...

type Env = Map.Map String Double

compile :: Expr -> (Env -> Double)
compile e = case e of
  Var n     -> \env -> env Map.! n
  Add e1 e2 -> \env -> (compile e1) env + (compile e2) env
  Mul e1 e2 -> \env -> (compile e1) env * (compile e2) env
  ...

The same thing occurs here. The expressions (compile e1) and (compile e2) are re-evaluated every time the compiled expression is run with some environment, so it is much more an interpretation than a compilation. I want the compile function to return an expression that is pre-evaluated as much as possible, to maximally speed up subsequent evaluation in different environments. How can I force (compile e1) to be evaluated eagerly in the Add/Mul cases?

P.S. I'm using GHC 6.6 on Debian Linux (x86 arch).

With best regards,
  Alexander.

I've run into a strange effect that I cannot explain. I have a simple expression that can be written in two equivalent ways, yet one way performs much better than the other. Here is an example:
-- apply a function many times (tail-recursive)
many n f x = if n == 0 then x else many (n-1) f $! (f x)
-- first adder function
adder1 = let r = many 5000000 sin 1.0 in \x -> x + r
-- second adder function
adder2 = \x -> x + many 5000000 sin 1.0
main = do
  putStrLn $ show $ adder1 1
  putStrLn $ show $ adder1 2
  putStrLn $ show $ adder1 3
  putStrLn $ show $ adder1 4
  putStrLn $ show $ adder2 1
  putStrLn $ show $ adder2 2
  putStrLn $ show $ adder2 3
  putStrLn $ show $ adder2 4
If you run the program, it thinks for some seconds performing the math and then prints 4 results immediately. But with the adder2 function it performs the calculation on every call, which can be seen visually.
It seems that the compiler is able to "cache" the long computation in some way in the first case, but not in the second.
Haskell implementations are allowed, but not required, to make use of referential transparency to substitute expressions for their results (and vice versa). The way I understand your example, it is entirely implementation-dependent whether the result of "many 5000000 sin 1.0" is reused from one application to the next. Certainly, with GHC it does depend on the amount of optimisation applied. With -O, GHC spits out all 8 results at once.

I agree that Haskell, and laziness in particular, does require you to have some appreciation for the way the implementation performs. I can only guess that in the unoptimised case, it's easy for GHC to see the common subexpression in adder1, because you've given it a name, but it's not looking hard enough to find it in adder2. With optimisation, it's smart enough to reuse the result across both functions.

So my advice here would be: always try the optimiser before you worry too much about strange performance!
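A minimal sketch of the "give it a name" point, assuming the many function from the example above (the names expensive and adder1' are mine): binding the expensive result outside the lambda, here at the top level, means it is evaluated at most once and shared by every call, with or without -O.

-- The named top-level binding is computed at most once and shared
-- by all calls to adder1', regardless of optimisation level.
expensive :: Double
expensive = many 5000000 sin 1.0

adder1' :: Double -> Double
adder1' x = x + expensive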
I always thought that let a = b in x + a is just syntactic sugar for x + b. Is that wrong?
In general, let expressions are more than simple syntactic sugar. Let expressions introduce lexical scope, allow mutually recursive bindings, and use irrefutable (lazy) pattern matching. See the report, section 3.12 for the details. The part that is probably most surprising to the unaware is that let bindings always use lazy pattern matching. The Gentle Introduction, section 4.4, explains lazy pattern matching better than I can: http://www.haskell.org/tutorial/
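As a small illustration of the irrefutable patterns mentioned above (a sketch added here, with concrete types chosen to keep it unambiguous): a pattern in a let binding does not force the bound expression until one of its variables is demanded, whereas a case expression forces its scrutinee as soon as the match is attempted.

-- The let pattern is lazy: the undefined pair is never forced,
-- so this simply evaluates to 3.
fromLet :: Int
fromLet = let (a, b) = (undefined :: (Int, Int)) in 3

-- The case pattern is strict: matching (a, b) forces the pair,
-- so evaluating this raises the undefined error.
fromCase :: Int
fromCase = case (undefined :: (Int, Int)) of (a, b) -> 3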

On Sat, Jan 13, 2007 at 11:44:38AM +1000, Matthew Brecknell wrote:
So my advice here would be: always try the optimiser before you worry too much about strange performance!

Thanks for the help!
I've done some more experiments. The following program defines simple arithmetic expressions with indexed variables. I've written four different ways to evaluate them:

- eval1 is a simple monadic evaluator
- eval2 is the obvious straightforward implementation
- compile1 is an attempt to perform compilation
- compile2 is a refined compile1 with sub-expressions explicitly separated via "let" bindings

The test evaluates the same expression in 1000000 different environments. The results are:

- eval1    - 17.47 sec
- eval2    -  3.71 sec
- compile1 -  3.79 sec
- compile2 -  3.74 sec

These figures are completely mysterious to me:

1) I expected eval1 and eval2 to perform equally. In fact, eval1 is 4.7 times slower than eval2.
2) I expected compile{1,2} to perform much faster than eval{1,2}. However, the "compilation" attempt does not give any speedup at all.

Can anyone explain these results? The program was compiled with "ghc -O3 -fexcess-precision". For comparison, I wrote a simple straightforward implementation of compile2 in C++. It runs in approximately 1.04 sec (3.57 times faster than the Haskell version).

import Data.Array.Unboxed
import Data.Array.Base
import Control.Monad.Reader
import Data.Int

foldl' f z []     = z
foldl' f z (x:xs) = (foldl' f $! f z x) xs

type Value = Int32

data Expr = Const !Value
          | Var !Int
          | Add !Expr !Expr
          | Sub !Expr !Expr
          | Mul !Expr !Expr

type Env = UArray Int Value

eval1 :: Expr -> Reader Env Value
eval1 e = case e of
  Const c   -> return c
  Var n     -> do { env <- ask; return $ env ! n }
  Add e1 e2 -> do v1 <- eval1 e1
                  v2 <- eval1 e2
                  return $ v1 + v2
  Sub e1 e2 -> do v1 <- eval1 e1
                  v2 <- eval1 e2
                  return $ v1 - v2
  Mul e1 e2 -> do v1 <- eval1 e1
                  v2 <- eval1 e2
                  return $ v1 * v2

eval2 :: Expr -> Env -> Value
eval2 e env = case e of
  Const c   -> c
  Var n     -> env ! n
  Add e1 e2 -> eval2 e1 env + eval2 e2 env
  Sub e1 e2 -> eval2 e1 env - eval2 e2 env
  Mul e1 e2 -> eval2 e1 env * eval2 e2 env

compile1 :: Expr -> (Env -> Value)
compile1 e = case e of
  Const c   -> \env -> c
  Var n     -> \env -> env ! n
  Add e1 e2 -> \env -> (compile1 e1) env + (compile1 e2) env
  Sub e1 e2 -> \env -> (compile1 e1) env - (compile1 e2) env
  Mul e1 e2 -> \env -> (compile1 e1) env * (compile1 e2) env

compile2 :: Expr -> (Env -> Value)
compile2 e = case e of
  Const c   -> \env -> c
  Var n     -> \env -> env ! n
  Add e1 e2 -> let c1 = compile2 e1 in
               let c2 = compile2 e2 in
               \env -> c1 env + c2 env
  Sub e1 e2 -> let c1 = compile2 e1 in
               let c2 = compile2 e2 in
               \env -> c1 env - c2 env
  Mul e1 e2 -> let c1 = compile2 e1 in
               let c2 = compile2 e2 in
               \env -> c1 env * c2 env

test1 :: Expr
test1 = Add (Mul (Add (Var 0) (Var 1)) (Add (Var 0) (Var 1)))
            (Mul (Sub (Var 0) (Var 1)) (Sub (Var 0) (Var 1)))

test :: Expr
test = Add (Mul test1 test1) (Add test1 test1)

do_test :: (Env -> Value) -> Value
do_test e = foldl' (+) 0 (map (\n -> e (array (0, 1) [(0, n), (1, n)])) [0..1000000])

main = do
  putStrLn "testing empty expr"
  let res0 = do_test (\env -> 0)
  putStrLn $ "result is " ++ show res0
  putStrLn "testing compile 1"
  let res1 = do_test (compile1 test)
  putStrLn $ "result is " ++ show res1
  putStrLn "testing compile 2"
  let res2 = do_test (compile2 test)
  putStrLn $ "result is " ++ show res2
  putStrLn "testing eval 1"
  let res3 = do_test (runReader (eval1 test))
  putStrLn $ "result is " ++ show res3
  putStrLn "testing eval 2"
  let res4 = do_test (eval2 test)
  putStrLn $ "result is " ++ show res4
  return ()

I've done some more experiments. The following program defines simple arithmetic expressions with indexed variables. I've written four different ways to evaluate them:

- eval1 is a simple monadic evaluator
- eval2 is the obvious straightforward implementation
- compile1 is an attempt to perform compilation
- compile2 is a refined compile1 with sub-expressions explicitly separated via "let" bindings
The test evaluates the same expression in 1000000 different environments. The results are:

- eval1    - 17.47 sec
- eval2    -  3.71 sec
- compile1 -  3.79 sec
- compile2 -  3.74 sec
These figures are completely mysterious to me.

1) I expected eval1 and eval2 to perform equally. In fact, eval1 is 4.7 times slower than eval2.
2) I expected compile{1,2} to perform much faster than eval{1,2}. However, the "compilation" attempt does not give any speedup at all.
Your intention is that (compile2 test) should analyze the expression tree of (test) only once when evaluating it for different environments. I'm not sure whether the constructors (Add), (Mul) etc. get replaced once and for all by (+) and (*), or whether this really matters, because (eval2), (compile1) and (compile2) have the same running time. I think that memoization (as explained in my previous post) only takes place for values not of function type, i.e. partially evaluated functions aren't memoized. It may also be that the compiler optimizes things for the concrete expression (test) you gave in your code, so supplying the expression interactively could show a difference between (eval2), (compile1) and (compile2).

Ironically, (eval1) does "compile" as much as you intend (compile2) to do. But it looks like the overhead imposed by appealing to Control.Monad.Reader doesn't get inlined away completely.

Currently, you don't do much work per expression; it just gets evaluated. To take advantage of memoization, you need to do more "expensive" analysis on a per-expression basis. For example, you might want to precalculate stuff that doesn't depend on the environment:

data ConstVar a = Const a | Var (Env -> a)

eval :: ConstVar a -> Env -> a
eval (Const x) = const x
eval (Var f)   = f

-- addition, multiplication etc. do precalculation
-- when the constituents are known beforehand
instance Num a => Num (ConstVar a) where
    (Const x) + (Const y) = Const (x + y)
    x + y                 = Var (\e -> eval x e + eval y e)
    ...

data Expr a = Value a
            | Variable Name
            | Add (Expr a) (Expr a)
            | Mul (Expr a) (Expr a)

compile :: Num a => Expr a -> ConstVar a
compile (Value c)    = Const c
compile (Variable v) = Var (\e -> e ! v)
compile (Add x y)    = (compile x) + (compile y)
compile (Mul x y)    = (compile x) * (compile y)

testexpr = (Mul (Value 1) (Value 2)) `Add` (Variable 1)
test     = eval . compile $ testexpr

Of course, this can be improved. For instance, it currently does not know about the associative law, as in

(Add (Value 1) (Add (Value 2) (Variable 1)))

Now, it is clear that analyzing the expression again and again every time it needs to be evaluated ("interpretation") is wasted work.

Regards,
apfelmus

PS:
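Here is a self-contained variant of that sketch, filled in so it can actually be compiled and run (the names Val, CompiledExpr, Known, Depends, run, plusC, timesC and compileE are mine, and variable names are fixed to Int indices; smart constructors are used instead of a Num instance to keep it short):

import Data.Array.Unboxed (UArray, (!), listArray)

type Val = Int
type Env = UArray Int Val

-- Known: independent of the environment, already computed.
-- Depends: still needs an environment to produce a value.
data CompiledExpr = Known Val | Depends (Env -> Val)

run :: CompiledExpr -> Env -> Val
run (Known x)   = const x
run (Depends f) = f

-- Smart constructors fold constants while compiling.
plusC, timesC :: CompiledExpr -> CompiledExpr -> CompiledExpr
plusC  (Known x) (Known y) = Known (x + y)
plusC  x y                 = Depends (\e -> run x e + run y e)
timesC (Known x) (Known y) = Known (x * y)
timesC x y                 = Depends (\e -> run x e * run y e)

data Expr = Value Val | Variable Int | Add Expr Expr | Mul Expr Expr

compileE :: Expr -> CompiledExpr
compileE (Value c)    = Known c
compileE (Variable v) = Depends (\e -> e ! v)
compileE (Add x y)    = plusC  (compileE x) (compileE y)
compileE (Mul x y)    = timesC (compileE x) (compileE y)

main :: IO ()
main = do
  -- Mul (Value 1) (Value 2) is folded to Known 2 during compileE;
  -- only the variable lookup and the final addition remain per call.
  let c   = compileE ((Value 1 `Mul` Value 2) `Add` Variable 1)
      env = listArray (0, 1) [10, 20] :: Env
  print (run c env)   -- prints 22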
data Expr = Const !Value | Var !Int | Add !Expr !Expr | Sub !Expr !Expr | Mul !Expr !Expr
You'd better leave out the strictness annotations (!).

I've run into a strange effect that I cannot explain. I have a simple expression that can be written in two equivalent ways, yet one way performs much better than the other. Here is an example:
-- apply a function many times (tail-recursive)
many n f x = if n == 0 then x else many (n-1) f $! (f x)
-- first adder function
adder1 = let r = many 5000000 sin 1.0 in \x -> x + r
-- second adder function
adder2 = \x -> x + many 5000000 sin 1.0
If you run the program, it thinks for some seconds performing the math and then prints 4 results immediately. But with the adder2 function it performs the calculation on every call, which can be seen visually.
It seems that the compiler is able to "cache" the long computation in some way in the first case, but not in the second.
This is indeed the case and entirely reasonable. The effect is called "memoization", a key ingredient to lazy evaluation. To simplify the explanation, let's take the examples

adder1 = let r = many 50000000 (1+) 0 in \x -> x + r
adder2 = \x -> let s = many 50000000 (1+) 0 in x + s

The evaluation of (adder1 3) proceeds as follows:

adder1 3
 = (let r = many 50000000 (1+) 0 in \x -> x + r) 3
 = let r = many 50000000 (1+) 0 in 3 + r
 = let r = 50000000 in 3 + r
 = 50000003

Now, (r) will be updated with the result (50000000) after it has been calculated, and subsequent access to (r) will retrieve the updated value, as in

adder1 4
 = (let r = 50000000 in \x -> x + r) 4
 = let r = 50000000 in 4 + r
 = 50000004

Every incarnation of (adder1) shares the same (r). For (adder2), things are different. Here, (s) will be updated as well, but different incarnations of (adder2) do not share the same (s), because (s) is only in scope after supplying some argument (x). Hence, every (adder2 3) and (adder2 4) (re)calculates its own (s).
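A small experiment that makes this sharing visible, using Debug.Trace (my addition, with a cheaper computation; compile it without -O, since with optimisation GHC may float the computation out of the lambda, as Matthew observed):

import Debug.Trace (trace)

-- r is bound outside the lambda: "computing r" is printed once,
-- no matter how many times shared is called.
shared :: Int -> Int
shared = let r = trace "computing r" (sum [1 .. 1000000 :: Int])
         in \x -> x + r

-- The sum is part of the lambda body: compiled without -O,
-- "computing s" is printed on every call.
unshared :: Int -> Int
unshared x = x + trace "computing s" (sum [1 .. 1000000 :: Int])

main :: IO ()
main = mapM_ (print . shared) [1, 2] >> mapM_ (print . unshared) [1, 2]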
I always thought that let a = b in x + a is just syntactic sugar for x + b. Is that wrong?
This is correct, but it doesn't apply to the case above. The question here is whether

  let a = b in \x -> x + a

and

  \x -> let a = b in x + a

are equivalent. Considering the result, they are. But considering running time and memory footprint, they are not. The first trades memory for speed, the second trades speed for memory. In general, the compiler is reluctant to switch between these two versions, i.e. it does not perform much common subexpression elimination or "let floating" (see the GHC manual). The choice must be up to the programmer.

Regards,
apfelmus
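Coming back to the original question of how to force (compile e1) to be evaluated eagerly: one way the programmer can make that choice explicit is to seq the sub-closures before returning the combined closure. This is a sketch against the Expr/Env/Value types from the benchmark program above, not something proposed in the thread, and as the measurements suggest it mostly moves the one-off tree traversal to the moment of compilation rather than making the per-environment work cheaper.

compile3 :: Expr -> (Env -> Value)
compile3 e = case e of
  Const c   -> \env -> c
  Var n     -> \env -> env ! n
  Add e1 e2 -> node (+) e1 e2
  Sub e1 e2 -> node (-) e1 e2
  Mul e1 e2 -> node (*) e1 e2
  where
    -- Forcing c1 and c2 with seq builds the whole closure tree as soon
    -- as the result of compile3 is demanded, instead of on first use.
    node op e1 e2 = let c1 = compile3 e1
                        c2 = compile3 e2
                    in c1 `seq` c2 `seq` (\env -> op (c1 env) (c2 env))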
participants (3)
- Alexander Vodomerov
- apfelmus@quantentunnel.de
- Matthew Brecknell