
The haskell program below demonstrates an optimization bug with the GHC compiler. I have tried this with ghc 6.8.1 on my MacBook Pro and on a Linux i686 machine. This email should build as a literate haskell program. Build the program as ghc --make -O lazyevalbug.lhs and run it with no arguments. An IO action runs 4 times, printing out how long it took each time. It takes 1 to 2 seconds to run each time, but it should only take 1 second or 2 the first time, and finish almost instantaneously the rest of the times. If you run the program with "fast" as a cmdline arg, as in ./lazyevalbug fast then it will behave how I'd expect it to: the first IO action takes a couple of seconds tand the rest finish instantaneously. If you build the program without the -O switch, then it behaves the same with our without the "fast" commandline arg. DETAILS: The bug involves lazy evalutation behaving in an unexpected way when the program is compiled with the -O switch. If I create an IO action which refers to a variable which was defined outside of the IO action, and then run that IO action multiple times, then in some circumstances the variable value will be re-computed each time the IO action is run. This seems to violate the behavior I expect from lazy evaluation. I.e. if I have code like this: let val = really_expensive_computation in map (\x -> val + x) [1, 2, 3] I would expect val to only be computed once. And it is. However if instead of referring to val in a lambda function I refer to val from an IO action, then in some cases it seems to be re-evaluated every time the action is performed. I.e. let val = really_expensive_computation in replicateM_ 3 (print val) When compiled with -O, the above will recompute val 3 times. However adding a second reference to "val" seems to make the problem go away. Another thing I noticed is, the problem seems to only occur if really_expensive_computation involves the IO monad somehow, for example if it's doing a computation based on some input read from stdin. If really_expensive_computation only involves data generated within the program then the bug never occurs. The program below demonstrates this problem with the doSlow and doFast functions. See the comment above these functions. Also, in main, if you comment out the getStdGen line and uncomment the mkStdGen line, then the problem goes away. This demonstrates the fact that if the computation uses data from an IO action the problem occurs, otherwise it doesn't.
import System (getArgs) import qualified System.Time as ST import Control.Monad (when, replicateM_) import System.Random (getStdGen, mkStdGen, randoms)
Takes an IO action, runs it, and prints line saying how long it took to run
timeIt :: IO a -> IO () timeIt action = do tic <- ST.getClockTime action toc <- ST.getClockTime putStrLn $ "Took " ++ (timeDiffToString (ST.diffClockTimes toc tic)) -- System.Time.timeDiffToString doesn't handle fractions of a -- second correctly, so I implement my own version where timeDiffToString d = let secs = ST.tdMin d * 60 + ST.tdSec d frac = (realToFrac $ abs $ ST.tdPicosec d) / (realToFrac $ 10^12) in show ((realToFrac secs) + frac) ++ " secs"
True if a number is prime. Just a function meant to use up some time
isPrime :: (Integral n) => n -> Bool isPrime n = let lim = round (sqrt $ realToFrac n) in not $ any (\x -> n `rem` x == 0) [2..lim]
doSlow and doFast are exactly the same, except that doFast has an extra "when" statement which references "primes". The mere existence of this extra reference to "primes" in doFast seems to cause the primes list to only be generated once, so the first time action runs takes around 1.6 seconds, but subsequent actions finish in a small fraction of a second. In doSlow, the action takes 1.6 seconds each time it runs, not just the first time, indicating that primes is getting re-created each time the action is run.
doSlow, doFast :: (Integral a) => [a] -> IO ()
doSlow lst = let primes = filter isPrime lst action = timeIt $ print (length primes) in do replicateM_ 4 action
doFast lst = let primes = filter isPrime lst action = timeIt $ print (length primes) in do replicateM_ 4 action when (null primes) (print "No primes")
In main, comment out the getStdGen line and uncomment the mkStdGen line and the problem goes away as well. The same is true if I read data from stdin vs generating a list in memory.
main = do args <- getArgs let doFunc = if (length args) > 0 && args!!0 == "fast" then doFast else doSlow --gen <- return $ mkStdGen 42 gen <- getStdGen let lst = take 10000 $ randoms gen :: [Integer] doFunc lst