GHC lazy eval optimization bug

3 Feb 2008

      The haskell program below demonstrates an optimization bug with the
GHC compiler.  I have tried this with ghc 6.8.1 on my MacBook Pro and
on a Linux i686 machine.  This email should build as a literate haskell
program.  Build the program as

ghc --make -O lazyevalbug.lhs

and run it with no arguments.  An IO action runs 4 times, printing out
how long it took each time.  It takes 1 to 2 seconds to run each time,
but it should only take 1 second or 2 the first time, and finish
almost instantaneously the rest of the times.  If you run the program
with "fast" as a cmdline arg, as in

   ./lazyevalbug fast

then it will behave how I'd expect it to: the first IO action takes a
couple of seconds tand the rest finish instantaneously.  If you build  
the
program without the -O switch, then it behaves the same with our
without the "fast" commandline arg.

DETAILS:

The bug involves lazy evalutation behaving in an unexpected way when
the program is compiled with the -O switch.  If I create an IO action
which refers to a variable which was defined outside of the IO action,
and then run that IO action multiple times, then in some circumstances
the variable value will be re-computed each time the IO action is run.
This seems to violate the behavior I expect from lazy evaluation.
I.e. if I have code like this:

let val = really_expensive_computation
in map (\x -> val + x) [1, 2, 3]

I would expect val to only be computed once.  And it is.  However if
instead of referring to val in a lambda function I refer to val from
an IO action, then in some cases it seems to be re-evaluated every
time the action is performed.  I.e.

let val = really_expensive_computation
in replicateM_ 3 (print val)

When compiled with -O, the above will recompute val 3 times.  However
adding a second reference to "val" seems to make the problem go away.

Another thing I noticed is, the problem seems to only occur if
really_expensive_computation involves the IO monad somehow, for
example if it's doing a computation based on some input read from
stdin.  If really_expensive_computation only involves data generated
within the program then the bug never occurs.

The program below demonstrates this problem with the doSlow and doFast
functions.  See the comment above these functions.  Also, in main, if
you comment out the getStdGen line and uncomment the mkStdGen line,
then the problem goes away.  This demonstrates the fact that if the
computation uses data from an IO action the problem occurs, otherwise
it doesn't.
...
import System (getArgs)
import qualified System.Time as ST
import Control.Monad (when, replicateM_)
import System.Random (getStdGen, mkStdGen, randoms)
Takes an IO action, runs it, and prints line saying how long it took
to run
...
timeIt :: IO a -> IO ()
timeIt action =
    do tic <- ST.getClockTime
       action
       toc <- ST.getClockTime
       putStrLn $ "Took " ++ (timeDiffToString (ST.diffClockTimes  
toc tic))
    -- System.Time.timeDiffToString doesn't handle fractions of a
    -- second correctly, so I implement my own version
    where timeDiffToString d =
              let secs = ST.tdMin d * 60 + ST.tdSec d
                  frac = (realToFrac $ abs $ ST.tdPicosec d) /  
(realToFrac $ 10^12)
              in show ((realToFrac secs) + frac) ++ " secs"
True if a number is prime.  Just a function meant to use up some
time
...
isPrime :: (Integral n) => n -> Bool
isPrime n =
    let lim = round (sqrt $ realToFrac n) in
    not $ any (\x -> n `rem` x == 0) [2..lim]
doSlow and doFast are exactly the same, except that doFast has an
extra "when" statement which references "primes".  The mere existence of
this extra reference to "primes" in doFast seems to cause the primes
list to only be generated once, so the first time action runs
takes around 1.6 seconds, but subsequent actions finish in a small
fraction of a second.  In doSlow, the action takes 1.6 seconds each
time it runs, not just the first time, indicating that primes is
getting re-created each time the action is run.
...
doSlow, doFast :: (Integral a) => [a] -> IO ()
...
doSlow lst = let primes = filter isPrime lst
                 action = timeIt $ print (length primes)
             in do replicateM_ 4 action
...
doFast lst = let primes = filter isPrime lst
                 action = timeIt $ print (length primes)
             in do replicateM_ 4 action
                   when (null primes) (print "No primes")
In main, comment out the getStdGen line and uncomment the mkStdGen
line and the problem goes away as well. The same is true if I read
data from stdin vs generating a list in memory.
...
main = do args <- getArgs
          let doFunc = if (length args) > 0 && args!!0 == "fast"
                       then doFast
                       else doSlow
          --gen <- return $ mkStdGen 42
          gen <- getStdGen
          let lst = take 10000 $ randoms gen :: [Integer]
          doFunc lst

GHC lazy eval optimization bug

Adam P Jenkins