
The Haskell program below demonstrates an optimization bug in the GHC compiler. I have tried this with GHC 6.8.1 on my MacBook Pro and on a Linux i686 machine. This email should build as a literate Haskell program. Build the program as

    ghc --make -O lazyevalbug.lhs

and run it with no arguments. An IO action runs 4 times, printing out how long each run took. Each run takes 1 to 2 seconds, but only the first run should take that long; the rest should finish almost instantaneously. If you run the program with "fast" as a command-line argument, as in

    ./lazyevalbug fast

then it behaves the way I'd expect: the first IO action takes a couple of seconds and the rest finish instantaneously. If you build the program without the -O switch, then it behaves the same with or without the "fast" command-line argument.

DETAILS: The bug involves lazy evaluation behaving in an unexpected way when the program is compiled with the -O switch. If I create an IO action which refers to a variable defined outside the IO action, and then run that IO action multiple times, then in some circumstances the variable's value is re-computed each time the IO action is run. This seems to violate the behavior I expect from lazy evaluation. I.e., if I have code like this:

    let val = really_expensive_computation
    in  map (\x -> val + x) [1, 2, 3]

I would expect val to be computed only once. And it is. However, if instead of referring to val in a lambda function I refer to val from an IO action, then in some cases it seems to be re-evaluated every time the action is performed. I.e.:

    let val = really_expensive_computation
    in  replicateM_ 3 (print val)

When compiled with -O, the above recomputes val 3 times. However, adding a second reference to "val" seems to make the problem go away.
Another thing I noticed is that the problem seems to occur only if really_expensive_computation involves the IO monad somehow, for example if it's doing a computation based on some input read from stdin. If really_expensive_computation only involves data generated within the program, then the bug never occurs. The program below demonstrates this problem with the doSlow and doFast functions; see the comment above those functions. Also, in main, if you comment out the getStdGen line and uncomment the mkStdGen line, then the problem goes away. This demonstrates that if the computation uses data from an IO action the problem occurs; otherwise it doesn't.
> import System (getArgs)
> import qualified System.Time as ST
> import Control.Monad (when, replicateM_)
> import System.Random (getStdGen, mkStdGen, randoms)
Takes an IO action, runs it, and prints a line saying how long it took to run.
> timeIt :: IO a -> IO ()
> timeIt action = do
>     tic <- ST.getClockTime
>     action
>     toc <- ST.getClockTime
>     putStrLn $ "Took " ++ timeDiffToString (ST.diffClockTimes toc tic)
>   where
>     -- System.Time.timeDiffToString doesn't handle fractions of a
>     -- second correctly, so I implement my own version
>     timeDiffToString d =
>         let secs = ST.tdMin d * 60 + ST.tdSec d
>             frac = (realToFrac $ abs $ ST.tdPicosec d) / (realToFrac $ 10 ^ 12)
>         in  show (realToFrac secs + frac) ++ " secs"
True if a number is prime. Just a function meant to use up some time
> isPrime :: (Integral n) => n -> Bool
> isPrime n =
>     let lim = round (sqrt $ realToFrac n)
>     in  not $ any (\x -> n `rem` x == 0) [2 .. lim]
doSlow and doFast are exactly the same, except that doFast has an extra "when" statement which references "primes". The mere existence of this extra reference to "primes" in doFast seems to cause the primes list to be generated only once, so the first time the action runs it takes around 1.6 seconds, but subsequent runs finish in a small fraction of a second. In doSlow, the action takes 1.6 seconds each time it runs, not just the first time, indicating that primes is getting re-created each time the action is run.
> doSlow, doFast :: (Integral a) => [a] -> IO ()
>
> doSlow lst =
>     let primes = filter isPrime lst
>         action = timeIt $ print (length primes)
>     in do replicateM_ 4 action
>
> doFast lst =
>     let primes = filter isPrime lst
>         action = timeIt $ print (length primes)
>     in do replicateM_ 4 action
>           when (null primes) (print "No primes")
In main, comment out the getStdGen line and uncomment the mkStdGen line, and the problem goes away as well. The same is true of reading data from stdin versus generating a list in memory.
> main = do
>     args <- getArgs
>     let doFunc = if length args > 0 && args !! 0 == "fast"
>                    then doFast
>                    else doSlow
>     --gen <- return $ mkStdGen 42
>     gen <- getStdGen
>     let lst = take 10000 $ randoms gen :: [Integer]
>     doFunc lst
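The recomputation the program above measures with clock times can also be observed more directly. A minimal sketch, not from the original report, using Debug.Trace to announce each evaluation of the shared binding (the binding name and trace message are illustrative): run unoptimised, "computing" should appear on stderr only once, while under -O with the state hack it can appear once per run of the action.

```haskell
import Control.Monad (replicateM_)
import Debug.Trace (trace)

main :: IO ()
main = do
  -- 'val' is bound outside the IO action.  Under ordinary lazy
  -- evaluation the thunk is forced once and shared, so the
  -- "computing" message (written to stderr by 'trace') appears a
  -- single time even though the action runs three times.
  let val = trace "computing" (length (filter even [1 .. 100000 :: Int]))
  replicateM_ 3 (print val)
```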

I can confirm this performance bug with GHC version 6.6.1 on OS X PPC (G4) Leopard. It performs correctly, lazily keeping 'primes' for all 4 repetitions, when using "ghc -Onot", but doSlow fails to retain 'primes' when compiling with "ghc -O" and "ghc -O2".

Cheers,
Chris

haskell:
> I can confirm this performance bug with GHC version 6.6.1 on OS X PPC (G4) Leopard.
> It performs correctly, lazily keeping 'primes' for all 4 repetitions, when using "ghc -Onot", but doSlow fails to retain 'primes' when compiling with "ghc -O" and "ghc -O2".

Sounds like the known issue of the simplifier eta-expanding `State#` lambdas on purpose, which is good for IO, but risks re-computation. The 'calendar' nofib program exhibits this, IIRC. See:

http://hackage.haskell.org/trac/ghc/ticket/1168
http://hackage.haskell.org/trac/ghc/ticket/1957

Try compiling with:

    -fno-state-hack

-- Don
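The eta-expansion Don describes can be modelled in plain Haskell without touching GHC.Prim. A sketch under the assumption that `IO a` behaves like a state-passing function (here faked with a plain `Int` as the world token; all names are illustrative): moving a `let` from outside the state lambda to inside it is exactly the loss of sharing the state hack can cause.

```haskell
import Debug.Trace (trace)

-- A toy stand-in for IO's internal representation,
--   State# RealWorld -> (# State# RealWorld, a #),
-- with a plain Int playing the part of the world token.
type FakeIO a = Int -> (Int, a)

-- The binding lives outside the lambda: one thunk, shared by every
-- application, so "eval (shared)" is traced at most once.
sharedAction :: FakeIO Int
sharedAction = let val = trace "eval (shared)" (product [1 .. 10 :: Int])
               in  \s -> (s + 1, val)

-- After eta-expansion the binding sits inside the lambda: a fresh
-- thunk per application, so "eval (rebuilt)" can be traced on every
-- call (when the compiler doesn't float it back out).
rebuiltAction :: FakeIO Int
rebuiltAction = \s -> let val = trace "eval (rebuilt)" (product [1 .. 10 :: Int])
                      in  (s + 1, val)

main :: IO ()
main = do
  print (snd (sharedAction 0), snd (sharedAction 1))
  print (snd (rebuiltAction 0), snd (rebuiltAction 1))
```

Both actions compute the same values; the difference is only in how many times the trace message fires, which is the performance gap the original program exposes.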

Thanks Don. That is the issue. I just added this comment/request to trac #1168:
This just came up on the mailing list again. The program was slow unless (-fno-state-hack) was used.
Is it possible to have GHC emit a warning (as part of -Wall) to let the user know that GHC is applying a potentially risky simplification?
I had tried -Wall before you told me about -fno-state-hack. If -Wall had warned me about the hack, then I could have understood the problem without sending a frequently asked question to the list.

Cheers,
Chris

On Sun, Feb 03, 2008 at 02:37:42PM -0500, Adam P Jenkins wrote:
> The haskell program below demonstrates an optimization bug with the GHC compiler. I have tried this with ghc 6.8.1 on my MacBook Pro and on a Linux i686 machine. This email should build as a literate haskell program. Build the program as

While this may be unfortunate, it is a consequence of documented behavior; thus I would not consider it a bug per se. Try with -fno-state-hack; the documentation for this option, I believe, should explain what is going on. (Hint: IO a ~ State# -> (State#, a))

Stefan

Thank you for the explanation. Inlining does explain the behavior I was seeing, and -fno-state-hack does make the program behave the way I'd expect.

I would like to humbly submit that perhaps this optimization should be off by default. It seems to me that any compiler optimization which can potentially increase your program's runtime by an order of complexity is dangerous. Also, while the Haskell Report doesn't seem to specify anything about lazy evaluation behavior, it seems to me that in order to program in Haskell, assuming that a computation assigned to a variable won't be recomputed multiple times is almost as necessary an assumption as assuming that the compiler will optimize away tail recursion. For instance, GHC FAQ entry 7.2 implicitly assumes this is true:

http://haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_elim...

In my real program where I ran into this problem, it took me quite a while to figure out why my program's performance wasn't making any sense, and why seemingly irrelevant changes to my code made the performance problem go away.

Thank you,
Adam

| Thank you for the explanation. Inlining does explain the behavior I
| was seeing, and -fno-state-hack does make the program behave the way
| I'd expect.
|
| I would like to humbly submit that perhaps this optimization should be
| off by default.

I agree that the current behaviour sometimes bites badly, as it did for you here. Trouble is, removing the hack has (or used to have) a significant negative effect on the performance of lots of programs.

Still, I think that there are various other possibilities to try. For example, compiling the I/O libraries (only) with the state hack on, and with it off by default elsewhere, might gather most of the benefits without the costs. Or restricting it to top-level functions. Or something like that. It needs someone who's willing to do some measurements and try variations. We can make suggestions about things to try.

Meanwhile, I'll tack these remarks onto http://hackage.haskell.org/trac/ghc/ticket/1168

Thanks for bringing this up.

Simon
participants (5):
- Adam P Jenkins
- Chris Kuklewicz
- Don Stewart
- Simon Peyton-Jones
- Stefan O'Rear