
On 10 Feb 2009, at 1:35 am, Gregg Reynolds wrote:
My original question was motivated by the observation that a human reader of an expression of the form "e >>= f" , on seeing that f is constant, may pull the constant value out of f, disregard e and dispense with the application f e.
There isn't any "application f e". Any human reader who does that is simply WRONG to do so.
So can a compiler, unless IO expressions are involved, in which case such optimizations are forbidden.
Remember that (e >>= f) means (>>=) e f. When/whether it is sound to replace (>>=) e f by (f undefined) depends on the semantics of >>=. But >>= is an operation of a type class (the Monad type class). If we come across g :: Monad m => m a -> b -> m b g e r = e >>= \x -> return r then the compiler doesn't know what m is, so it doesn't know which, if any, of its arguments >>= depends on. data Trivial x = Trivial x instance Monad Trivial where return = Trivial (>>=) (Trivial a) f = f a If we now try h :: Trivial a -> b -> Trivial b h e r = g e r then the compiler can inline the call to g h e r = e >>= \x -> return r and inline the calls to >>= and return h (Trivial a) r = (\x -> Trivial r) a then simplify to h (Trivial _) r = Trivial r You might have thought that the result doesn't depend on the first argument to h, but as written, it does. This version of h is strict in its first argument, so even though there are NO side effects whatever, the first argument will be evaluated to weak head normal form. Of course, since there is only one constructor for Trivial, it could be a newtype. If I change that declaration to newtype Trivial x = Trivial x then the pattern match is implicitly irrefutable, h ~(Trivial _) r = Trivial r and if the compiler doesn't simplify ~(NTC _) to _ when NTC is a newtype constructor, I'll be surprised, so it's like h _ r = Trivial r and the function will *NOT* be strict in its first argument. Here you see that the soundness or otherwise of eliminating the first argument of >>= when the second argument doesn't depend on the eventual result has nothing to do with IO as such and certainly nothing to do with side effects. It's really all about whether the version of >>= used is strict in its first argument or not.
I wondered if that was due to the semantics of >>= or the semantics of IO.
It's about the semantics of IO's implementation of >>=.
To summarize what I've concluded thanks to the helpful people on haskell-cafe:
The compiler can optimize e >>= f except for any IO expressions in e and f.
False. There is nothing unusual about IO here.
IO expressions must be evaluated, due to the semantics of IO.
False.
The may not be disregarded, memoized, or substituted.
False. In Haskell there is nothing whatever unusual about expressions of type IO something. They *MAY* be disregarded: let n = length [getChar, putChar 'f'] can be optimised to let n = 2. They MAY be memoised. They MAY be substituted.
IO semantics may be implemented in different ways by different compilers;
True. But then, so may Int semantics.
these implementation techniques are not part of the formal semantics of the language, which may be expressed as above: IO expressions must be evaluated wherever and whenever they occur.
False. Utterly false.
The bind operator >>= enforces sequencing on arguments containing IO expressions, but does not force evaluation.
False. For the special case of IO (and for other similar monads)
= enforced sequence BY forcing evaluation (more precisely, by being strict in its first argument).
Even bind expressions involving IO may be optimized. For example:
getChar >>= \x -> ...<monster computation>... putChar 'c'
The compiler may discard <monster computation> (assuming it contains no IO expressions), but it must evaluate getChar and putChar (due to IO semantics) in the correct order (due to bind semantics).
You are conflating *evaluating* (by the Haskell evaluator) with *performance* (by the Haskell environment). IO *values* are determined by the Haskell evaluator exactly like any other values; IO *actions* are performed by the Haskell environment as part of the process of forcing the lazy evaluator to do some work. In your fragmentary example, <monster computation> may be discarded EVEN IF it contains IO expressions, it's only if they are linked into the IO chain using >> and/or >>= that the environment will perform their values.
Thanks all,
gregg