
Steve Horne
The do-notation sugar may confuse the issue - the <- looks like an operator, but translating to binds-and-lambdas form suggests otherwise. Quick translations (I hope no mistakes) with lots of parens...
m >>= (\x -> (f x x))
m >>= (\x -> (m >>= (\y -> (f x y))))
First of all, you can omit all parentheses, because the (>>=) operator is associative by definition (this is not checked by the compiler though).
At first sight, these two expressions can give different results for reasons other than evaluation order. In particular, there are two bind operators, not just one.
That is, x and y could get different values for reasons other than the two m references referring to different things. So... is that true?
Yes, that's correct. Remember that referential transparency does not say that you are not allowed to pass different values to the same function. It just forbids functions to return different results for different inputs, and this is true for both expressions you showed. Using a state monad to get more concrete, let's say that 'm' returns the current state and then increments it. If the start state is 3, then 'x' will become 3 on execution and 'y' will become 4. However, the result of the composition is /always/ the state computation that returns the result of 'f' applied to the current state and the incremented current state. Just like when you write "x = getLine", then you can safely replace any occurence of 'getLine' by 'x'.
Of course even the bind operator arguably isn't primitive. We could translate to get rid of those too, and see what lies underneath. This is where we start seeing functions of type...
World -> (x, World)
Problem - this level of abstraction is hypothetical. It is not part of the Haskell language. Haskell specifically defines the IO monad to be a black box.
And that's fine, because IO is an embedded DSL. A better view of IO is a GADT like: data IO :: * -> * where GetLine :: IO String PutStrLn :: String -> IO () ... This is still hypothetical, but it shows how even IO is easily referentially transparent (as long as you don't use unsafe* cheats). Furthermore you have to consider that /execution/ of IO actions is entirely outside the scope of the language itself. A compiler turns an expression of type "IO ()" into machine code that executes the encoded recipe. In any case, you have to differentiate between evaluation and IO execution.
If main returns a function of type "World -> (x, World)" wrapped in a monad context, then there is referential transparency as defined in computer science. But is that a fair claim?
Even though the World state monad is a bad intuition in my opinion, why not? Evaluation is referentially transparent, because you don't evaluate to results of IO actions. You evaluate to the IO actions themselves. This is not a hack, it is a great feature, because it's precisely what allows us to build real world actions using combinators.
In this model, Haskell is an interpreted language for compositing functions. We can call those functions programs. The executable is a translation of the function returned by main, but *not* a translation of the source code.
It is a translation of the source code, because evaluation happens at run-time, interleaved with the execution of 'main', as 'main' goes along forcing thunks (in a thunk-based implementation).
So what main returns - that hypothetical function World -> (x, World) - isn't just a product of the program, it's also a representation of the program.
And there comes the limitation of this intuition. It is so theoretical that you can't express a function of that type even as machine code. On the other hand, the GADT variant could indeed be turned into bytecode and then interpreted by a bytecode machine. But this is not how it works in, say, GHC. GHC compiles to native code using the thunk approach I mentioned earlier.
When I say that Haskell lacks referential transparency because the execution of primitive IO actions is tied to the evaluation of the bind operators that extract out their results, and different executions of the same action yield different results, I'm only appealing to the defined semantics of the Haskell language. I'm not appealing to a hypothetical model where the world is passed as a parameter.
This is where your mistake is again: execution vs. evaluation. The (>>=) operator does /not/ extract results. It doesn't have to. It just makes sure (abstractly!) that the second argument is applied to the result of the first argument at execution time. It builds a data dependency between monadic actions. (>>=) is composition, not execution. Execution is outside the scope of the monadic interface, and for the specific case of IO it's even outside the scope of the language. If you think about it, you will find that (>>=) can't even extract results. The only thing it can do with results is to make other things dependent on them. This is because the type of the result is fully polymorphic.
So - the issue seems to be whether the IO monad is a context holding world-manipulating functions, or whether it's a black box with semantics specified at the bind level. And if referential transparency is decided at this level, what practical relevance does it have?
It's a black box and that's fine. There is nothing impure about IO.
It's probably better to stick with "the functional core is referentially transparent - IO actions break that, so don't overuse the IO monad". You can argue "may or may not break that depending on your viewpoint", but if you can't objectively decide which viewpoint is correct, then you haven't proven referential transparency.
You can objectively decide. Referential transparency means that an expression cannot magically evaluate to something this time and to something other the next time, be it an Int value, a function or an IO action. 'getLine' will always be 'getLine', and 'putStrLn "abc"' will always be the action that prints "abc" with a line feed. If you /had/ a concrete representation of 'putStrLn "abc"' (like with the GADT), then you could safely replace 'putStrLn "abc"' in the code by this representation. This is what you can do with all monads and everything else in Haskell. Again, I'm assuming you don't cheat. If you do cheat, then you are turning execution into evaluation, thereby potentially breaking referential transparency. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/