Re: [Haskell] Lazy IO breaks purity

Lennart Augustsson wrote:
I don't see any breaking of referential transparency in your code. Every time you do an IO operation the result is basically non-deterministic, since you are talking to the outside world. You're assuming that IO has some kind of semantics that Haskell makes no promises about.
I'm not saying that this isn't a problem, because it is. But it doesn't break referential transparency, it just makes the semantics of IO even more complicated.
(I don't have a formal proof that unsafeInterleaveIO cannot break RT, but I've not seen an example where it does yet.)
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated. This is on dodgy ground, IMO: either you admit that the IO monad contains an Oracle, or you admit it can time-travel. I don't believe in either of those things :-)

Cheers, Simon
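For concreteness, a minimal sketch of the behaviour this argument is about, assuming GHC's System.IO.Unsafe.unsafeInterleaveIO and an IORef as the observable state: the value bound by unsafeInterleaveIO is determined by whatever effects have run by the time it is demanded, not at the point in the IO sequence where it was bound.

import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

main :: IO ()
main = do
  ref <- newIORef "early"
  -- the read is deferred: it runs when v is demanded, not here
  v <- unsafeInterleaveIO (readIORef ref)
  writeIORef ref "late"
  putStrLn v   -- prints "late": the value was not fixed at the bind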
-- Lennart
On Thu, Mar 5, 2009 at 2:12 AM, wrote:

We demonstrate how lazy IO breaks referential transparency. A pure function of the type Int->Int->Int gives different integers depending on the order of evaluation of its arguments. Our Haskell98 code uses nothing but the standard input. We conclude that extolling the purity of Haskell and advertising lazy IO is inconsistent.
Henning Thielemann wrote on Haskell-Cafe:
Say I have a chain of functions: read a file, ... write that to a file, all of these functions are written in a lazy way, which is currently considered good style

Lazy IO should not be considered good style. One of the common definitions of purity is that pure expressions should evaluate to the same results regardless of evaluation order, or that equals can be substituted for equals. If an expression of the type Int evaluates to 1, we should be able to replace every occurrence of the expression with 1 without changing the results and other observables.
The expression (read s), where s is the result of the (hGetContents h1) action, has a pure type, e.g., Int. Yet its evaluation does more than just producing an integer: it may also close the file associated with the handle h1. Closing the file has an observable effect: the file descriptor, which is a scarce resource, is freed. Some OSes lock the open file, or prevent operations such as renaming and deletion on the open file. A file system containing an open file cannot be unmounted. Suppose I use a Haskell application such as an editor to process data from a removable drive. I mount the drive and tell the application the file name. The (still running) application finishes with the file and displays the result. And yet I can't unplug the removable drive, because it turns out that the final result was produced without needing to read all the data from the file, and so the file was never closed.
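A minimal sketch of that scenario, assuming an ordinary file opened with openFile (the helper name firstLine is hypothetical): only the first line is ever demanded, so the lazy read never reaches end-of-file, and the descriptor is released only when the handle is eventually finalised.

import System.IO

firstLine :: FilePath -> IO String
firstLine path = do
  h <- openFile path ReadMode
  s <- hGetContents h      -- h becomes semi-closed; the OS descriptor stays open
  return (head (lines s))  -- a thunk; the rest of the file may never be read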
Some people say: the programmer should have used seq. That is an admission of guilt! An expression (read s)::Int that evaluates to 1 is equal to 1. So one should be able to substitute (read s) with 1. If the result of evaluating 1 turns out not to be needed for the final outcome, then not evaluating (read s) should not affect the outcome. And yet it does. One uses seq to force evaluation of an expression even if its result may be unused. Thus an expression of a pure type has a *side effect*.
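This is roughly the code that advice produces (a sketch; readFirstInt is a hypothetical name): an expression of pure type is forced with seq solely so that its hidden read happens before the handle is closed.

import System.IO

readFirstInt :: FilePath -> IO Int
readFirstInt path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  let n = read (head (words s)) :: Int
  -- without the seq, hClose could run before the lazy read, truncating s
  n `seq` hClose h
  return n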
The advice about using seq reminds me of the advice given to C programmers that they should not free a malloc'ed pointer twice, dereference a null pointer, or write past the boundary of an array. If lazy IO is considered good style given the proper use of seq, then C should be considered safe given the proper use of pointers and arrays.
Side effects like closing a file are observable in the external world. A program may observe these effects, in the IO monad. One can argue that there are no guarantees at all about what happens in the IO monad. Can side effects of lazy IO be observable in _pure_ Haskell code, in functions with pure types? Yes, they can. The examples are depressingly easy to write, once one realizes that reading does have side effects, that POSIX provides for file descriptor aliasing, and that sharing becomes observable with side effects. Here is a simple Haskell98 program.
{- Haskell98! -}

module Main where

import System.IO
import System.Posix.IO (fdToHandle, stdInput)

-- f1 and f2 are both pure functions, with the pure type.
-- Both compute the result of the subtraction e1 - e2.
-- The only difference between them is the sequence of
-- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1.
-- For really pure functions, that difference should not be observable.

f1, f2 :: Int -> Int -> Int
f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2

read_int s = read . head . words $ s

main = do
  let h1 = stdin
  h2 <- fdToHandle stdInput
  s1 <- hGetContents h1
  s2 <- hGetContents h2
  print $ f1 (read_int s1) (read_int s2)
  -- print $ f2 (read_int s1) (read_int s2)
One can compile it with GHC (I used 6.8.2, with and without -O2) and run it like this:

~> /tmp/a.out
1 2
-1

That is, we run the program and type 1 and 2 as the input. The program prints the result, -1. If we comment out the line "print $ f1 (read_int s1) (read_int s2)" and uncomment the last line, the transcript looks like this:

~> /tmp/a.out
1 2
1
Running the code with Hugs exhibits the same behavior. Thus a pure function (-) of the pure type gives different results depending on the order of evaluating its arguments.
Is 1 = -1?

You're assuming that IO operations have semantics.
From the Haskell program's point of view, and when reasoning about Haskell programs (not their interaction with the world), you should assume that every IO operation returns a random result. The way Oleg's program behaves does not break RT under the random-result assumption.
What could break RT is if you could do this:
f :: Int -> Int <- someIOoperation
let x = f 0; y = f 0
and end up with x and y not being equal.
It's (of course) easy to construct someIOoperation that has this behaviour (using FFI), but I don't think you can construct it just using the normal IO operations and unsafeInterleaveIO.
But as I said, I think this is a problem anyway, because IO without semantics is rather useless.
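For concreteness, a sketch of the kind of someIOoperation meant here, using unsafePerformIO rather than the FFI (so it still relies on an unsafe primitive, which is consistent with the claim that ordinary IO operations plus unsafeInterleaveIO cannot do it):

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- an IO action returning a "function" that consults hidden mutable state
someIOoperation :: IO (Int -> Int)
someIOoperation = do
  ref <- newIORef 0
  return $ \n -> unsafePerformIO $ do
    modifyIORef ref (+ 1)
    k <- readIORef ref
    return (n + k)

main :: IO ()
main = do
  f <- someIOoperation
  let x = f 0
      y = f 0
  -- absent common-subexpression elimination, this can print (1,2): x /= y
  print (x, y)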
-- Lennart
On Thu, Mar 5, 2009 at 1:08 PM, Simon Marlow wrote:
Lennart Augustsson wrote:
I don't see any breaking of referential transparency in your code. Every time you do an IO operation the result is basically non-deterministic, since you are talking to the outside world. You're assuming that IO has some kind of semantics that Haskell makes no promises about.
I'm not saying that this isn't a problem, because it is. But it doesn't break referential transparency, it just makes the semantics of IO even more complicated.
(I don't have a formal proof that unsafeInterleaveIO cannot break RT, but I've not seen an example where it does yet.)
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated. This is on dodgy ground, IMO: either you admit that the IO monad contains an Oracle, or you admit it can time-travel. I don't believe in either of those things :-)
Cheers, Simon
-- Lennart
On Thu, Mar 5, 2009 at 2:12 AM, wrote:

We demonstrate how lazy IO breaks referential transparency. A pure function of the type Int->Int->Int gives different integers depending on the order of evaluation of its arguments. Our Haskell98 code uses nothing but the standard input. We conclude that extolling the purity of Haskell and advertising lazy IO is inconsistent.
Henning Thielemann wrote on Haskell-Cafe:
Say I have a chain of functions: read a file, ... write that to a file, all of these functions are written in a lazy way, which is currently considered good style
Lazy IO should not be considered good style. One of the common definitions of purity is that pure expressions should evaluate to the same results regardless of evaluation order, or that equals can be substituted for equals. If an expression of the type Int evaluates to 1, we should be able to replace every occurrence of the expression with 1 without changing the results and other observables.
The expression (read s), where s is the result of the (hGetContents h1) action, has a pure type, e.g., Int. Yet its evaluation does more than just producing an integer: it may also close the file associated with the handle h1. Closing the file has an observable effect: the file descriptor, which is a scarce resource, is freed. Some OSes lock the open file, or prevent operations such as renaming and deletion on the open file. A file system containing an open file cannot be unmounted. Suppose I use a Haskell application such as an editor to process data from a removable drive. I mount the drive and tell the application the file name. The (still running) application finishes with the file and displays the result. And yet I can't unplug the removable drive, because it turns out that the final result was produced without needing to read all the data from the file, and so the file was never closed.
Some people say: the programmer should have used seq. That is an admission of guilt! An expression (read s)::Int that evaluates to 1 is equal to 1. So one should be able to substitute (read s) with 1. If the result of evaluating 1 turns out not to be needed for the final outcome, then not evaluating (read s) should not affect the outcome. And yet it does. One uses seq to force evaluation of an expression even if its result may be unused. Thus an expression of a pure type has a *side effect*.
The advice about using seq reminds me of the advice given to C programmers that they should not free a malloc'ed pointer twice, dereference a null pointer, or write past the boundary of an array. If lazy IO is considered good style given the proper use of seq, then C should be considered safe given the proper use of pointers and arrays.
Side effects like closing a file are observable in the external world. A program may observe these effects, in the IO monad. One can argue that there are no guarantees at all about what happens in the IO monad. Can side effects of lazy IO be observable in _pure_ Haskell code, in functions with pure types? Yes, they can. The examples are depressingly easy to write, once one realizes that reading does have side effects, that POSIX provides for file descriptor aliasing, and that sharing becomes observable with side effects. Here is a simple Haskell98 program.
{- Haskell98! -}

module Main where

import System.IO
import System.Posix.IO (fdToHandle, stdInput)

-- f1 and f2 are both pure functions, with the pure type.
-- Both compute the result of the subtraction e1 - e2.
-- The only difference between them is the sequence of
-- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1.
-- For really pure functions, that difference should not be observable.

f1, f2 :: Int -> Int -> Int
f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2

read_int s = read . head . words $ s

main = do
  let h1 = stdin
  h2 <- fdToHandle stdInput
  s1 <- hGetContents h1
  s2 <- hGetContents h2
  print $ f1 (read_int s1) (read_int s2)
  -- print $ f2 (read_int s1) (read_int s2)
One can compile it with GHC (I used 6.8.2, with and without -O2) and run it like this:

~> /tmp/a.out
1 2
-1

That is, we run the program and type 1 and 2 as the input. The program prints the result, -1. If we comment out the line "print $ f1 (read_int s1) (read_int s2)" and uncomment the last line, the transcript looks like this:

~> /tmp/a.out
1 2
1
Running the code with Hugs exhibits the same behavior. Thus a pure function (-) of the pure type gives different results depending on the order of evaluating its arguments.
Is 1 = -1?

On Thu, 2009-03-05 at 13:08 +0000, Simon Marlow wrote:
Lennart Augustsson wrote:
I don't see any breaking of referential transparency in your code. Every time you do an IO operation the result is basically non-deterministic, since you are talking to the outside world. You're assuming that IO has some kind of semantics that Haskell makes no promises about.
I'm not saying that this isn't a problem, because it is. But it doesn't break referential transparency, it just makes the semantics of IO even more complicated.
(I don't have a formal proof that unsafeInterleaveIO cannot break RT, but I've not seen an example where it does yet.)
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated. This is on dodgy ground, IMO: either you admit that the IO monad contains an Oracle, or you admit it can time-travel. I don't believe in either of those things :-)
That's the charm of denotational semantics --- you're outside the laws of physics.

jcc

On Thu, Mar 5, 2009 at 7:08 AM, Simon Marlow wrote:
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated. This is on dodgy ground, IMO: either you admit that the IO monad contains an Oracle, or you admit it can time-travel. I don't believe in either of those things :-)
Surely there's a quantum mechanical metaphor waiting to happen here.

getCat :: IO Cat

If "getCat" appears in a program text, does it denote or not? Or both? If it does, is the cat alive or dead? (Apologies to Schrödinger: http://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat)

-gregg

On 2009 Mar 5, at 8:08, Simon Marlow wrote:
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated. This is on dodgy ground, IMO: either you admit that the IO monad contains an Oracle, or you admit it can time-travel. I don't believe in either of those things :-)
...hasn't sigfpe demonstrated "time travel" using comonads?

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

On Thu, 2009-03-05 at 13:08 +0000, Simon Marlow wrote:
Lennart Augustsson wrote:
I don't see any breaking of referential transparency in your code. Every time you do an IO operation the result is basically non-deterministic, since you are talking to the outside world. You're assuming that IO has some kind of semantics that Haskell makes no promises about.
I'm not saying that this isn't a problem, because it is. But it doesn't break referential transparency, it just makes the semantics of IO even more complicated.
(I don't have a formal proof that unsafeInterleaveIO cannot break RT, but I've not seen an example where it does yet.)
So the argument is something like: we can think of the result of a call to unsafeInterleaveIO as having been chosen at the time we called unsafeInterleaveIO, rather than when its result is actually evaluated.
But surely that is the mistake. With IO, bound values depend solely on the effects that happened previously (no time travel). With ordinary sequential IO, that's easy to understand and explain, because the binding and the effects happen at the same time/place.

The extension with unsafeInterleaveIO is not to pretend that the value is chosen at the point it is bound. The value depends on the effects, not on the binding. The extension is that we separate the binding point from the effects. The effects can be arbitrarily interleaved with other effects. The value we get does potentially depend on all those prior effects. The arbitrary interleaving of effects is obviously weaker than them happening now, in sequence, before the next action.

This works well when the deferred side effects do not interfere with other side effects, and is non-deterministic when they do interfere, as in Oleg's example. It doesn't break referential transparency though: thanks to memoisation we only observe one final value.

The standard libs, being sensible, mostly only use unsafeInterleaveIO in the cases where we expect little interference. Note how Oleg has to go out of his way to construct the example, circumventing the semi-closed Handle state which was designed to stop people from shooting themselves in the foot.

There used to be a real example where unsafeInterleaveIO did break referential transparency. That was due to concurrency and lack of locking: http://hackage.haskell.org/trac/ghc/ticket/986 When the bug was around, there was a *single* pure value that, if used by two different threads, could be observed to be different. There's nothing similar happening here.

Duncan
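A small sketch of the memoisation point, with an IORef counter standing in for the file-reading effect: however many times the lazily bound value is used, the deferred action runs at most once, so every use observes the same value.

import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- the counting action is deferred, but it runs at most once
  v <- unsafeInterleaveIO $ do
         modifyIORef counter (+ 1)
         readIORef counter
  print (v + v)                -- forces v once; prints 2
  print =<< readIORef counter  -- prints 1: the result was memoised with the effect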
participants (6):

- Brandon S. Allbery KF8NH
- Duncan Coutts
- Gregg Reynolds
- Jonathan Cast
- Lennart Augustsson
- Simon Marlow