"pure" versus "impure" code

Hi Folks,

Is there a web site that defines "pure" versus "impure" code?

My understanding is that:

- Pure code is code that does no I/O
- Impure code is code that does I/O

Is that correct?

Also, it is my understanding that good software design is to isolate/separate the impure code from the pure code. Is that correct? Does that principle apply to all programming languages, or just Haskell?

/Roger

On 19 May 2011 22:12, Costello, Roger L. wrote:
Is there a web site that defines "pure" versus "impure" code?
Lots of pages:

http://en.wikipedia.org/wiki/Pure_function
http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)
http://en.wikipedia.org/wiki/Idempotence#In_computing
- Pure code is code that does no I/O
- Impure code is code that does I/O
Is that correct?
That's roughly right for a sufficiently greedy definition of "IO".
Also, it is my understanding that good software design is to isolate/separate the impure code from the pure code. Is that correct?
That's the idea. Whether it's correct is up for debate, but that's what Haskellers believe. Pure code is inherently easier to compose, reason about and change. It makes sense to make most of your program pure, especially important logic.

Here's an example. This is an IRC (chat) server.

This module is impure: https://github.com/chrisdone/hulk/blob/master/src/Hulk/Server.hs
This module is pure: https://github.com/chrisdone/hulk/blob/master/src/Hulk/Client.hs

Think Pinky and the Brain. Pure code is the Brain. Impure code is Pinky.
Does that principle apply to all programming languages, or just Haskell?
Haskellers apply this principle wherever they can.

On May 19, 2011, at 1:12 PM, Costello, Roger L. wrote:
Also, it is my understanding that good software design is to isolate/separate the impure code from the pure code. Is that correct? Does that principle apply to all programming languages, or just Haskell?
(Since you asked about other languages, the code below is pseudo Python, but Haskell, C, or whatever would still work.)

The question is why you would want to make this separation. At first blush it sounds arbitrary. In fact, for writing code quickly in a "throw away" mode, that kind of separation is overkill in other languages. But consider code you might want to keep for a few years.

Let's consider a typical hacking problem. You need to read from a file and do something. The something is not all that important.

    open file
    for line in file:
        # parse / edit / print / something
    close file

Now let's say you have this chunk of code in the middle of some function calls. To test this code you have to have access to a filesystem with the file ready to go. What if you just checked out the code from revision control? Guess you need to check out the test data too. Bummer.

If we abstract out the concept of reading from a file, we could provide a fake file interface and still test our code. You can imagine how this would help test networking code and other types of I/O. In Haskell this can be accomplished by hiding a Handle in your own type and, for testing, providing a different Handle which is not bound to I/O.

But you can also separate the code which deals with the result of the input from the act of doing the I/O.

    open file
    input = []
    for line in file:
        input.append(line)
    close file
    # do something with input over here

Now we can test all of the code which simply acts on the input data. We can provide our own canned data, etc.

Due to the IO Monad this is a bigger concern for Haskell, but it is good programming practice in any language.
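The second variant above, where reading is kept apart from processing, might look roughly like this in real Python. It is only a sketch: the names `read_lines`, `count_words` and `count_words_in_file` are made up for illustration, and word counting stands in for the unspecified "something". The standard library's `io.StringIO` plays the fake file.

```python
import io


def read_lines(handle):
    """Impure part: pull every line out of an open file-like object."""
    return [line.rstrip("\n") for line in handle]


def count_words(lines):
    """Pure part: depends only on its argument and touches no file."""
    return sum(len(line.split()) for line in lines)


def count_words_in_file(path):
    """Thin impure wrapper that glues the two parts together."""
    with open(path) as handle:
        return count_words(read_lines(handle))


# Testing needs no file system: canned data goes straight to the pure part,
# and an in-memory StringIO stands in for a real file handle.
canned = ["hello world", "one two three"]
fake_file = io.StringIO("hello world\none two three\n")
```

Only `count_words_in_file` needs a real file, and it is small enough that there is little left in it to get wrong.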

On 19 May 2011, at 21:12, Costello, Roger L. wrote:
My understanding is that:
- Pure code is code that does no I/O
- Impure code is code that does I/O
I don't particularly like that definition. It's true, but the definition of I/O has to be very broad. The definition I use is that a pure function is one that

- takes input only from its parameters.
- outputs only via its return value.

If x is an external variable in memory:

    Pure:   f(b,c) = b + c
    Impure: f(b) = b + x           (violates first rule - uses value in x)
    Impure: f(b,c) = 1; x = b + c  (violates second rule - changes value in x)

Pure functions can combine to make larger pure functions. Impure functions combined with pure functions make larger impure functions.
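The three cases above can be written out in Python as a rough illustration. The function names are invented here, and a module-level `x` stands in for the "external variable in memory".

```python
x = 10  # external variable, standing in for "x" in the examples above


def pure_add(b, c):
    """Pure: input only from parameters, output only via the return value."""
    return b + c


def impure_read(b):
    """Impure: reads the external variable x (breaks the first rule)."""
    return b + x


def impure_write(b, c):
    """Impure: changes the external variable x (breaks the second rule)."""
    global x
    x = b + c
    return 1


first = impure_read(1)   # x is 10 at this point
impure_write(2, 3)       # sets x to 5 behind the caller's back
second = impure_read(1)  # same argument as before, different answer
```

Here `first` and `second` differ even though both come from `impure_read(1)`, while `pure_add(1, 2)` gives the same answer no matter when it is called.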
Also, it is my understanding that good software design is to isolate/separate the impure code from the pure code. Is that correct?
As the result of pure code is dependent only on its inputs, you can make statements like:

    f(1,2) is always equal to f(1,2) at any time

...without knowing what function f() is. That's quite powerful, and can be useful in optimisation.

Another way of phrasing it: a pure piece of code causes no side-effects, and it is not affected by other code's side-effects. This property is very useful in parallel computations.

Separating pure code out can be useful, but most programs in most languages don't. That's mainly due to the use of shared state variables between functions. Object orientation is inherently impure, as every object has retained state, modified and used by the methods.

This isn't necessarily a bad thing. At some point all programs have to be impure, otherwise they'll always calculate the same result.
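One optimisation this enables is caching of results. The sketch below uses Python's `functools.lru_cache`; the functions `f` and `g` and the `counter` dictionary are illustrative only. Caching the pure `f` is safe, while caching the impure `g` silently returns a stale answer.

```python
import functools


@functools.lru_cache(maxsize=None)
def f(b, c):
    """Pure: the result depends only on b and c."""
    return b + c


first = f(1, 2)    # computed
second = f(1, 2)   # served from the cache; safe because f is pure
info = f.cache_info()

counter = {"value": 0}  # shared state that other code may change


@functools.lru_cache(maxsize=None)
def g(b):
    """Impure: reads shared state, so caching it is a bug."""
    return b + counter["value"]


before = g(1)            # computed while counter["value"] is 0
counter["value"] = 100   # the shared state changes
after = g(1)             # cache hit: still the old answer
fresh = g.__wrapped__(1) # bypasses the cache and sees the new state
```

The cache cannot tell `f` and `g` apart; it is the purity of `f` that makes the substitution valid.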
Does that principle apply to all programming languages, or just Haskell?
Yes, it applies to all languages.

The definition of pure code is very simple. Pure code always returns the same value given the same arguments.

The issue comes in the subtleties of arguments of the IO monad. Functions that produce IO actions are pure – they always produce the same IO action given the same arguments. What's not pure is the runtime which then interprets the IO action.

The result of this is that you only get the nice guarantees that purity gives you when considering which IO action is produced, not when considering the effects of the IO action being executed by the runtime.

Bob
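A loose Python analogy may help here. It is not how Haskell's IO is actually implemented: it just models an "IO action" as a plain value describing what to do, with a separate interpreter that carries it out. The names `PrintLine`, `greet` and `run` are hypothetical, and a list is used as the output sink so the effect is easy to observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrintLine:
    """A description of an effect; constructing one outputs nothing."""
    text: str


def greet(name):
    """Pure: the same name always yields an equal PrintLine value."""
    return PrintLine("Hello, " + name)


def run(action, sink):
    """Impure 'runtime': carries out the described effect on sink."""
    sink.append(action.text)


action_a = greet("Roger")
action_b = greet("Roger")  # equal to action_a; nothing has been output yet

output = []
run(action_a, output)      # only now does anything observable happen
```

The purity guarantee covers `greet`, which always builds the same description. It says nothing about what `run` does to the outside world.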

Paul Sargent wrote:
On 19 May 2011, at 21:12, Costello, Roger L. wrote:
My understanding is that:
- Pure code is code that does no I/O
- Impure code is code that does I/O
I don't particularly like that definition. It's true, but the definition of I/O has to be very broad.
I would say so! I'm used to "Pure functions don't change state", where IO is pretty much a change of state by definition.
The definition I use is that a pure function is one that - takes input only from its parameters. - outputs only via its return value.
Not quite mine, but I think it's better for reasoning about Haskell code. Maybe - I have a question at the end.
If x is an external variable in memory:
    Pure:   f(b,c) = b + c
    Impure: f(b) = b + x  (violates first rule - uses value in x)
Right, but ... see the question at the end.
Impure: f(b,c) = 1 ; x = b + c; (violates second rule - Changes value in x)
Pure functions can combine to make larger pure functions. Impure functions combined with pure functions make larger impure functions.
Also, it is my understanding that good software design is to isolate/separate the impure code from the pure code. Is that correct?
As the result of pure code is dependent only on its inputs, you can make statements like: f(1,2) is always equal to f(1,2) at any time ...without knowing what function f() is. That's quite powerful, and can be useful in optimisation.
Among other places! The point of this kind of distinction is that it's easier to reason about "pure" code - because you can make statements like that about it.
Another way of phrasing it: a pure piece of code causes no side-effects, and it is not affected by other code's side-effects. This property is very useful in parallel computations.
Yup. And reasoning about parallel computation is hard enough as it is.
Separating pure code out can be useful, but most programs in most languages don't. That's mainly due to the use of shared state variables between functions. Object orientation is inherently impure as every object has retained state, modified and used by the methods. This isn't necessarily a bad thing. At some point all programs have to be impure otherwise they'll always calculate the same result.
Does that principle apply to all programming languages, or just Haskell?
Yes, it applies to all languages.
But in different ways to different languages. Look at the "doesn't change state" definition - basically, yours without the "depends on state that other code can change" restriction. That means that you can safely ignore the function when analyzing the calling code - you know it doesn't change any state.

If the language is greedy (eagerly evaluated), an optimizer can safely inline the function at that point. If the language is lazy, then the optimizer can't do the inlining, because the external state may change between the function invocation and actually evaluating the function, which would change the results of the call. In other words, in a greedy language you might get more leverage out of the looser definition of "pure."

In your example, x - a value that's not a parameter - is conventionally known as a free variable. IIUC, in Haskell, names can't change their value unless they're in a monad. Which means the two different definitions you gave for "pure" describe different types of functions: a function which takes input only from its parameters has no free variables. On the other hand, a function which uses free variables that aren't in a monad can't be affected by other code's side-effects, even though it takes input other than its parameters (at least if you have sane scoping rules).

Hence the question: which of these is more useful for Haskell?
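The point about evaluation order can be illustrated in Python, with the caveat that Python is eager, so laziness is simulated here by wrapping the call in a lambda. All names are made up for the example. The function changes no state, but it reads a free variable that other code can change.

```python
state = {"x": 1}  # mutable external state; "x" plays the free variable


def uses_free_variable(b):
    """Changes no state, but reads state that other code can change."""
    return b + state["x"]


# Eager: the call is evaluated right here, while x is still 1.
eager_result = uses_free_variable(10)

# Simulated laziness: defer the call until the result is needed.
thunk = lambda: uses_free_variable(10)

state["x"] = 100       # other code changes the state in between

lazy_result = thunk()  # evaluated only now, so it sees x == 100
```

The same call gives different results depending on when it is evaluated. If `x` could never change after being bound, which is how the message above describes Haskell names outside a monad, both results would agree.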

Hi Folks,

Thanks to everyone who responded to my questions. Excellent! I have learned a lot. It is a fascinating subject.

I summarized what I learned (MS Word document): http://xfront.com/Pure-versus-Impure-Code.docx

Please let me know of any errors that I made.

/Roger

participants (6)

- Christopher Done
- Costello, Roger L.
- mike.w.meyer@gmail.com
- Paul Sargent
- Sean Perry
- Thomas Davie