Re: [Haskell] Re: Global Variables and IO initializers

[moving to haskell-cafe] Sorry for the long post. On Sunday 07 November 2004 22:55, Adrian Hey wrote:
On Sunday 07 Nov 2004 1:45 pm, Benjamin Franksen wrote:
It's a similar advantage as using the IO monad has over allowing arbitrary side-effects in functions: The IO monad gives you a clear separation between stuff that has (side-) effects (i.e. depends on the real world) and pure functions (which don't). Abandoning global variables gives you a clear separation of stuff that depends on initialized state and other stuff that does not depend on it.
I don't agree. Hidden dependencies are a fact of life with stateful programming in general and the IO monad in particular. Making some references explicit arguments (as you seem to be suggesting) does not eliminate the problem, it merely complicates an api for no good reason.
You have a point here: hidden dependencies are something that is inherently possible in the IO monad. You can for instance easily create global variables using the FFI without resorting to unsafePerformIO. I'll take back what I said above. But I maintain that it is a good idea to avoid hiding dependencies if possible.
Hiding internal state dependencies is a *good thing*. The trick is to organise the dependencies and provide a robust "idiot proof" api so that users don't have to know about the internal organisation and any dependencies.
Oh, but the user *has* to know about them. The user must call the init routine before using other routines of the library, remember? Why are you against the type checker reminding her? I know a lot of those "idiot proof" libraries: "You need to call X then call Y but not if Z was called before..." One of the ideas behind using functions with arguments and a static type system is to encode dependencies so that the compiler can enforce them. And BTW what if your idiot proof initialization routine needs arguments to configure the library? Is the user still allowed to call it from several places in his code, now with possibly different arguments? And with what effect?
I don't believe this is a new (or controversial) idea. It's the basic idea behind stateful modular or OO programming. All the user sees is a set of actions which collectively deliver on a promise (by unknown means).
OO is the best argument *against* global variables. Pure OO languages have *no* hidden global state. In every real OO program you have the dependency explicit, since you always need a "target" object on which to invoke methods. It doesn't matter that you write "object.f" instead of "f object" as you would in Haskell. I have never heard anyone using an OO language complain about that. The two best OO languages I know of are Eiffel and O'Haskell/Timber. Both do not have global variables. Eiffel has 'once' routines which seem to be similar to what you are after. Timber doesn't even have top-level IO actions, instead everything you need from the environment is given as an argument to main. Mark that Timber is used for real-time control, an inherently stateful and IO intensive field. Your opinion that it automatically leads to a horrible API if you have to pass the initialized state around amounts to saying that in an OO language like Eiffel only libraries with horribly inconvenient APIs can be written. This is ridiculous. Even in C++ using global variables is nowadays generally regarded as bad design, especially for libraries.
[...] You know that IO actions have (side-) effects, so you would take care that the actions get executed as many times as is appropriate. If the library docs indicate that it makes no sense to call it twice, why would you do so?
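A minimal sketch of what "the type checker reminding her" could look like, assuming a hypothetical library whose names (`Lib`, `Config`, `initLib`, `logLine`, `dump`) are invented here for illustration and do not come from any real package. The init routine returns a handle, and every other routine takes that handle as an argument, so the routines cannot be called before initialisation.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical library state. If the constructor is not exported,
-- the only way to obtain a 'Lib' is to call 'initLib'.
newtype Lib = Lib (IORef [String])

-- Hypothetical configuration argument for the init routine.
newtype Config = Config { prefix :: String }

-- Initialisation returns the handle that every other routine requires.
initLib :: Config -> IO Lib
initLib cfg = Lib <$> newIORef [prefix cfg]

-- Cannot be called before 'initLib': there is no 'Lib' value to pass.
logLine :: Lib -> String -> IO ()
logLine (Lib ref) msg = modifyIORef' ref (msg :)

-- Returns everything logged so far, oldest entry first.
dump :: Lib -> IO [String]
dump (Lib ref) = reverse <$> readIORef ref

-- Example use: initialise, log one line, read the log back.
demo :: IO [String]
demo = do
  lib <- initLib (Config "start")
  logLine lib "hello"
  dump lib
```

Under this design, calling `initLib` a second time with a different `Config` simply yields a second, independent `Lib`, which is one possible answer to the "with what effect?" question.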
Given such a statement about realInit you wouldn't (or to be more precise, given a statement that calling it twice or more will really screw things up).
I would be really interested to know what kind of init action you are talking about, that so badly screws everything up if called twice. This is not rhetoric, I mean it.
But the question is *how* is the user to ensure that it is only called once. I see no other way than the darned awkward alternative I gave.
We have an interesting stalemate here: You argue that you want a feature so that you can enforce that a routine is called *at most* once. I argue that if you do this by hiding state dependencies, you are losing the ability to enforce that it is called *at least* once. You argue that it might be catastrophic if the library is initialized more than once. I argue that it is usually catastrophic (with this I mean core dump or at least exception if it is programmed defensively) if you don't initialize it at all.
I suppose the other alternative is the noddy realInit is only used once in an action which is only used once, in an action .. from main (which is only used once hopefully). Is this what you have in mind?
It's the same stalemate as above: If you do it your way, you have the problem with ensuring that it gets called at least once before you call routines that depend on it. And that gets *really* hard as soon as you have concurrent threads. Maybe we should look for a solution that can enforce *both* invariants, "at least once" as well as "at most once"? It's only that I can't see such a solution and therefore my preference would be to redesign 'realInit' in such a way that calling it twice is not fatal but just creates another 'instance' (can't be more specific without knowing what the library does).
The behaviour of (and consequent constraints on correct usage of) realInit and putString are very different. Must I elaborate them?
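To make the "at most once" half of the trade-off concrete, here is a rough sketch of a run-once wrapper. The name `onceOnly` is hypothetical and this is not the code discussed elsewhere in the thread. Note what it does not solve: the wrapper itself must be created exactly once (for example in main) and then passed to its users, so the dependency is still explicit.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)
import Data.IORef (newIORef, modifyIORef', readIORef)

-- Wrap an action so that its effect happens at most once; later calls
-- return the cached result. The MVar also serialises concurrent callers.
onceOnly :: IO a -> IO (IO a)
onceOnly act = do
  cache <- newMVar Nothing
  pure $ modifyMVar cache $ \m -> case m of
    Just x  -> pure (m, x)          -- already run: reuse the result
    Nothing -> do
      x <- act                      -- first call: run the real action
      pure (Just x, x)

-- Example: the wrapped action is invoked twice but runs only once.
-- Returns both results and the number of times the action really ran.
demo :: IO (Int, Int, Int)
demo = do
  runs <- newIORef (0 :: Int)
  initOnce <- onceOnly (modifyIORef' runs (+ 1) >> pure 42)
  a <- initOnce
  b <- initOnce
  n <- readIORef runs
  pure (a, b, n)
```

Whoever holds `initOnce` can call it as often as they like; anyone who needs the result must have been handed `initOnce`, so "at least once before use" is again checked by the types.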
True, I can't see any constraints on correct usage of 'putString' that aren't enforced by the type checker. And that is exactly how it should be. Maybe the problem with your 'realInit' is that it needs such constraints? Again, giving an example might convince me that these constraints are inherent to the problem domain and can't be worked around.
It doesn't seem very attractive to users either (considerably complicates their code and places the burden on them to "get it right").
It may seem so at first, but I think it's a delusion.
Trust me on this, for whatever reason, it's absolutely vital that realInit is used 0 or 1 times only, 2 or more is a catastrophic error.
I would very much like to trust you, but why can't you give us an example? Are you talking about mission-critical stuff like controlling an airplane? But you don't initialize a library in full flight, do you? So why is it catastrophic and what exactly does that mean? I thought you meant core dump, but I am no longer sure... Maybe the reason is that it calls out to C libraries with a broken API? (I know of enough such libraries, and interfacing them in a clean manner is sometimes a pain in the ass.)
So I'll ask again. Please provide a simpler _and_ safer alternative (some real Haskell code please).
And I'll ask again for an example to convince me of the necessity.
At the moment I cannot imagine a well designed library interface where user code would be considerably complicated if no global variables were used. But maybe you have a good example at hand to prove that this is merely due to lack of imagination on my side, and that I was extremely lucky with the HWS? ;-)
Indeed, I believe this is the case. I'm guessing of course, but I imagine all your IO is done via standard Haskell library calls (socket API or whatever), in which case they will hide a lot of the stateful complexity of their implementation already.
I don't know about the latter. I do know that there are no constraints on usage in the form of "this must be called before that", besides the ones automatically enforced by the type system. An exception might be the posix libraries, but they are only a thin layer over a badly designed C API. I could be wrong, but I doubt that there is lots of hidden state in the Haskell part. I once wrote a Haskell binding to a C library for a special network protocol. I never even considered using unsafePerformIO except for C routines that were actually pure functions. What I *did* need to consider and work around was that the C API was in some places hiding global state, which was *very* bad. Another example: Have you ever been using ONC/RPC (Remote Procedure Call)? I saw implementations that came with a real-time multithreaded OS where the docs said, more or less: "All created objects such as client handle may only be used from the thread that created them." *That* is a horrible API, because it means you can not pass these objects around freely but have to make sure your routine isn't called from the "wrong" thread! And the reason for this restriction was (of course) that the library was hiding state inside thread-local variables.
If so it seems to me you're using the fact that somebody has already solved the problem for you as an argument that no solution is necessary.
Maybe. We can talk in "if" sentences until we both die of old age.
(It would be interesting to see what the api's of the libraries you're using would look like, if they had been designed according to the principles you're advocating).
Yes, that would be interesting. And it is not a matter of me holding up holy principles against an evil reality. I am talking about practical considerations, not ideals. I hope I've made that clear with the above examples. Cheers, Ben

Benjamin Franksen writes:
Even in C++ using global variables is nowadays generally regarded as bad design, especially for libraries.
Well, to be fair one has to say that they are still quite popular although people call them "singletons" and other cute things these days. I distinctly remember reading hundreds and hundreds of articles which explained in great detail how to create and use them in slightly less than 200 lines of template meta-programming code without making the compiler explode and still getting the result you expected almost half of the time. Of course, if you used multi-threading it all exploded nonetheless then. So the discussion about global IO initializers in Haskell is slightly reminiscent of that for me. The argument in favor of global variables usually was that it was "more comfortable", which means that using them saved you a total of 20 seconds per-module because you had a parameter less to type every now and then, at the small cost of making your code almost unmaintainable in the long run. Frankly, the idea that anyone would want to jump through hoops to add them to a purely functional language sounds bizarre to me. But by all means, as long as the compiler extension is disabled per default I won't mind. :-) Peter

Peter Simons wrote:
[Global variables]
Well, to be fair one has to say that they are still quite popular although people call them "singletons" and other cute things these days.
Frankly, the idea that anyone would want to jump through hoops to add them to a purely functional language sounds bizarre to me. But by all means, as long as the compiler extension is disabled per default I won't mind. :-)
A singleton doesn't necessarily have to be "mutable". The idea is that the first time you use a singleton the object tied to that variable gets created, then all subsequent calls to it share the same object. A Constant Applicative Form (CAF) (an expression defined at top level which has zero arity) in a pure functional language can be implemented in the same way. If you implement it as a singleton, it doesn't have to be created when it's not needed. All your top level library functions are also "global", you don't pass them into every function that uses them do you? "Mutable" global state is another matter. It's sometimes well used and sometimes not. Just because you go and wrap a monad around it doesn't make it less mutable. Ben.
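A small sketch of the CAF point, with illustrative names (`table`, `total`) that are not from the thread. `table` is a top-level binding of zero arity; `Debug.Trace.trace` is used only to make the moment of evaluation visible.

```haskell
import Debug.Trace (trace)

-- A constant applicative form: a top-level binding with no arguments.
-- It is not built until something demands it.
table :: [Int]
table = trace "building table" (map (* 2) [1 .. 5])

-- Both uses refer to the same 'table' value.
total :: Int
total = sum table + length table
```

With GHC the trace message is normally printed once, when `total` is first demanded, and not at all if `table` is never used. That matches the immutable "singleton" behaviour described above.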

On Monday 08 Nov 2004 6:00 am, Peter Simons wrote:
Frankly, the idea that anyone would want to jump through hoops to add them to a purely functional language sounds bizarre to me.
The first step to solving a problem is to at least recognise that it exists. What is "bizarre" is that so many folk seem to be in denial over this. Perhaps you would like to show me your solution to the "oneShot" problem. If this is such a wacky idea then why is the use of the unsafePerformIO hack to do precisely this so commonplace? I gather it's even used within ghc. If the two Simons don't know how to write "proper" Haskell, what hope is there for the rest of us. Also a few more points that seem to need repeating.. 1- We're talking about the general problem of creating top level "things with identity" (does anyone have a less cumbersome term?) 2- Creating top-level mutable variables (IORefs) is just one utterly trivial use of this capability. 3- Top-level does not imply global. 4- They already exist (stdin,stdout,stderr) and I don't recall anybody ever complaining about this. 5- The above are already *implicitly* referenced by many other commonly used top level IO related functions.
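For readers who have not met it, the unsafePerformIO hack under discussion usually looks roughly like the sketch below. The names `counter` and `nextId` are illustrative only. The NOINLINE pragma is part of the idiom: without it GHC may inline the binding, and each use site could then create its own IORef.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- A top-level mutable variable created with the unsafePerformIO idiom.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Returns the next number. The IORef is referenced implicitly,
-- so nothing in the type shows the dependency on 'counter'.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))

-- Three successive calls share the same hidden counter.
demo :: IO [Int]
demo = sequence [nextId, nextId, nextId]
```

The type of `nextId` is just `IO Int`; this is the "hidden dependency" that one side of the thread considers convenient and the other considers a maintenance hazard.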
But by all means, as long as the compiler extension is disabled per default I won't mind. :-)
No doubt it would be, like all non-standard extensions. But why would it be a problem if it was not? If you don't want to use <- bindings then don't. Nothing else has changed. Regards -- Adrian Hey

Adrian Hey wrote:
The first step to solving a problem is to at least recognise that it exists. What is "bizarre" is that so many folk seem to be in denial over this. Perhaps you would like to show me your solution to the "oneShot" problem.
Why are you unable to give a concrete real world example of why this is necessary then. Even your example of real world hardware that must be initialised once fails! (What if I start two copies of the program?) With this example the only satisfactory solution is for the hardware itself to keep track of when it is initialised. If the hardware has a "I have been initialised" flag, the init routine would check this flag as its first action and exit should initialisation already have taken place. Any other solution is broken in a multi-threaded environment (or even a single-threaded one in which multiple executions of the same program are possible like DOS). Keean.

Keean Schupke wrote:
Adrian Hey wrote:
The first step to solving a problem is to at least recognise that it exists. What is "bizarre" is that so many folk seem to be in denial over this. Perhaps you would like to show me your solution to the "oneShot" problem.
Why are you unable to give a concrete real world example of why this is necessary then. Even your example of real world hardware that must be initialised once fails! (What if I start two copies of the program?)
Indeed. With hardware the solution is to do hdl <- openDevice which will succeed the first time and then return "busy" until closed. Any access to the device must use the hdl. Trying to do without the handle is just shooting yourself in the foot. It might look good at first, but it doesn't scale. -- Lennart

On Monday 08 Nov 2004 12:26 pm, Lennart Augustsson wrote:
Keean Schupke wrote:
Adrian Hey wrote:
The first step to solving a problem is to at least recognise that it exists. What is "bizarre" is that so many folk seem to be in denial over this. Perhaps you would like to show me your solution to the "oneShot" problem.
Why are you unable to give a concrete real world example of why this is necessary then. Even your example of real world hardware that must be initialised once fails! (What if I start two copies of the program?)
Indeed. With hardware the solution is to do hdl <- openDevice which will succeed the first time and then return "busy" until closed.
How will it know it's "busy"? Please show me the code for your hypothetical openDevice. Regards -- Adrian Hey

On Monday 08 Nov 2004 10:37 am, Keean Schupke wrote:
Adrian Hey wrote:
The first step to solving a problem is to at least recognise that it exists. What is "bizarre" is that so many folk seem to be in denial over this. Perhaps you would like to show me your solution to the "oneShot" problem.
Why are you unable to give a concrete real world example of why this is necessary then.
Because it is irrelevant, unless you think I'm lying. It suffices merely to state the problem. If you want an answer see my reply to Keith. Regards -- Adrian Hey

Adrian Hey writes:
Perhaps you would like to show me your solution to the "oneShot" problem.
I don't see any value in problems that are specifically designed so that they can be solved only with a global entity. What is the real-world application for oneShot?
If this is such a wacky idea then why is the use of the unsafePerformIO hack to do precisely this so common place?
Because programmers tend to be lazy. I like Haskell because it doesn't _allow_ me to be lazy.
I gather it's even used within ghc. If the two Simons don't know how to write "proper" Haskell, what hope is there for the rest of us.
Nobody said that. Use of unsafePerformIO does not equal bad code.
But why would it be a problem if it was not?
Because code like that is very hard to get right and very hard to maintain, and I don't want to use library code that uses this technique if I can avoid it. I'll be readily convinced of the opposite once I see code that makes good use of this. Peter

On Monday 08 Nov 2004 11:58 am, Peter Simons wrote:
Adrian Hey writes:
Perhaps you would like to show me your solution to the "oneShot" problem.
I don't see any value in problems that are specifically designed so that they can be solved only with a global entity.
Why not? Even if it was true that I had "specifically designed" this problem, its existence is of some interest I think.
What is the real-world application for oneShot?
See my response to Keith.
I gather it's even used within ghc. If the two Simons don't know how to write "proper" Haskell, what hope is there for the rest of us.
Nobody said that. Use of unsafePerformIO does not equal bad code.
Nor did I accuse anyone of this. In this thread we're talking about one specific use of unsafePerformIO to create top level "things with identity" (I think I'll call them TWI's from now on). This is what many in this thread assert is bad code, yourself included it seems. Yet this is widely used in many programs and libraries, even in ghc itself I believe. Not to mention stdin etc.. (again).
But why would it be a problem if it was not?
Because code like that is very hard to get right and very hard to maintain, and I don't want to use library code that uses this technique if I can avoid it.
This is dogma I think. There are many libraries you will need to try to avoid using if this is really your position. Regards -- Adrian Hey

Adrian Hey writes:
I don't see any value in problems that are specifically designed so that they can be solved only with a global entity.
Even if it was true that I had "specifically designed" this problem, its existence is of some interest I think.
Perhaps my choice of words wasn't really good. I am sorry. What I meant to say is that I have never once _needed_ a global variable yet. Never. On the other hand, there were plenty of occasions where I had trouble with global variables in other people's code. I'll readily admit that a safe way to implement them in Haskell is probably an interesting research subject, but I honestly don't expect to be using that feature any time soon. It's a completely abstract concept for me; I associate no practical value with it.
[Creating top level "things with identity" is] what many in this thread assert is bad code, yourself included it seems. Yet this is widely used in many programs and libraries, even in ghc itself I believe. Not to mention stdin etc.. (again).
Right, but it is by no means _necessary_ to have a global 'stdin'. It could equally well be defined as stdin :: IO Handle and it would work just the same. The fact that it isn't implemented this way is for historical reasons, IMHO, not because it's a good idea.
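A sketch of the interface shape being suggested, under loud assumptions: `getStdin` and `stdinIsOpen` are hypothetical names, and this version merely wraps the existing top-level `System.IO.stdin`. A real implementation without any top-level handle would need support from the runtime system, which is not shown here.

```haskell
import System.IO (Handle, hIsOpen)
import qualified System.IO as IO

-- Hypothetical alternative interface: the handle is obtained by running
-- an action instead of being a top-level value. For illustration it just
-- returns the existing handle.
getStdin :: IO Handle
getStdin = pure IO.stdin

-- Client code first asks for the handle, then uses it explicitly.
stdinIsOpen :: IO Bool
stdinIsOpen = do
  h <- getStdin
  hIsOpen h
```

Client code changes only by one extra bind; whether that is "just the same" or an unnecessary burden is exactly what the thread disagrees about.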
Because code like that is very hard to get right and very hard to maintain, and I don't want to use library code that uses this technique if I can avoid it.
This is dogma I think.
Yes, you are right. Nonetheless, it is a dogma that's not just arbitrary; it is motivated by experience with real-life code. Just ask the C++ folks about the wonders of global variables that are actually complex classes with a constructor and a destructor. You wouldn't believe through what kind of hoops you have to jump if you want to write reliable code that has to deal with this. For instance: Where do you catch exceptions a constructor throws that is executed before your main() routine is? How do you deal with exceptions that are thrown after your main routine has _ended_? The effect is that the language is full of strange and very counter-intuitive mechanisms, just so that they can implement something which -- in _my_ opinion -- is completely useless to begin with!
There are many libraries you will need to try to avoid using if this is really your position.
Let's say ... I try to compromise rarely, but I do have to compromise, unfortunately. In fact, I have to admit that my own Haskell code contains unsafePerformIO on occasion, too. Not that I'd need it, but I am too damn lazy as well. :-) Peter

Quoting Peter Simons
Just ask the C++ folks about the wonders of global variables that are actually complex classes with a constructor and a destructor.
You can't use that as an argument against global variables in other languages. -- Jeff

jeff writes:
Just ask the C++ folks about the wonders of global variables that are actually complex classes with a constructor and a destructor.
You can't use that as an argument against global variables in other languages.
Why not? Does the creation of global variables never fail in Haskell? Besides, my main point is that they are _unnecessary_ in my experience, not that it is impossible to implement them. Peter

Quoting Peter Simons
jeff writes:
Just ask the C++ folks about the wonders of global variables that are actually complex classes with a constructor and a destructor.
You can't use that as an argument against global variables in other languages.
Why not?
So what if there are problems with globals that are actually complex classes etc in C++? Why should that matter to anyone using any other language?
Does the creation of global variables never fail in Haskell?
That's a different argument, not based on C++.
Besides, my main point is that they are _unnecessary_ in my experience,
Ok, but that's again not the C++ argument (which was all that I was addressing). -- Jeff

Adrian Hey wrote:
4- They already exist (stdin,stdout,stderr) and I don't recall anybody ever complaining about this.
stdin, stdout, and stderr are not global variables. They are just handles. One possible implementation of handles is as an Int. So stdin is no more a global variable than 0. Of course you need some state associated with the handle, but that state does not have to be a unique global thing. You are passing that state around via the IO monad, and there could be multiple versions of it. GHC chooses to implement it differently, but that's a choice. -- Lennart
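A toy model of this argument, not GHC's actual implementation: handles are plain numbers, and the state that belongs to them lives in a separate "world" value that is passed around explicitly. All names here (`Handle`, `World`, `stdinH`, `stdoutH`, `write`, `contents`) are invented for the sketch.

```haskell
import Data.Maybe (fromMaybe)

-- A handle is just a number; it carries no state of its own.
newtype Handle = Handle Int deriving (Eq, Show)

stdinH, stdoutH :: Handle
stdinH  = Handle 0
stdoutH = Handle 1

-- The "world": per-handle buffers, passed around explicitly.
type World = [(Handle, String)]

-- Append text to the buffer associated with a handle.
write :: Handle -> String -> World -> World
write h s w = case lookup h w of
  Just old -> (h, old ++ s) : filter ((/= h) . fst) w
  Nothing  -> (h, s) : w

-- Read back the buffer for a handle (empty if never written).
contents :: Handle -> World -> String
contents h w = fromMaybe "" (lookup h w)

-- The same constant 'stdoutH' used against two independent worlds.
demo :: (String, String)
demo =
  let w1 = write stdoutH "b" (write stdoutH "a" [])
      w2 = write stdoutH "x" []
  in (contents stdoutH w1, contents stdoutH w2)
```

Here `stdoutH` is as much a constant as the number 1, and the two worlds are the "multiple versions" of the state mentioned above.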

On 8 Nov 2004, at 12:23, Lennart Augustsson wrote:
Adrian Hey wrote:
4- They already exist (stdin,stdout,stderr) and I don't recall anybody ever complaining about this.
stdin, stdout, and stderr are not global variables. They are just handles. One possible implementation of handles is as an Int. So stdin is no more a global variable than 0. Of course you need some state associated with the handle, but that state does not have to be a unique global thing. You are passing that state around via the IO monad, and there could be multiple versions of it. GHC chooses to implement it differently, but that's a choice.
Yes... a lot of the examples we have seen here are 'just' handles. newIORef creates handles. Something many programmers would like is the ability to create fresh handles at the toplevel... Jules

Jules Bean wrote:
Yes... a lot of the examples we have seen here are 'just' handles. newIORef creates handles. Something many programmers would like is the ability to create fresh handles at the toplevel...
Yes, I hear what they want. That doesn't mean I think it's a good idea. Top level things with identity are evil. :) -- Lennart
Participants (8): Adrian Hey, Ben Lippmeier, Benjamin Franksen, jeff@inf.ed.ac.uk, Jules Bean, Keean Schupke, Lennart Augustsson, Peter Simons