
[moving to haskell-cafe] Sorry for the long post. On Sunday 07 November 2004 22:55, Adrian Hey wrote:
On Sunday 07 Nov 2004 1:45 pm, Benjamin Franksen wrote:
It's similar to the advantage that using the IO monad has over allowing arbitrary side-effects in functions: the IO monad gives you a clear separation between stuff that has (side-) effects (i.e. depends on the real world) and pure functions (which don't). Abandoning global variables gives you a clear separation between stuff that depends on initialized state and stuff that does not.
I don't agree. Hidden dependencies are a fact of life with stateful programming in general and IO monad in particular. Making some references explicit arguments (as you seem to be suggesting) does not eliminate the problem, it merely complicates an api for no good reason.
You have a point here: hidden dependencies are something that is inherently possible in the IO monad. You can, for instance, easily create global variables using the FFI without resorting to unsafePerformIO. I'll take back what I said above. But I maintain that it is a good idea to avoid hiding dependencies where possible.
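For readers unfamiliar with the idiom, the usual way to get a "global variable" in Haskell (and presumably the one this thread is arguing about) is a top-level IORef created with unsafePerformIO. A minimal sketch; the names `globalCounter` and `nextId` are hypothetical. Nothing in the type of `nextId` reveals that it depends on `globalCounter`, which is exactly the kind of hidden dependency under discussion.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical library-wide counter, created once as a top-level
-- "global variable". NOINLINE stops GHC from inlining the definition,
-- which could otherwise allocate more than one IORef.
{-# NOINLINE globalCounter #-}
globalCounter :: IORef Int
globalCounter = unsafePerformIO (newIORef 0)

-- The dependency on globalCounter is invisible in this type signature.
nextId :: IO Int
nextId = atomicModifyIORef' globalCounter (\n -> (n + 1, n + 1))

main :: IO ()
main = do
  a <- nextId
  b <- nextId
  print (a, b)
```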
Hiding internal state dependencies is a *good thing*. The trick is to organise the dependencies and provide a robust "idiot proof" api so that users don't have to know about the internal organisation and any dependencies.
Oh, but the user *has* to know about them. The user must call the init routine before using other routines of the library, remember? Why are you against the type checker reminding her? I know a lot of those "idiot proof" libraries: "You need to call X, then call Y, but not if Z was called before..." One of the ideas behind using functions with arguments and a static type system is to encode dependencies so that the compiler can enforce them. And BTW, what if your idiot proof initialization routine needs arguments to configure the library? Is the user still allowed to call it from several places in his code, now with possibly different arguments? And with what effect?
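A minimal sketch of the alternative argued for here, assuming a hypothetical library whose initialisation needs a configuration value: `initLib` returns a handle, and every other operation requires that handle, so forgetting to initialise is a type error rather than a run-time failure. `Config`, `Lib`, `initLib` and `logLine` are illustrative names, not an existing API. In a real library the `Lib` constructor would be kept out of the module's export list, so `initLib` is the only way to obtain a handle.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical configuration that the initialisation routine needs.
data Config = Config { prefix :: String }

-- Handle returned by initialisation; it owns the library's state.
data Lib = Lib
  { libConfig :: Config
  , libCount  :: IORef Int
  }

initLib :: Config -> IO Lib
initLib cfg = Lib cfg <$> newIORef 0

-- Every operation takes the handle, so the type checker rejects
-- any use that is not preceded by initLib.
logLine :: Lib -> String -> IO String
logLine lib msg = do
  modifyIORef' (libCount lib) (+ 1)
  n <- readIORef (libCount lib)
  pure (prefix (libConfig lib) ++ show n ++ ": " ++ msg)

main :: IO ()
main = do
  lib <- initLib (Config "log#")
  logLine lib "hello" >>= putStrLn
  logLine lib "world" >>= putStrLn
```

Because the configuration is an argument of `initLib`, the question of what happens when it is called twice with different arguments has a plain answer: you get two handles, each with its own configuration.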
I don't believe this is a new (or controversial) idea. It's the basic idea behind stateful modular or OO programming. All the user sees is a set of actions which collectively deliver on a promise (by unknown means).
OO is the best argument *against* global variables. Pure OO languages have *no* hidden global state. In every real OO program the dependency is explicit, since you always need a "target" object on which to invoke methods. It doesn't matter that you write "object.f" instead of "f object" as you would in Haskell. I have never heard anyone using an OO language complain about that. The two best OO languages I know of are Eiffel and O'Haskell/Timber. Neither has global variables. Eiffel has 'once' routines, which seem to be similar to what you are after. Timber doesn't even have top-level IO actions; instead, everything you need from the environment is given as an argument to main. Note that Timber is used for real-time control, an inherently stateful and IO-intensive field. Your opinion that having to pass the initialized state around automatically leads to a horrible API amounts to saying that in an OO language like Eiffel only libraries with horribly inconvenient APIs can be written. This is ridiculous. Even in C++, using global variables is nowadays generally regarded as bad design, especially for libraries.
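To make the "object.f versus f object" point concrete, here is a minimal sketch of an OO-style object in Haskell as a record of actions closing over private state. `Counter` and `newCounter` are hypothetical; the point is only that the dependency on state is carried by the value you invoke the "methods" on, and nothing is global.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')

-- An "object": a record of methods sharing private state.
data Counter = Counter
  { increment :: IO ()
  , current   :: IO Int
  }

-- Constructor: each call allocates fresh, private state.
newCounter :: IO Counter
newCounter = do
  ref <- newIORef 0
  pure Counter
    { increment = modifyIORef' ref (+ 1)
    , current   = readIORef ref
    }

main :: IO ()
main = do
  c <- newCounter
  increment c        -- Haskell's "f object" for OO's "object.f"
  increment c
  current c >>= print
```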
[...] You know that IO actions have (side-) effects, so you would take care that the actions get executed as many times as is appropriate. If the library docs indicate that it makes no sense to call it twice, why would you do so?
Given such a statement about realInit you wouldn't (or to be more precise, given a statement that calling it twice or more will really screw things up).
I would be really interested to know what kind of init action you are talking about that screws everything up so badly if called twice. This is not rhetoric; I mean it.
But the question is *how* is the user to ensure that it is only called once. I see no other way than the darned awkward alternative I gave.
We have an interesting stalemate here: you argue that you want a feature so that you can enforce that a routine is called *at most* once. I argue that if you do this by hiding state dependencies, you lose the ability to enforce that it is called *at least* once. You argue that it might be catastrophic if the library is initialized more than once. I argue that it is usually catastrophic (by which I mean a core dump, or at least an exception if it is programmed defensively) if you don't initialize it at all.
I suppose the other alternative is the noddy realInit is only used once in an action which is only used once, in an action .. from main (which is only used once hopefully). Is this what you have in mind?
It's the same stalemate as above: if you do it your way, you have the problem of ensuring that it gets called at least once before you call routines that depend on it. And that gets *really* hard as soon as you have concurrent threads. Maybe we should look for a solution that can enforce *both* invariants, "at least once" as well as "at most once"? It's just that I can't see such a solution, and therefore my preference would be to redesign 'realInit' in such a way that calling it twice is not fatal but just creates another 'instance' (I can't be more specific without knowing what the library does).
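A minimal sketch of the redesign suggested here, under the assumption that whatever `realInit` sets up can be bundled into a value: each call allocates fresh state, so a second call produces a second, independent instance rather than a catastrophe. Since the thread never says what `realInit` actually does, the `Instance` type and the `record`/`history` operations are hypothetical stand-ins.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical per-instance state; a single IORef stands in for
-- whatever the real library would set up.
newtype Instance = Instance (IORef [String])

-- Redesigned initialiser: every call allocates fresh state, so calling
-- it twice yields two independent instances instead of corrupting one
-- shared, hidden instance.
realInit :: IO Instance
realInit = Instance <$> newIORef []

record :: Instance -> String -> IO ()
record (Instance ref) s = modifyIORef' ref (s :)

history :: Instance -> IO [String]
history (Instance ref) = reverse <$> readIORef ref

main :: IO ()
main = do
  i1 <- realInit
  i2 <- realInit          -- second call: harmless, just another instance
  record i1 "a"
  record i2 "b"
  history i1 >>= print
  history i2 >>= print
```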
The behaviour of (and consequent constraints on correct usage of) realInit and putString are very different. Must I elaborate on them?
True, I can't see any constraints on correct usage of 'putString' that aren't enforced by the type checker. And that is exactly how it should be. Maybe the problem with your 'realInit' is that it needs such constraints? Again, giving an example might convince me that these constraints are inherent to the problem domain and can't be worked around.
It doesn't seem very attractive to users either (considerably complicates their code and places the burden on them to "get it right").
It may seem so at first, but I think it's a delusion.
Trust me on this, for whatever reason, it's absolutely vital that realInit is used 0 or 1 times only, 2 or more is a catastrophic error.
I would very much like to trust you, but why can't you give us an example? Are you talking about mission-critical stuff like controlling an airplane? But you don't initialize a library in full flight, do you? So why is it catastrophic, and what exactly does that mean? I thought you meant a core dump, but I am no longer sure... Maybe the reason is that it calls out to C libraries with a broken API? (I know of enough such libraries, and interfacing to them cleanly is sometimes a pain in the ass.)
So I'll ask again. Please provide a simpler _and_ safer alternative (some real Haskell code please).
And I'll ask again for an example to convince me of the necessity.
At the moment I cannot imagine a well designed library interface where user code would be considerably complicated if no global variables were used. But maybe you have a good example at hand to prove that this is merely due to lack of imagination on my side, and that I was extremely lucky with the HWS? ;-)
Indeed, I believe this is the case. I'm guessing of course, but I imagine all your IO is done via standard Haskell library calls (socket API or whatever), in which case they will hide a lot of the stateful complexity of their implementation already.
I don't know about the latter. I do know that there are no constraints on usage of the form "this must be called before that", besides the ones automatically enforced by the type system. An exception might be the posix libraries, but they are only a thin layer over a badly designed C API. I could be wrong, but I doubt that there is much hidden state in the Haskell part. I once wrote a Haskell binding to a C library for a special network protocol. I never even considered using unsafePerformIO except for C routines that were actually pure functions. What I *did* need to consider and work around was that the C API was in some places hiding global state, which was *very* bad. Another example: have you ever used ONC/RPC (Remote Procedure Call)? I saw implementations that came with a real-time multithreaded OS where the docs said, more or less: "All created objects such as client handles may only be used from the thread that created them." *That* is a horrible API, because it means you cannot pass these objects around freely but have to make sure your routine isn't called from the "wrong" thread! And the reason for this restriction was (of course) that the library was hiding state inside thread-local variables.
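A minimal sketch of the contrast drawn above, with a hypothetical `Client` handle standing in for an RPC client: because the handle carries its own state (in an `MVar`) instead of relying on hidden per-thread state, it can be created in one thread and used freely from others. `Client`, `newClient`, `call` and `requests` are illustrative names, not part of any real RPC library.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newMVar, newEmptyMVar, modifyMVar_, putMVar, takeMVar, readMVar)
import Control.Monad (forM, forM_, replicateM_)

-- Hypothetical client handle; its state lives inside the handle,
-- so any thread holding the handle may use it.
newtype Client = Client (MVar Int)

newClient :: IO Client
newClient = Client <$> newMVar 0

-- Stand-in for a remote call: it just counts requests atomically.
call :: Client -> IO ()
call (Client st) = modifyMVar_ st (pure . (+ 1))

requests :: Client -> IO Int
requests (Client st) = readMVar st

main :: IO ()
main = do
  client <- newClient                   -- created in the main thread
  dones <- forM [1 .. 4 :: Int] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ 10 (call client)      -- used from other threads
      putMVar done ()
    pure done
  forM_ dones takeMVar                  -- wait for all workers
  requests client >>= print
```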
If so it seems to me you're using the fact that somebody has already solved the problem for you as an argument that no solution is necessary.
Maybe. We can talk in "if" sentences until we both die of old age.
(It would be interesting to see what the api's of the libraries you're using would look like, if they had been designed according to the principles you're advocating).
Yes, that would be interesting. And it is not a matter of me holding up holy principles against an evil reality. I am talking about practical considerations, not ideals. I hope I've made that clear with the above examples. Cheers, Ben