[Haskell-cafe] Re: [Haskell] Re: Global Variables and IO initializers

8 Nov 2004

      [moving to haskell-cafe]

Sorry for the long post.

On Sunday 07 November 2004 22:55, Adrian Hey wrote:
...
On Sunday 07 Nov 2004 1:45 pm, Benjamin Franksen wrote:
...
It's an advantage similar to the one the IO monad has over allowing
arbitrary side-effects in functions: The IO monad gives you a clear
separation between stuff that has (side-) effects (i.e. depends on the
real world) and pure functions (which don't). Abandoning global variables
gives you a clear separation of stuff that depends on initialized state
and other stuff that does not depend on it.
I don't agree. Hidden dependencies are a fact of life with stateful
programming in general and IO monad in particular. Making some
references explicit arguments (as you seem to be suggesting) does
not eliminate the problem, it merely complicates an api for no good
reason.
You have a point here: hidden dependencies are something that is inherently 
possible in the IO monad. You can for instance easily create global variables 
using the FFI without resorting to unsafePerformIO. I'll take back what I 
said above. But I maintain that it is a good idea to avoid hiding 
dependencies if possible.
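
For readers who have not followed the thread, the idiom under discussion usually looks roughly like the sketch below. The names `globalCounter` and `nextId` are hypothetical and only serve as illustration; the point is that the type of `nextId` does not reveal that it depends on hidden state.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global counter. NOINLINE is needed so that GHC does not
-- inline the definition and end up creating several distinct references.
{-# NOINLINE globalCounter #-}
globalCounter :: IORef Int
globalCounter = unsafePerformIO (newIORef 0)

-- The type 'IO Int' gives no hint that this action reads and updates
-- the hidden 'globalCounter'.
nextId :: IO Int
nextId = atomicModifyIORef' globalCounter (\n -> (n + 1, n))

-- Two calls share the same hidden state.
demo :: IO (Int, Int)
demo = do
  a <- nextId
  b <- nextId
  return (a, b)
```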
...
Hiding internal state dependencies is a *good thing*. The trick is to
organise the dependencies and provide a robust "idiot proof" api so
that users don't have to know about the internal organisation and any
dependencies.
Oh, but the user *has* to know about them. The user must call the init routine 
before using other routines of the library, remember? Why are you against the 
type checker reminding her?

I know a lot of those "idiot proof" libraries: "You need to call X then call Y 
but not if Z was called before..." One of the ideas behind using functions 
with arguments and a static type system is to encode dependencies so that the 
compiler can enforce them.
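
A minimal sketch of what "encoding the dependency" could look like, assuming a hypothetical library whose initialisation returns a handle. `LibHandle`, `initLib`, `logMessage` and `getMessages` are invented names, not part of any real library. In a real module the `LibHandle` constructor would not be exported, so the only way to get a handle is to run `initLib` first.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical library handle wrapping the library's mutable state.
newtype LibHandle = LibHandle (IORef [String])

-- Initialisation returns the handle that every other routine requires.
initLib :: IO LibHandle
initLib = LibHandle <$> newIORef []

-- Cannot be called before 'initLib': without a handle the call
-- does not type check.
logMessage :: LibHandle -> String -> IO ()
logMessage (LibHandle ref) msg = modifyIORef' ref (msg :)

-- Returns the messages in the order in which they were logged.
getMessages :: LibHandle -> IO [String]
getMessages (LibHandle ref) = reverse <$> readIORef ref

-- Usage: the "init before use" rule is enforced by the types.
demo :: IO [String]
demo = do
  h <- initLib
  logMessage h "first"
  logMessage h "second"
  getMessages h
```

Forgetting the call to `initLib` is then a compile-time error rather than a run-time crash.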

And BTW what if your idiot proof initialization routine needs arguments to 
configure the library? Is the user still allowed to call it from several 
places in his code, now with possibly different arguments? And with what 
effect?
...
I don't believe this is a new (or controversial) idea.
It's the basic idea behind stateful modular or OO programming.
All the user sees is a set of actions which collectively deliver on
a promise (by unknown means).
OO is the best argument *against* global variables. Pure OO languages have 
*no* hidden global state. In every real OO program you have the dependency 
explicit, since you always need a "target" object on which to invoke methods. 
It doesn't matter that you write "object.f" instead of "f object" as you 
would in Haskell. I have never heard anyone using an OO language complain 
about that.

The two best OO languages I know of are Eiffel and O'Haskell/Timber. Neither 
has global variables. Eiffel has 'once' routines, which seem to be similar to 
what you are after. Timber doesn't even have top-level IO actions; instead 
everything you need from the environment is given as an argument to main. 
Note that Timber is used for real-time control, an inherently stateful and IO 
intensive field.

Your opinion that it automatically leads to a horrible API if you have to pass 
the initialized state around amounts to saying that in an OO language like 
Eiffel only libraries with horribly inconvenient APIs can be written. This is 
ridiculous.

Even in C++ using global variables is nowadays generally regarded as bad 
design, especially for libraries.
...
...
[...] You know that IO actions have (side-) effects, so you
would take care that the actions get executed as many times as is
appropriate. If the library docs indicate that it makes no sense to call
it twice, why would you do so?
Given such a statement about realInit you wouldn't (or to be more precise,
given a statement that calling it twice or more will really screw things
up).
I would be really interested to know what kind of init action you are talking 
about, that so badly screws everything up if called twice. This is not 
rhetoric, I mean it.
...
But the question is *how* is the user to ensure that it is only called
once. I see no other way than the darned awkward alternative I gave.
We have an interesting stalemate here: You argue that you want a feature 
so that you can enforce that a routine is called *at most* once. I argue that 
if you do this by hiding state dependencies, you are losing the ability to 
enforce that it is called *at least* once.

You argue that it might be catastrophic if the library is initialized more than 
once. I argue that it is usually catastrophic (with this I mean core dump or 
at least exception if it is programmed defensively) if you don't initialize 
it at all.
...
I suppose the other alternative is the noddy realInit is only used once in
an action which is only used once, in an action .. from main (which is only
used once hopefully). Is this what you have in mind?
It's the same stalemate as above: If you do it your way, you have the problem with 
ensuring that it gets called at least once before you call routines that 
depend on it. And that gets *really* hard as soon as you have concurrent 
threads.

Maybe we should look for a solution that can enforce *both* invariants, "at 
least once" as well as "at most once"? It's only that I can't see such a 
solution and therefore my preference would be to redesign 'realInit' in such 
a way that calling it twice is not fatal but just creates another 
'instance' (can't be more specific without knowing what the library does).
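
As a rough sketch of such a redesign, assuming nothing about the actual library: initialisation takes its configuration as an argument and returns a fresh, independent instance each time. `Config`, `Instance`, `newInstance` and `next` are hypothetical names chosen for illustration.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical configuration and instance types.
newtype Config = Config { startAt :: Int }

newtype Instance = Instance (IORef Int)

-- Each call creates a fresh, independent instance, so calling it twice
-- (even with different arguments) is harmless.
newInstance :: Config -> IO Instance
newInstance cfg = Instance <$> newIORef (startAt cfg)

-- Returns the current value and advances this instance's counter only.
next :: Instance -> IO Int
next (Instance ref) = atomicModifyIORef' ref (\n -> (n + 1, n))

-- Two instances with different configurations do not interfere.
demo :: IO [Int]
demo = do
  a <- newInstance (Config 0)
  b <- newInstance (Config 100)
  x <- next a
  y <- next b
  z <- next a
  return [x, y, z]
```

With this shape the "at most once" constraint disappears, and "at least once" is enforced by the type of `next`.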
...
The behaviour of (and consequent constraints on correct usage of) 
realInit and putString are very different. Must I elaborate them?
True, I can't see any constraints on correct usage of 'putString' that aren't 
enforced by the type checker. And that is exactly how it should be.

Maybe the problem with your 'realInit' is that it needs such constraints? 
Again, giving an example might convince me that these constraints are 
inherent to the problem domain and can't be worked around.
...
...
...
It doesn't seem very attractive to users either
(considerably complicates their code and places the burden on them to
"get it right").
It may seem so at first, but I think it's a delusion.
Trust me on this, for whatever reason, it's absolutely vital that realInit
is used 0 or 1 times only, 2 or more is a catastrophic error.
I would very much like to trust you, but why can't you give us an example? Are 
you talking about mission-critical stuff like controlling an airplane? But 
you don't initialize a library in full flight, do you?

So why is it catastrophic and what exactly does that mean? I thought you meant 
core dump, but I am no longer sure...

Maybe the reason is that it calls out to C libraries with a broken API? (I 
know of enough such libraries, and interfacing them in a clean manner is 
sometimes a pain in the ass.)
...
So I'll ask again. Please provide a simpler _and_ safer alternative
(some real Haskell code please).
And I'll ask again for an example to convince me of the necessity.
...
...
At the moment I cannot imagine a well designed library interface where
user code would be considerably complicated if no global variables were
used. But maybe you have a good example at hand to prove that this is
merely due to lack of imagination on my side, and that I was extremely
lucky with the HWS? ;-)
Indeed, I believe this is the case. I'm guessing of course, but I imagine
all your IO is done via standard Haskell library calls (socket API
or whatever), in which case they will hide a lot of the stateful complexity
of their implementation already.
I don't know about the latter. I do know that there are no constraints on 
usage in the form of "this must be called before that", besides the ones 
automatically enforced by the type system. An exception might be the posix 
libraries, but they are only a thin layer over a badly designed C API. I 
could be wrong, but I doubt that there is lots of hidden state in the Haskell 
part.

I once wrote a Haskell binding to a C library for a special network protocol. 
I never even considered using unsafePerformIO except for C routines that were 
actually pure functions. What I *did* need to consider and work around was 
that the C API was in some places hiding global state, which was *very* bad.
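
To illustrate the one legitimate use mentioned above, here is a small sketch that wraps a C routine which is observably pure. It uses the standard C function `strlen` as a stand-in (the network library from the anecdote is not shown); `cLength` is a hypothetical wrapper name. The marshalling needs IO, but the result depends only on the argument, so `unsafePerformIO` does not hide any state here.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize(..))
import System.IO.Unsafe (unsafePerformIO)

-- Standard C 'strlen', imported in IO because it reads through a pointer.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Pure wrapper: the temporary C string is allocated, measured and freed
-- inside 'withCString', and the result depends only on the argument.
-- Note that this counts bytes of the marshalled string, not characters.
cLength :: String -> Int
cLength s = fromIntegral (unsafePerformIO (withCString s c_strlen))
```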

Another example: Have you ever been using ONC/RPC (Remote Procedure Call)? I 
saw implementations that came with a real-time multithreaded OS where the 
docs said, more or less: "All created objects such as client handles may only 
be used from the thread that created them." *That* is a horrible API, because 
it means you can not pass these objects around freely but have to make sure 
your routine isn't called from the "wrong" thread! And the reason for this 
restriction was (of course) that the library was hiding state inside 
thread-local variables.
...
If so it seems to me you're using the fact 
that somebody has already solved the problem for you as an argument that
no solution is necessary.
Maybe. We can talk in "if" sentences until we both die of old age.
...
(It would be interesting to see what the api's 
of the libraries you're using would look like, if they had been designed
according to the principles you're advocating).
Yes, that would be interesting. And it is not a matter of me holding up holy 
principles against an evil reality. I am talking about practical 
considerations, not ideals. I hope I've made that clear with the above 
examples.

Cheers,
Ben