
Adrian Hey wrote:
They are necessary because they are the only way to ensure important safety properties of many IO APIs.
That's a bold claim. It's very hard to prove that things don't exist (that is, that other ways to ensure these safety properties don't exist). In snipped text you comment that the problems are often in low-level FFI library code: this makes me wonder whether the real culprit lies at the FFI-Haskell boundary. Perhaps there are good ways to specify this kind of invariant there.
No. Even if we stripped away all other code apart from the Haskell RTS itself (OS, device drivers, etc.) and performed your IO entirely in Haskell (just using peek and poke on bare hardware), you'd still need top-level mutable state to implement common IO APIs (e.g. the socket API: does anybody really believe it is entirely stateless?).
I wouldn't dispute the assertion that, at the level of complete programs or processes, implementations that don't use "global variables" are possible. But this does not hold at the level of individual IO library APIs. If we want to keep our software *modular* (I take it we do), then we need top-level mutable state.
If you want to see more examples of the use of the unsafePerformIO hack, you need only look at the source code of the current base package (you'll find a dozen or so uses of this hack to create top-level mutable state).
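For readers who haven't seen it, the idiom usually looks something like the minimal sketch below. The names counter and freshId are hypothetical and not taken from base; the NOINLINE pragma is what keeps GHC from inlining the definition and thereby creating more than one IORef.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A top-level mutable counter created with the unsafePerformIO hack.
-- NOINLINE stops GHC from inlining the definition, which could otherwise
-- produce a fresh IORef at each use site.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Hypothetical library operation relying on the hidden state:
-- hands out increasing identifiers without the caller passing anything in.
freshId :: IO Int
freshId = atomicModifyIORef' counter (\n -> (n + 1, n + 1))

main :: IO ()
main = do
  a <- freshId
  b <- freshId
  print (a, b) -- prints (1,2)
```

Nothing in freshId's type reveals the state it depends on, which is exactly the modularity property (and the danger) being argued about.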
All of these are, in a sense, failings, because unsafePerformIO is not Haskell, and we'd like base to be a Haskell library, not a GHC library.
But what's the problem? Is it the use of "global variables"? Or is it the use of the unsafePerformIO hack to create them?
It's only slightly the "unsafePerformIO hack", IMHO; if that were all, a mechanism not requiring it would have been implemented long ago. Now I launch into a long discussion.

The difficult question is "how global?". GHCi already has problems with this (the persistence of those global variables varies, and they never last between separate invocations of ghci). Obviously, for "global" variables to be truly global they would have to share one persistent state across the whole wide world forever :P, but then we get identity problems. For example, take

    fancy-package:GlobalVariable.Fancy.nuclearMissilesLaunched :: IORef/MVar MannerOfNuclearMissileLaunch

where one day, or on one hacker's computer, there is

    type MannerOfNuclearMissileLaunch = Int  --number launched already

and on another,

    data MannerOfNuclearMissileLaunch = NoMissilesLaunched | WorldDestroyed | Unknown

The usual meaning relies on the scope of a single program invocation, which is tied to main:Main.main. As you observe, this is like wrapping every use of the IO monad. Clearly this adds modularity by not requiring main's code to be modified, and also destroys modularity by forcing main's semantics to be modified.

A Haskell program is notionally executed by running Main.main. Consider:
    global "foo" (initial value: False)
    main1 = setGlobal "foo" True
    main2 = getGlobal "foo" >>= print
Compile with -main-is main1 to the binary 'main1' and with -main-is main2 to the binary 'main2'. Now consider two possible overall definitions of main:
    main = main1 >> main2

or

    main = executeBinary "main1" >> executeBinary "main2"
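To make the difference concrete, here is a small sketch that models the "global" environment as an explicit value instead of using the unsafePerformIO hack. Env, newEnv, sameProgram and separateBinaries are hypothetical names; main2 returns the value rather than printing it so the two cases are easy to compare, and a fresh Env stands in for the per-process state an operating system sets up when it starts a binary.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for the process-wide state a real OS sets up
-- when it starts a binary: here just the single global "foo".
newtype Env = Env { foo :: IORef Bool }

-- Every fresh environment starts with foo at its initial value, False.
newEnv :: IO Env
newEnv = Env <$> newIORef False

main1 :: Env -> IO ()
main1 env = writeIORef (foo env) True

-- Returns the value instead of printing it, so the cases can be compared.
main2 :: Env -> IO Bool
main2 env = readIORef (foo env)

-- main = main1 >> main2: both parts run against the same environment.
sameProgram :: IO Bool
sameProgram = do
  env <- newEnv
  main1 env
  main2 env

-- main = executeBinary "main1" >> executeBinary "main2":
-- each "binary" is given a fresh environment.
separateBinaries :: IO Bool
separateBinaries = do
  newEnv >>= main1
  newEnv >>= main2

main :: IO ()
main = do
  sameProgram >>= print      -- prints True
  separateBinaries >>= print -- prints False
```

Once the environment is an explicit argument, the two definitions of main differ only in whether the environment is shared, which is the choice the OS otherwise makes for us.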
Basically, all existing operating systems require executing a binary to be more than just running its IO monad: they set up a "global" (process-specific) environment for the process, which is why the two hypothetical definitions of main give different results. Note that operating systems also serve the root of the filesystem ("/" in Unix), and variables global to everything sharing that root filesystem can be simulated this way. Operating systems could serve process-specific spaces in the same way, as long as it is possible for them to define something like getProcessID :: IO ProcessID. Note that Haskell has ThreadId, which is usefully Eq-comparable in GHC, whereas Hugs chooses not to distinguish the identity of threads. It is a similar design tradeoff.

Hardware: readHardware and writeHardware are IO operations specific to the hardware. Kernels generally rely on storing information in RAM about the state of the hardware, and they presume to have global variables whose scope is the present run of the computer. This is straightforward for monolithic kernel designs. If you want persistent settings, though, it is more difficult: the system has to explicitly save ALSA state to disk or whatever.

Operating systems: I don't know sockets in particular, but operating systems are indeed expected to provide some IO operations whose behavior depends on which computer in the world the program is running on. Kernel/OS design variation is certainly one area to look into for further consideration of these issues. There are non-monolithic kernels (GNU Hurd...), systems that can run on multiple hardware computers as a cluster... There is usually expected to be one name resolver (e.g. the kernel) that all code can call into, even if it is only to access the current filesystem server or whatever. Or else all those references are passed at program startup, as with ELF dynamically linked binaries?
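The idea of a shared space that hands out one area per identity, with the identity coming from something like getProcessID, can be sketched inside a single GHC program by using ThreadId as the identity, precisely because GHC's ThreadId is comparable. Space, newSpace, myArea and bump are hypothetical names; the sketch uses Data.Map from the containers package and, for simplicity, never frees the areas of threads that have finished.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (MVar, modifyMVar, newEmptyMVar, newMVar, putMVar, takeMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical shared space handing out one area per identity.
-- ThreadId plays the role a process ID would play for an OS: GHC's
-- ThreadId has Eq and Ord instances, so it can key a Map.
newtype Space a = Space (MVar (Map.Map ThreadId (IORef a)))

newSpace :: IO (Space a)
newSpace = Space <$> newMVar Map.empty

-- Return the calling thread's area, creating it with the given initial
-- value if it does not exist yet.
myArea :: Space a -> a -> IO (IORef a)
myArea (Space var) initial = do
  tid <- myThreadId
  modifyMVar var $ \m -> case Map.lookup tid m of
    Just ref -> pure (m, ref)
    Nothing -> do
      ref <- newIORef initial
      pure (Map.insert tid ref m, ref)

-- Increment the calling thread's counter and return the new value.
bump :: Space Int -> IO Int
bump space = do
  ref <- myArea space 0
  modifyIORef' ref (+ 1)
  readIORef ref

main :: IO ()
main = do
  space <- newSpace
  _ <- bump space
  mine <- bump space                        -- second bump in this thread
  done <- newEmptyMVar
  _ <- forkIO (bump space >>= putMVar done) -- first bump in a new thread
  theirs <- takeMVar done
  print (mine, theirs) -- prints (2,1)
```

The same code gives per-thread, per-process or per-machine "globals" depending only on what identity the space is keyed by, which is the "how global?" question in miniature.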
It can certainly be done without extensions to Haskell's syntax or semantics, implementing every program in Haskell and using no "top-level mutable state" of the usual variety (although a way to store arbitrary Haskell values such as (Int -> Bool) in the available shared spaces would need to be implemented). Consider the recent Haskell library that internally uses unsafePerformIO only to provide an otherwise safe, extensible, global named state (relying on Haskell's/Typeable's definition of type equivalence for soundness). That one unsafePerformIO would not be needed if the OS provided system calls to access a named area of memory (initialized with a given value if it does not already exist; creating it and detecting whether it exists yet could be separate system calls). The names would be the Haskell/Typeable system's unique identifiers, such as "fancy-package:GlobalVariable.Fancy.nuclearMissilesLaunched". It of course still suffers from the "how global is it?" problem described above, with no way for the caller to specify. A full identification of the global variable might also involve the computer's IP address and the process's ID, for example.

Imagine that we essentially have:

    data AllGlobalVariables = ???  --contains some IORefs
    main :: AllGlobalVariables -> IO ()
    --as it is with the unsafePerformIO hack, C-style global variables, etc.
    --AllGlobalVariables is an abstract data type which the caller can
    --only create with the default mechanism (possibly after creation
    --it can be passed to more than one invocation of main, as GHCi
    --attempts to do), but it is extensible internally.

or

    main :: SocketVariables -> TypeableVariables -> ... (-> OtherGlobalVariables) ... -> IO ()
    --The caller can thread more precisely if desired. C programs would
    --definitely have an OtherGlobalVariables argument among them, which
    --is based on some section of the binary file if I am not mistaken.
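As a rough sketch of AllGlobalVariables under these assumptions: an abstract store in which each variable is named by a Haskell type (via Typeable), created with an initializer the first time it is looked up, and passed explicitly to the program instead of being conjured up by unsafePerformIO. AllGlobalVariables, newGlobals, globalVar, program and MissilesLaunched are hypothetical names, not the API of the library mentioned above, and the store is not safe for concurrent use without an MVar around it.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as Map
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)

-- Abstract in spirit: the constructor would not be exported, so callers
-- could only obtain a store through newGlobals.
newtype AllGlobalVariables = AllGlobalVariables (IORef (Map.Map TypeRep Dynamic))

newGlobals :: IO AllGlobalVariables
newGlobals = AllGlobalVariables <$> newIORef Map.empty

-- Look up the variable named by the type 'a', creating it with the given
-- initial value if it does not exist yet. Typeable's notion of type
-- equality is what makes the fromDynamic cast safe.
globalVar :: forall a. Typeable a => AllGlobalVariables -> a -> IO (IORef a)
globalVar (AllGlobalVariables store) initial = do
  m <- readIORef store
  case Map.lookup key m >>= fromDynamic of
    Just ref -> pure ref
    Nothing -> do
      ref <- newIORef initial
      writeIORef store (Map.insert key (toDyn ref) m)
      pure ref
  where
    key = typeRep (Proxy :: Proxy a)

-- Hypothetical "name" for a variable: the type itself is the identifier.
newtype MissilesLaunched = MissilesLaunched Int deriving Show

-- Stands in for "main :: AllGlobalVariables -> IO ()"; a real Haskell
-- main must have type IO t, so it gets another name here.
program :: AllGlobalVariables -> IO ()
program globals = do
  ref <- globalVar globals (MissilesLaunched 0)
  modifyIORef' ref (\(MissilesLaunched n) -> MissilesLaunched (n + 1))
  readIORef ref >>= print

main :: IO ()
main = do
  globals <- newGlobals
  program globals        -- prints MissilesLaunched 1
  program globals        -- prints MissilesLaunched 2
  newGlobals >>= program -- prints MissilesLaunched 1
```

Passing the same store to program twice behaves like GHCi reusing its state across runs, while a fresh store behaves like a new invocation; the caller, not the library, decides how global the variables are.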
This could of course be done somewhat abstractly with accessors, so that not everything breaks when a new "global" thing is added. I might prefer the arguments to be grouped by differences in scope, though (hardware computer, "process" abstraction, etc.). It is more difficult when the scopes are not fixed for the complete run of a Haskell program (if it is suspended and resumed on a different computer... if the program were a kernel it would almost certainly not be happy about that!). So if a modular OS/kernel design, say for sockets, required a certain globalness of variable, its main would have to be passed the ability to create such an area (meaning that a library/system call would have to be provided; providing more system calls or exported library functions doesn't actually break everything, as existing systems prove!). I think this design can be done modularly (at least hypothetically, with the right system design).

Many systems (Unix-like, and some others) are stuck with the process as a unit of abstraction from which it's difficult to separate out anything in particular (though Linux, e.g., implements ways to do so anyway, since they can be important). This is a design choice that simplifies some things (the Unix philosophy; worse is better; and we do _have_ systems today) and makes some other odd things very difficult, since much software (potentially) relies on quirks of that model.

Isaac