
i thought the discussion had actually progressed a little further than might be obvious from http://www.haskell.org/haskellwiki/Top_level_mutable_state here is my summary of what i thought was the state of the discussion, followed by a hopefully simpler proposal. first, i'd like to distinguish between two aspects of the problem: (a) shared identifiers (b) shared initialisation by (a), i mean identifiers shared between several functions, referring to the same item, such as 'System.IO.stdin' referring to _the_ standard input handle. by (b), i mean initialisation code that needs to be run once before a set of functions may be used, such as 'Network.Socket.withSocketsDo'. second, note that neither (a) nor (b) represents a problem for non-IO code: let-binding takes care of (a) and (b). the problem arises when (a) and (b) are combined with IO-based code, in particular, if the shared identifiers of (a) stand for IO-based items, such as IO-based initialisation in (b). the source of trouble lies in the interaction of let-bindings with IO-based code. we know how to share descriptions of IO actions: let a = putStr "hi ho" -- 1 in a >> a we also know how share the results of IO actions: do r <- getLine -- 2 return (r,r) in (1), the action description is shared, but the action itself is executed, possibly repeatedly, after substitution, in (2), the action is executed before substitution and before continuation (through monadic bind), and its result is shared. what we do not know is how to share IO actions themselves in a demand-driven way, ie how to describe an IO action that is executed at most once, only on demand, with shared result. i thought this had become clear through Adrian's oneShot examples, but it seems to be mentioned only as a sideline on the Top_level_mutable_state page. if we had the ability to specify shared IO actions, this would directly address the issue of (a) in the case of IO-based things, and that IO-based (a) could then be used to make IO-based (b) more convenient. aside: even IO-based (b) is not in itself impossible, it is just inconvenient and error-prone. a module could provide an initialisation action and require that to be executed before any of its other actions may be called. this is inconvenient, especially if initialisation generates results that need to be passed to several other actions, and it is error-prone, because Haskell does not directly support protocol types (guaranteeing that initialisation is always called before any of the other actions, and is only called once). we can address the 'before' typing by having all other actions take a parameter of a type that can only be produced by the initialisation code, but we cannot guarantee that initialisation will be called at most once, without relying on the user, or on a solution to (a). the latter brings us back to the sharing of IO actions. that is exactly what "the unsafePerformIO hack" tries to achieve: myGlobalVar :: IORef Int {-# NOINLINE myGlobalVar #-} myGlobalVar = unsafePerformIO (newIORef 17) we specify an IO action ('newIORef'), we specify a shared name for it ('myGlobalVar'), with a monomorphic type ('IORef Int'), we specify that we do not want the action description to be substituted before execution ('NOINLINE', usually also '-fno-cse'), to avoid the repeated execution shown in (1) above, and we do _not_ specify that the action should be executed before continuation, as it would be in (2) above, by _not_ using monadic bind. what the 'unsafePerformIO' does is to ensure that the action is executed before substitution. taken together, we get a by-need sharing of IO action execution similar to by-need sharing of expression evaluation. the aspects that make this a somewhat brittle and unsafe 'hack' are the use of a pragma to ensure semantics, the use of the 'unsafePerformIO' hook to extend the evaluator, the type that is no longer IO-based (it is 'IORef Int', not 'IO (IORef Int)'), and the implicit constraint to a monomorphic type. quite a complex interaction of features. it is great that these features allow us to experiment with possibilities not originally planned for in language or evaluator. but now that experimentation has settled down to a common pattern of using these extension hooks for a particular class of problems, it seems sensible to integrate that pattern into the language and evaluator proper, addressing the safety issues at the same time. which is what all this discussion has been about. among the approaches suggested, we have seen first-class modules (do the initialisation on import or export, then pass the result to the whole module, rather than to each function in it), top-level "mdo" (collect commuting IO actions, and execute them at a sensible point in time out-of-line), and type-based indexing (use types as shared identifiers). i like first-class modules, but they would be a rather substantial change to Haskell; i almost like the top-level 'mdo', but only if it can be made to behave exactly as a non-top-level 'mdo', something current proposals don't seem to achieve, because it would again require rather substantial changes; i think that type-based indexing is just another workaround, rather than a solution (in particular, types as globally shared identifiers have their own issues; cf the first-class labels proposal for Haskell': http://hackage.haskell.org/trac/haskell-prime/ticket/92 ). the most pragmatic approach seems to be the top-level 'mdo', but it would still be a substantial change in language and implementation, and it introduces a difference between top-level and non-top-level 'mdo' which would complicate the language. but if i'm not mistaken, none of that is necessary - the whole 'mdo' aspect was introduced only as an after-the-fact justification of a special syntax for addressing the shared-IO issue. and the complications arise because that special syntax really has a new semantics, and top-level bindings do not really behave like the 'let . = . ' and '<-' in an 'mdo', even if the former might be translated into the latter. ---------- proposal: introduce support for shared-on-demand IO actions if i try to remove all that 'mdo'-related justification, and focus on the shared-IO issue of the proposal, i end up with something like this (which is implied, but somewhat hidden in Top_level_mutable_state): -- library mkOnceIO :: IO a -> IO (IO a) mkOnceIO io = do mv <- newEmptyMVar demand <- newEmptyMVar forkIO (takeMVar demand >> io >>= putMVar mv) return (tryPutMVar demand () >> readMVar mv) -- usage, desugared {-# OPTIONS_GHC -fno-cse #-} {-# NOINLINE myGlobalVar #-} myGlobalVar :: IO (IORef Int) myGlobalVar = unsafePerformIO (mkOnceIO (newIORef 42)) main = do gv <- myGlobalVar doSomeThing gv and the syntax extension would be something like -- usage, sugared myGlobalVar <= newIORef 42 main = do gv <- myGlobalVar doSomeThing gv where '<=' compares to the existing '<-' and '=' as follows: '<=' vs '=': the binding is shared, but involves an IO action, and shares the IO action itself, not just its description '<=' vs '<-': the binding is monomorphic and involves the result of an IO action, but that action itself wrapped in IO, and is only executed on demand, at most once apart from the IO type, there is nothing global about this, '<=' could be used to specify oneShot actions whereever an IO action is defined. in particular, one could define initialisation actions in a module and import them: module MyIOBase(mkStdIn) where mkStdIn <= initialiseStdIn -- shared IO action, run at most once module MyIO where import MyIOBase(mkStdIn) getLine = do stdin <- mkStdIn hGetLine stdin module Main where import MyIO(getLine) main = getLine >>= ... since the oneShot actions are wrapped in IO, this extension does not allow IO action to be executed in pure code, and possible points of execution are clearly specified in the code, which is safer, but slightly more inconvenient than 'the unsafePerformIO hack', and simpler than the toplevel '<-' plus 'ACIO monad' proposal. since the proposal could be implemented by a local transformation, it should also be easier to realise. okay, now: are there any holes in this proposal?-) claus