Re: [Haskell-cafe] Re: [Haskell] Top Level <-

On Sun, 31 Aug 2008, Adrian Hey wrote:
Ganesh Sittampalam wrote:
On Sun, 31 Aug 2008, Adrian Hey wrote:
Thanks for taking the time to do this, Dan. I think the safety requirement has been met, but I think it fails on the improved API. The main complaint would be what I see as loss of modularity, in that somehow what should be a small irrelevant detail of the implementation of some obscure module somewhere has propagated its way all the way up to main.
That's the key point, as I see it - they aren't "irrelevant details of the implementation", they are requirements the implementation places on its context in order for that implementation to be correct. So they should be communicated appropriately.
Eh? Please illustrate your point with Data.Unique. What requirements does it place on its context? (whatever that might mean :-)
It requires that its context initialises it precisely once. Data.Unique is actually a poor example, as it is actually fine to initialise it multiple times as long as the resulting Unique values aren't treated as coming from the same datatype. But equally it can be implemented with IORefs, so it's not a good advert for the need for global variables.
The real irony of your remark is that making APIs this robust is practically impossible *without* using global variables, and you're now saying that because they've done this work to eliminate these constraints they now have to be held to account for this with an absurd API.
I think there are two cases to consider here. A Data.Unique style library, which requires genuinely *internal* state, and which is agnostic to having multiple copies of itself loaded simultaneously. In that case, there is no requirement for a process-level scope for <-, just that each instance of the library is only initialised once - the RTS can do this, as can any dynamic loader. The other is some library that really cannot be safely loaded multiple times, because it depends on some lower-level shared resource. Such a library simply cannot be made safe without cooperation from the thing that controls that shared resource, because you cannot prevent a second copy of it being loaded by something you have no control over. If the <- proposal were only about supporting the first of these applications, I would have far fewer objections to it. But it would have nothing to do with process-level scope, then. Ganesh
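The "unsafePerformIO hack" this thread keeps returning to looks roughly like this for a Data.Unique-style library. This is a minimal sketch, not the actual base source; `MyUnique`, `uniqSource`, `newMyUnique` and `hashMyUnique` are hypothetical names.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical Data.Unique-style type: an opaque wrapper around a counter value.
newtype MyUnique = MyUnique Integer
  deriving (Eq, Ord)

-- The "global variable": a top-level IORef created with unsafePerformIO.
-- NOINLINE keeps GHC from inlining the definition, which could otherwise
-- allocate a separate IORef at each use site.
uniqSource :: IORef Integer
uniqSource = unsafePerformIO (newIORef 0)
{-# NOINLINE uniqSource #-}

-- Atomically bump the counter and return the new value, so no two calls
-- (even from different threads) see the same number.
newMyUnique :: IO MyUnique
newMyUnique = atomicModifyIORef' uniqSource (\n -> let n' = n + 1 in (n', MyUnique n'))

-- Expose the underlying number, analogous to hashUnique.
hashMyUnique :: MyUnique -> Integer
hashMyUnique (MyUnique n) = n
```

The idiom relies on GHC honouring NOINLINE (it is often paired with -fno-cse, so that two identical definitions are not merged); the language itself does not promise that `uniqSource` is allocated only once.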

Ganesh Sittampalam wrote:
On Sun, 31 Aug 2008, Adrian Hey wrote:
Eh? Please illustrate your point with Data.Unique. What requirements does it place on its context? (whatever that might mean :-)
It requires that its context initialises it precisely once.
Its context being main? If so this is true, but I don't see why this is a problem. It's a happy accident with the unsafePerformIO hack as it is, and part of the defined semantics for *all* hypothetical top level <- bindings. Though to be more precise, the requirement is that it may be initialised at any time prior to first use, but never again (there's no requirement to initialise it at all if it isn't used). Also ACIO monad properties guarantee that it's always initialised to the same value regardless of when this occurs. So I don't see the problem.
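Adrian's argument here leans on the proposed ACIO monad (the name is usually expanded as "affine central IO": informally, actions that can be run at any time, or not at all, without other code being able to tell). As a rough sketch only, assuming a plain newtype over IO that exposes nothing but the allocation of fresh references; `ACIO`, `newIORefAC`, `newMVarAC` and `runACIO` are hypothetical names, and in the proposal ACIO is a language-level monad rather than a library wrapper:

```haskell
import Control.Concurrent.MVar (MVar, newMVar)
import Data.IORef (IORef, newIORef)

-- Hypothetical ACIO wrapper. The constructor would not be exported, so the
-- only way to build an ACIO action is through the operations below.
newtype ACIO a = ACIO (IO a)

instance Functor ACIO where
  fmap f (ACIO m) = ACIO (fmap f m)

instance Applicative ACIO where
  pure x = ACIO (pure x)
  ACIO f <*> ACIO x = ACIO (f <*> x)

instance Monad ACIO where
  ACIO m >>= k = ACIO (m >>= runACIO . k)

-- Allocating a fresh, unshared reference is the typical ACIO operation:
-- running it earlier, later, or not at all cannot be observed by other code.
newIORefAC :: a -> ACIO (IORef a)
newIORefAC x = ACIO (newIORef x)

newMVarAC :: a -> ACIO (MVar a)
newMVarAC x = ACIO (newMVar x)

-- Any ACIO action can be run as an ordinary IO action.
runACIO :: ACIO a -> IO a
runACIO (ACIO m) = m
```

In the proposal, a top-level `x <- e` binding would require `e` to have an ACIO type, which is what rules out arbitrary side effects at initialisation; this newtype merely imitates that restriction at the library level.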
Data.Unique is actually a poor example, as it is actually fine to initialise it multiple times as long as the resulting Unique values aren't treated as coming from the same datatype.
I just don't see what you're getting at. There's no problem here and Data.Unique is not special. We don't even have to consider whether or not it's OK to reinitialise these things unless the programmer explicitly allows this in the API (which Data.Unique doesn't). This is true for all top level <- bindings.

  myCount :: MVar Int
  myCount <- newMVar 0

In a hypothetical second initialisation, do you mean..
 1 - myCount somehow gets rebound to a different/new MVar
 2 - The binding stays the same but the MVar gets reset to 0 without this being explicitly done in the code.
I assume you mean the latter (2). But either case seems like an absurdity to me. No top level bindings randomly change halfway through a program and MVars (I hope) are not prone to random corruption (no need to suppose things are any different if they occur at the top level).
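The proposed `myCount <- newMVar 0` form is not valid Haskell today. A minimal sketch of how the same binding is usually written with the unsafePerformIO hack; `nextCount` is a hypothetical helper added only to show the MVar in use:

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
import System.IO.Unsafe (unsafePerformIO)

-- Today's encoding of the proposed top-level binding
--   myCount <- newMVar 0
-- In practice GHC creates the MVar once, the first time myCount is demanded.
myCount :: MVar Int
myCount = unsafePerformIO (newMVar 0)
{-# NOINLINE myCount #-}

-- Increment the shared counter and return the value before the increment.
nextCount :: IO Int
nextCount = modifyMVar myCount (\n -> return (n + 1, n))
```

Every call to `nextCount` sees the same MVar, so successive calls return 0, 1, 2 and so on. That the MVar is created only once rests on GHC honouring NOINLINE rather than on anything the language guarantees, which is the gap the <- proposal is meant to close.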
But equally it can be implemented with IORefs,
Actually it couldn't, as IORef is not an instance of Ord.
so it's not a good advert for the need for global variables.
Oh please! We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted and plenty of people have found that other libs/ffi bindings need them for safety reasons. Or at least they need something that has "global" main/process scope and so far the unsafePerformIO hack is the only known way to get that and still keep APIs stable, sane and modular. Also, AFAICS going the way that seems to be suggested of having all this stuff reflected in the arguments/types of API is going to make it practically impossible to have platform independent APIs if all platform specific implementation detail has to be accounted for in this way.
The real irony of your remark is that making APIs this robust is practically impossible *without* using global variables, and you're now saying that because they've done this work to eliminate these constraints they now have to be held to account for this with an absurd API.
I think there are two cases to consider here.
A Data.Unique style library, which requires genuinely *internal* state, and which is agnostic to having multiple copies of itself loaded simultaneously. In that case, there is no requirement for a process-level scope for <-, just that each instance of the library is only initialised once - the RTS can do this, as can any dynamic loader.
The other is some library that really cannot be safely loaded multiple times, because it depends on some lower-level shared resource. Such a library simply cannot be made safe without cooperation from the thing that controls that shared resource, because you cannot prevent a second copy of it being loaded by something you have no control over.
If the <- proposal were only about supporting the first of these applications, I would have far fewer objections to it. But it would have nothing to do with process-level scope, then.
The <- proposal introduces no new problems that aren't already with us. It solves one problem in that at least there's no room for the compiler to get it wrong or for people to use "dangerous things" when using the unsafePerformIO hack. I think that is really the only problem that can be solved at the level of Haskell language definition. I also think we need to be careful about the use of the term "process". IMO when we say the "process" defined by main, we are talking about an abstract process that is essentially defined by Haskell and may have nothing in common with a "process" as defined by various OSes (assuming there's an OS involved at all). Perhaps we should try to be clearer and say "Haskell process" or "OS process" as appropriate. In particular when we say an MVar or IORef has "global" process scope (whether or not it occurs at top level) we are talking about a Haskell process, not an OS process. The issues you raise seem to me to be more to do with correct implementation on various platforms using various tools of varying degrees of brokenness. So I don't really know what problems might be encountered in practice. But whatever these problems might be I don't think they can be fixed at the level of Haskell language definition as the solutions are likely to be platform specific "hacks". But this problem is going to be with us whether or not top level <- bindings are implemented (If they're not implemented people will still be doing the same thing with the unsafePerformIO hack). Regards -- Adrian Hey

Adrian Hey wrote:
We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted and plenty of people have found that other libs/ffi bindings need them for safety reasons. Or at least they need something that has "global" main/process scope and so far the unsafePerformIO hack is the only known way to get that and still keep APIs stable, sane and modular.
Actually all this use of the tainted and derogatory term "global variable" is causing me to be imprecise. All MVars/IORefs have "global" main/process scope whether or not they're bound to something at the top level. The purpose of the top level static binding is to prevent accidental or malicious "state spoofing" if it's important that the *same* IORef/MVar is always used for some purpose. Regards -- Adrian Hey

On Mon, 1 Sep 2008, Adrian Hey wrote:
Actually all this use of the tainted and derogatory term "global variable" is causing me to be imprecise. All MVars/IORefs have "global" main/process scope whether or not they're bound to something at the top level.
"Global variable" is exactly the right term to use, if we are following the terminology of other languages. We don't call the result of malloc/new etc a "global variable", unless it is assigned to something with top-level scope. Ganesh

On Mon, Sep 01, 2008 at 10:45:05PM +0100, Ganesh Sittampalam wrote:
Actually all this use of the tainted and derogatory term "global variable" is causing me to be imprecise. All MVars/IORefs have "global" main/process scope whether or not they're bound to something at the top level.
"Global variable" is exactly the right term to use, if we are following the terminology of other languages. We don't call the result of malloc/new etc a "global variable", unless it is assigned to something with top-level scope.
"Global variable" is often not a very precise term in other languages and on various platforms either. For instance, Windows DLLs have the ability to share individual variables across all loadings of said DLL (for better or worse). Haskell certainly has more advanced scoping capabilities than other languages, so we need a more refined terminology. I think 'IO scope' is the more precise term, as it implies the scope is that of the IO monad state, which may or may not correspond to some external 'process scope'. John -- John Meacham - ⑆repetae.net⑆john⑈

On Mon, 1 Sep 2008, John Meacham wrote:
On Mon, Sep 01, 2008 at 10:45:05PM +0100, Ganesh Sittampalam wrote:
Actually all this use of the tainted and derogatory term "global variable" is causing me to be imprecise. All MVars/IORefs have "global" main/process scope whether or not they're bound to something at the top level.
"Global variable" is exactly the right term to use, if we are following the terminology of other languages. We don't call the result of malloc/new etc a "global variable", unless it is assigned to something with top-level scope.
"Global variable" is often not a very precise term in other languages and on various platforms either. For instance, Windows DLLs have the ability to share individual variables across all loadings of said DLL (for better or worse).
Interesting, is this just within a single process?
Haskell certainly has more advanced scoping capabilities than other languages, so we need a more refined terminology. I think 'IO scope' is the more precise term, as it implies the scope is that of the IO monad state, which may or may not correspond to some external 'process scope'.
Hmm, to me that implies that if the IO monad stops and restarts, e.g. when a Haskell library is being called from an external library, then the scope stops and starts again (which I presume is not the intention of <- ?) But I don't really care that much about the name, if there is consensus on what to call it that doesn't cause ambiguities with OS processes etc. Cheers, Ganesh

On 2008 Sep 1, at 18:08, Ganesh Sittampalam wrote:
On Mon, 1 Sep 2008, John Meacham wrote:
On Mon, Sep 01, 2008 at 10:45:05PM +0100, Ganesh Sittampalam wrote:
Actually all this use of the tainted and derogatory term "global variable" is causing me to be imprecise. All MVars/IORefs have "global" main/process scope whether or not they're bound to something at the top level.
"Global variable" is exactly the right term to use, if we are following the terminology of other languages. We don't call the result of malloc/new etc a "global variable", unless it is assigned to something with top-level scope.
"Global variable" is often not a very precise term in other languages and on various platforms either. For instance, Windows DLLs have the ability to share individual variables across all loadings of said DLL (for better or worse).
Interesting, is this just within a single process?
Last I checked, it was across processes; that is, every DLL has its own (optional) data segment which is private to the DLL but shared across all system-wide loaded instances of the DLL. This actually goes back to pre-NT Windows.
Haskell certainly has more advanced scoping capabilities than other languages, so we need a more refined terminology. I think 'IO scope' is the more precise term, as it implies the scope is that of the IO monad state, which may or may not correspond to some external 'process scope'.
Hmm, to me that implies that if the IO monad stops and restarts, e.g. when a Haskell library is being called from an external library, then the scope stops and starts again (which I presume is not the intention of <- ?)
It tells me the flow of execution has temporarily exited the scope of the IO monad, but can return to it. The state is suspended, not exited. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On Mon, 1 Sep 2008, Brandon S. Allbery KF8NH wrote:
On 2008 Sep 1, at 18:08, Ganesh Sittampalam wrote:
On Mon, 1 Sep 2008, John Meacham wrote:
For instance, Windows DLLs have the ability to share individual variables across all loadings of said DLL (for better or worse).
Interesting, is this just within a single process?
Last I checked, it was across processes; that is, every DLL has its own (optional) data segment which is private to the DLL but shared across all system-wide loaded instances of the DLL. This actually goes back to pre-NT Windows.
Sounds like a recipe for fun :-)
Haskell certainly has more advanced scoping capabilities than other languages, so we need a more refined terminology. I think 'IO scope' is the more precise term, as it implies the scope is that of the IO monad state, which may or may not correspond to some external 'process scope'.
Hmm, to me that implies that if the IO monad stops and restarts, e.g. when a Haskell library is being called from an external library, then the scope stops and starts again (which I presume is not the intention of <- ?)
It tells me the flow of execution has temporarily exited the scope of the IO monad, but can return to it. The state is suspended, not exited.
In that case we could equally call the things "library scope", as that's the only scope they're visible in unless exported. Anyway, as long as we're clear on what it means, the name doesn't really matter. Ganesh

On Mon, 1 Sep 2008, Adrian Hey wrote:
Ganesh Sittampalam wrote:
On Sun, 31 Aug 2008, Adrian Hey wrote:
Eh? Please illustrate your point with Data.Unique. What requirements does it place on its context? (whatever that might mean :-)
It requires that its context initialises it precisely once.
Its context being main? If so this is true, but I don't see why this is a problem. [...] Also ACIO monad properties guarantee that it's always initialised to the same value regardless of when this occurs. So I don't see the problem.
You see this as a requirement that can be discharged by adding the ACIO concept; I see it as a requirement that should be communicated in the type. Another way of looking at it is that Data.Unique has associated with it some context in which Unique values are safely comparable. You want that context to always be the top-level/RTS scope; I would like defining that context to be part of the API.
Data.Unique is actually a poor example, as it is actually fine to initialise it multiple times as long as the resulting Unique values aren't treated as coming from the same datatype.
I just don't see what you're getting at. There's no problem here and Data.Unique is not special.
See the conversation with Ashley - you can have multiple copies of Data.Unique loaded without problem, as long as the resulting Unique datatypes aren't comparable with each other.
  myCount :: MVar Int
  myCount <- newMVar 0
In a hypothetical second initialisation, do you mean..
 1 - myCount somehow gets rebound to a different/new MVar
I mean this. Or, more precisely, that a *different* myCount gets bound to a different MVar.
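Ganesh's "a *different* myCount bound to a different MVar" can be modelled in ordinary Haskell by treating library initialisation as an IO action that may be run more than once. A hypothetical sketch (`CounterLib`, `loadCounterLib` and `next` are invented names, and a real dynamic loader does not literally work this way):

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)

-- Hypothetical model of "a loaded library instance": the operations it
-- exports, closed over private state allocated when the instance is created.
newtype CounterLib = CounterLib { next :: IO Int }

-- Initialising the library allocates its own MVar. Calling this twice models
-- two copies of the same library loaded side by side.
loadCounterLib :: IO CounterLib
loadCounterLib = do
  ref <- newMVar 0
  return (CounterLib (modifyMVar ref (\n -> return (n + 1, n))))
```

Each copy counts from 0 independently. Nothing goes wrong as long as values from the two copies are never mixed, which is the situation Ganesh describes for Data.Unique.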
But equally it can be implemented with IORefs,
Actually it couldn't, as IORef is not an instance of Ord.
Well, perhaps one could be added (along with hashing). Or perhaps it's not really needed; I don't know as I've never used Data.Unique, and I doubt I ever would as when I need a name supply I also want human readable names, and I can't think of any other uses for it, though no doubt some exist.
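On the Ord point: a Data.Unique-like type can be built from IORef identity without any global state, but it only gets equality. A hypothetical sketch (`RefUnique` and `newRefUnique` are invented names):

```haskell
import Data.IORef (IORef, newIORef)

-- A Data.Unique-like type built on IORef identity alone, with no global state.
-- IORef has an Eq instance (reference equality) but no Ord instance, so only
-- Eq can be derived here.
newtype RefUnique = RefUnique (IORef ())
  deriving Eq

-- Each call allocates a fresh IORef, which compares unequal to every other.
newRefUnique :: IO RefUnique
newRefUnique = fmap RefUnique (newIORef ())
```

This covers equality tests but not sorting or use as a Data.Map key, which is the limitation Adrian raises; giving IORef an Ord instance (or a hash), as Ganesh suggests might be possible, would be a change to base itself.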
so it's not a good advert for the need for global variables.
Oh please!
We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted
Would you mind listing them? It might help provide some clarity to the discussion.
and plenty of people have found that other libs/ffi bindings need them for safety reasons. Or at least they need something that has "global" main/process scope and so far the unsafePerformIO hack is the only known way to get that and still keep APIs stable, sane and modular.
Again, some specific examples would help.
Also, AFAICS going the way that seems to be suggested of having all this stuff reflected in the arguments/types of API is going to make it practically impossible to have platform independent APIs if all platform specific implementation detail has to be accounted for in this way.
It can all be wrapped up in a single abstract context argument; the only platform "bleed" would be if one platform needed a context argument but others didn't.
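Ganesh's alternative, making the context an explicit part of the API, could look something like the following. This is a hypothetical sketch borrowing the runST trick; `Supply`, `Unique`, `withSupply` and `newUniqueFrom` are invented names, not an existing or proposed library.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- The explicit context: a supply value. The phantom type s ties each Unique
-- to the supply it came from, in the style of runST.
newtype Supply s = Supply (IORef Integer)

newtype Unique s = Unique Integer
  deriving (Eq, Ord, Show)

-- Run an action with a fresh supply. Because s is universally quantified,
-- Unique s values cannot escape, and values from different supplies have
-- different types, so they cannot be compared.
withSupply :: (forall s. Supply s -> IO a) -> IO a
withSupply act = newIORef 0 >>= \ref -> act (Supply ref)

-- Draw the next value from a given supply.
newUniqueFrom :: Supply s -> IO (Unique s)
newUniqueFrom (Supply ref) = atomicModifyIORef' ref (\n -> (n + 1, Unique (n + 1)))
```

Two calls to `withSupply` give two independent contexts, and the type checker rejects any attempt to compare a Unique from one with a Unique from the other. The cost is exactly the extra argument Adrian objects to.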
I think there are two cases to consider here.
A Data.Unique style library, which requires genuinely *internal* state, and which is agnostic to having multiple copies of itself loaded simultaneously. In that case, there is no requirement for a process-level scope for <-, just that each instance of the library is only initialised once - the RTS can do this, as can any dynamic loader.
The other is some library that really cannot be safely loaded multiple times, because it depends on some lower-level shared resource. Such a library simply cannot be made safe without cooperation from the thing that controls that shared resource, because you cannot prevent a second copy of it being loaded by something you have no control over.
If the <- proposal were only about supporting the first of these applications, I would have far fewer objections to it. But it would have nothing to do with process-level scope, then.
The <- proposal introduces no new problems that aren't already with us. It solves one problem in that at least there's no room for the compiler to get it wrong or for people to use "dangerous things" when using the unsafePerformIO hack. I think that is really the only problem that can be solved at the level of Haskell language definition.
I just want to be clear that the second of the two categories above cannot be used to justify the proposal, as it does not make them safe.
I also think we need to be careful about the use of the term "process".
IMO when we say the "process" defined by main, we are talking about an abstract process that is essentially defined by Haskell and may have nothing in common with a "process" as defined by various OSes (assuming there's an OS involved at all). Perhaps we should try to be clearer and say "Haskell process" or "OS process" as appropriate. In particular when we say an MVar or IORef has "global" process scope (whether or not it occurs at top level) we are talking about a Haskell process, not an OS process.
We could call it "Haskell RTS scope" if you like; the term "Haskell process" is meaningless to me. Top-level scope, as defined in the emails between Ashley and myself, is also ok.
The issues you raise seem to me to be more to do with correct implementation on various platforms using various tools of varying degrees of brokenness.
No, it's about correct implementation with different models of dynamic loading. None of the possibilities I am envisaging are broken, per se, though they might well break the <- proposal.
But this problem is going to be with us whether or not top level <- bindings are implemented (If they're not implemented people will still be doing the same thing with the unsafePerformIO hack).
But then it will be easier to point to their libraries as being at fault when something goes wrong. Ganesh

Ganesh Sittampalam wrote:
You see this as a requirement that can be discharged by adding the ACIO concept; I see it as a requirement that should be communicated in the type.
Another way of looking at it is that Data.Unique has associated with it some context in which Unique values are safely comparable. You want that context to always be the top-level/RTS scope; I would like defining that context to be part of the API.
But why pick on Data.Unique as special? Because I just happened to have pointed out it uses a "global variable"? If you didn't know this I suspect this issue just wouldn't be an issue at all. Why haven't you raised a ticket complaining about its API having the "wrong" type sigs? :-) There's shed loads of information and semantic subtleties about pretty much any operation you care to think of in the IO monad that isn't communicated by its type. All you know for sure is that it's weird, because if it wasn't it wouldn't be in the IO monad. So I think you're applying double standards.
We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted
Would you mind listing them? It might help provide some clarity to the discussion.
Here's what you can find in the libs distributed with ghc. Note this does not include all uses of unsafePerformIO. It only includes uses to make a "global variable".

  Control.Concurrent        1
  Control.OldException      1
  Data.HashTable            1
  Data.Typeable             1
  Data.Unique               1
  GHC.Conc                  8
  GHC.Handle                3
  System.Random             1
  Language.Haskell.Syntax   1
  System.Posix.Signals      2
  System.Win32.Types        1
  Network.BSD               1
  System.Posix.User         1
  Total:                   23

In the ghc source you can find 16 uses of the GLOBAL_VAR macro (can't imagine what that does :-). I didn't even attempt to figure out how many global variables might be in the rts source. Anyone care to hazard a guess?

You can find a few more in the extra libs..

  Graphics.UI.GLUT.Menu                        1
  Graphics.UI.GLUT.Callbacks.Registration      3
  Graphics.Rendering.OpenGL.GLU.ErrorsInternal 1
  Total:                                       5

A few more:

  wxHaskell 6
  c2hs      1
  GTK2HS    1
  SDL       0 !!

However, I happen to know that SDL suffers from the initialisation issue and IIRC it needs at least 1 global to stop the user from using an unsafe (possibly segfault-inducing) calling sequence.

Anyway, that's all from me because I'm bored with this thread now. Regards -- Adrian Hey

On Tue, 2 Sep 2008, Adrian Hey wrote:
Ganesh Sittampalam wrote:
You see this as a requirement that can be discharged by adding the ACIO concept; I see it as a requirement that should be communicated in the type.
Another way of looking at it is that Data.Unique has associated with it some context in which Unique values are safely comparable. You want that context to always be the top-level/RTS scope; I would like defining that context to be part of the API.
But why pick on Data.Unique as special? Because I just happened to have pointed out it uses a "global variable"?
Only because I thought it was the running example.
If you didn't know this I suspect this issue just wouldn't be an issue at all. Why haven't you raised a ticket complaining about its API having the "wrong" type sigs? :-)
Because I don't use it, and even if I did use it I would just live with the API it has.
There's shed loads of information and semantic subtleties about pretty much any operation you care to think of in the IO monad that isn't communicated by its type. All you know for sure is that it's weird, because if it wasn't it wouldn't be in the IO monad.
It does actually claim a specification, namely that no two calls to newUnique return values that compare equal.
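The quoted specification can be spot-checked against the real Data.Unique module in base (`newUnique :: IO Unique`, with Eq and Ord instances). `allDistinct` below is a hypothetical helper, and a sampled check is evidence, not proof:

```haskell
import Control.Monad (replicateM)
import Data.List (nub)
import Data.Unique (newUnique)

-- Check the documented property on a sample: n calls to newUnique should
-- produce n pairwise-distinct values. nub needs only Unique's Eq instance.
allDistinct :: Int -> IO Bool
allDistinct n = do
  us <- replicateM n newUnique
  return (length (nub us) == n)
```

Note that the specification is stated purely in terms of equality between values returned by newUnique; it says nothing about which context those calls happen in, which is the point under dispute.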
We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted
Would you mind listing them? It might help provide some clarity to the discussion.
Here's what you can find in the libs distributed with ghc. Note this does not include all uses of unsafePerformIO. It only includes uses to make a "global variable".
Thanks. It'd probably be a good addition to the wiki page on this topic for these to be catalogued in terms of why they are needed, though I'm (probably) not volunteering to do it :-) Ganesh

Adrian Hey wrote:
There's shed loads of information and semantic subtleties about pretty much any operation you care to think of in the IO monad that isn't communicated by its type. All you know for sure is that it's weird, because if it wasn't it wouldn't be in the IO monad.
So I think you're applying double standards.
Not to throw any more fuel on the fire (if at all possible), but the reason behind this is that IO has become a sin bin for all the things that people either don't know how to deal with, or don't care enough to tease apart. There are many people who would like to break IO apart into separate segments for all the different fragments of the RealWorld that actually matter for a given purpose. To date it has not been entirely clear how best to do this and retain a workable language.

The fact that this discussion is going on at all is, IMO, precisely because of the sin-bin nature of IO. People have things they want to have "global" or otherwise arbitrarily large scope, but the only notion of a globe in Haskell is the RealWorld. Hence they throw things into IO and then unsafePerformIO it to get it back out. There are three problems with this approach:

(1) It's a hack and not guaranteed to work, nuff said.

(2) The RealWorld is insufficiently described to ensure any semantics regarding *how* it holds onto the state requested of it. This problem manifests itself in the discussion of loading the same library multiple times, having multiple RTSes in a single OS process, etc. In those scenarios what exactly the "RealWorld" is and how the baton is passed among the different libraries/threads/processes/RTSes is not clearly specified.

(3) The API language is insufficiently detailed to make demands on what the RealWorld holds. This problem manifests itself in the argument about whether libraries should be allowed to implicitly modify portions of the RealWorld, or whether this requirement should be made clear in the type signatures of the library.

As I said in the thread on [Research language vs. professional language], I think coming up with a solution to this issue is still an open research problem, and one I think Haskell should be exploring.
The ACIO monad has a number of nice properties and I think it should be broken out from IO even if top-level <- bindings aren't added to the language. The ability to declare certain FFI calls as ACIO rather than IO is, I think, reason enough to pursue ACIO on its own. But ACIO does not solve the dilemmas raised in #2 and #3. Top-level mutable state is only a symptom of the real problem that IO and the RealWorld are insufficiently described.

Another example where unsafePerformIO is often used is RTS introspection. Frequently, interrogating the RTS has ACIO-like properties in that we are only interested in the RTS if a particular thunk happens to get pulled on, and we're only interested at the time that the pulling occurs rather than in sequence with any other actions. The use of unsafePerformIO here seems fraught with all the same problems as top-level mutable state. It would be nice to break out an RTS monad (or an UnsafeGhcRTS monad, or what have you) in order to be clearer about the exact requirements of what's going on.

But even if we break ACIO and UnsafeGhcRTS out from IO, the dilemmas remain. To a certain extent, the dilemmas will always remain because there will always be a frontier beyond which we don't know what's happening: the real world exists, after all. However, there is still room to hope for a general approach to the problem.

One potential is to follow on the coattails of _Data Types à la Carte_. Consider, for example, if the language provided a mechanism for users to generate RealWorld-like non-existent tokens. Now consider removing IO[1] and only using BS, where the thread-state parameter could be any (co)product of RealWorld-like tokens. We could then have an overloaded function to lift any (BS a) into a BS (a :+: b). There are some complications with DTalC's coproducts in practice. For example, (a :+: b) and (b :+: a) aren't considered the same type, as naturally they can't be due to the Inl/Inr tagging.
A similar approach should be able to work however, since these tokens don't really exist at all. Of course, once we take things to that level we're already skirting around capability systems. Rather than using an ad-hoc approach like this, I think it would be better to work out a theory connecting capability systems to monad combinators, and then use that theory to smash the sin bin of IO. [1] Or leaving it in as a type alias for BS RealWorld. -- Live well, ~wren
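A very rough sketch of wren's token idea, assuming uninhabited phantom types as the "RealWorld-like non-existent tokens" and a newtype over IO for `BS`. `Console`, `Counter`, `:+:`, `liftL`, `liftR` and the example actions are all hypothetical, and unlike DTalC there is no overloaded injection, only explicit left/right lifting:

```haskell
{-# LANGUAGE TypeOperators #-}
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical world tokens: uninhabited phantom types naming parts of the world.
data Console
data Counter

-- Type-level coproduct of tokens; like the tokens, it has no values.
data a :+: b
infixr 5 :+:

-- BS w a: an action that may only touch the parts of the world named by w.
-- Underneath it is still IO; the index exists only at the type level.
newtype BS w a = BS { runBS :: IO a }

instance Functor (BS w) where
  fmap f (BS m) = BS (fmap f m)

instance Applicative (BS w) where
  pure x = BS (pure x)
  BS f <*> BS x = BS (f <*> x)

instance Monad (BS w) where
  BS m >>= k = BS (m >>= runBS . k)

-- Widen an action so it can run in a larger world (cf. Inl/Inr).
liftL :: BS a x -> BS (a :+: b) x
liftL (BS m) = BS m

liftR :: BS b x -> BS (a :+: b) x
liftR (BS m) = BS m

sayHello :: BS Console ()
sayHello = BS (putStrLn "hello")

bump :: IORef Int -> BS Counter Int
bump r = BS (atomicModifyIORef' r (\n -> (n + 1, n + 1)))

-- The type records that this action needs both the console and the counter.
helloAndBump :: IORef Int -> BS (Console :+: Counter) Int
helloAndBump r = liftL sayHello >> liftR (bump r)
```

As wren notes, `Console :+: Counter` and `Counter :+: Console` are still different types here. The `BS` constructor is also exposed, so any IO action can be wrapped under any token; a real design would hide it and say where tokens come from.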
participants (5)
- Adrian Hey
- Brandon S. Allbery KF8NH
- Ganesh Sittampalam
- John Meacham
- wren ng thornton