Top-level state debate on the wiki

older
memory leak in Random.randoms or...

Ben Rudiak-Gould

1 Dec 2004 1 Dec '04

6:29 p.m.

I put up a wiki page summarizing the main proposals for top-level mutable state. The type-dictionary approach isn't there yet, but there's a space for it; I'll probably fill it in within the next 24 hours unless someone else feels like doing it first. Please add more detail, objections, examples. Especially examples! Here's the page: http://haskell.org/hawiki/GlobalMutableState (It also includes an explanation of why I chose that title, though of course you're welcome to dispute that too.) -- Ben

Show replies by date

Adrian Hey

1 Dec 1 Dec

6:59 p.m.

On Wednesday 01 Dec 2004 6:29 pm, Ben Rudiak-Gould wrote:

...

I put up a wiki page summarizing the main proposals for top-level mutable state. The type-dictionary approach isn't there yet, but there's a space for it; I'll probably fill it in within the next 24 hours unless someone else feels like doing it first.

Please add more detail, objections, examples. Especially examples!

Here's the page:

http://haskell.org/hawiki/GlobalMutableState

(It also includes an explanation of why I chose that title, though of course you're welcome to dispute that too.)

Thankyou for this. Good summary so far I think. I wish you hadn't called it GlobalMutableState though :) I prefer the term "top level" myself. Regards -- Adrian Hey

Jules Bean

9 p.m.

On 1 Dec 2004, at 18:29, Ben Rudiak-Gould wrote:

...

Here's the page:

http://haskell.org/hawiki/GlobalMutableState

Nice summary. What I think is missing is an explanation of when you would want this feature (and when you wouldn't, more importantly). Here is the kind of platonic dialogue that summarises my limited understanding: A: I want a global variable x which IO actions f and g can alter. Why can't I say x <- newIORef 0::Int at the top level? B: Why would you? Put the newIORef command at the beginning of your main action and pass the reference to f and g. This is much better programming practice: you can invoke f and g again on different references whenever you want to, useful for debugging and testing. A: Ah, but this isn't scaleable, is it? What if I have y and z and q and r and..... I really don't want f and g to have *that* many parameters! B: That's easy, just wrap them up in a single type. Call it GEnv, if you like, for Global Environment. Now all your IO actions just need one extra parameter. A: But the bodies of f and g call many other actions, most of which need access to some part of this environment. It's a real pain threading the global environment parameter to all of these actions! It would be so much simpler to just make them top-level globals! B: Problem threading parameters? Sounds to me like you want a state monad. You know you can use StateT GEnv IO to layer that over IO, right? In fact, in many programs you will not even need IORefs any more, since the StateT gives you modifiability for free. [A wanders off muttering...] [...some time later...] A: Ah! I've got you now! I'm implementing a module which exports a Map interface. It's a pure interface, no IO monads in sight. But, I want to implement it with constant-time lookup, so I need a hashing algorithm and a mutable hashtable, right? B: No problem, you should be using the ST monad and STUArray, probably. ....which is to say, that I don't have a very clear idea of when you do need it, although I think I have some ideas of when you don't need it. The example that I haven't addressed is the 'OS-in-haskell' family of examples, which I don't understand clearly enough to summarise. Jules

Keith Wansbrough

2 Dec 2 Dec

11:04 a.m.

...

Nice summary.

What I think is missing is an explanation of when you would want this feature (and when you wouldn't, more importantly). Here is the kind of platonic dialogue that summarises my limited understanding: [..dialogue snipped..]

This is good, and is the sort of thing that should go on the Wiki, for people to edit and discuss and take issue with. Feel free to add it to the existing page, or create a new one and link to it. The Wiki belongs to all Haskellers! --KW 8-)

Keean Schupke

9:08 a.m.

Ben Rudiak-Gould wrote: Just a small comment on the Wiki page... it says "Several real-life examples of pure haskell code which needs fast global variables to either be implemented efficiently or statically guarantee their invariants are given in http://www.haskell.org//pipermail/haskell/2004-November/014929.html" The first example is that of randomIO - not implementable in Haskell, however the function, randoms :: RandomGen g => g -> [a], is (and is probably more idomatic haskell anyway). The second example "Unique", can be implemented: getUniqueSupply = do a <- newIORef 0 return (nextUnique a) where nextUnqiue n = do x <- readIORef n writeIORef n (x+1) return x Which should be just as fast as the global version, and all you do is pass the 'unique' supply around... you can even generate a lazy list of unqiues which can be used outside the IO monad. Again the "disadvantage" is that you can have multiple unique supplies and you could use the "wrong" one... (which is an advantage in my opinion, as it increases flexibility and reuse of the code). The same applies to the AtomHash, it can be implemented just as effieciently without globals... The only difference appears to be the supposed ability of globals stopping the programmer using an alternate Hash... but of course there is nothing stopping the programmer using the wrong global at all! (In other words it seems just as easy to access the wrong top-level name as to pass the wrong parameter). Keean.

John Meacham

10:27 a.m.

On Thu, Dec 02, 2004 at 09:08:21AM +0000, Keean Schupke wrote:

...

Ben Rudiak-Gould wrote:

Just a small comment on the Wiki page... it says

"Several real-life examples of pure haskell code which needs fast global variables to either be implemented efficiently or statically guarantee their invariants are given in http://www.haskell.org//pipermail/haskell/2004-November/014929.html"

The first example is that of randomIO - not implementable in Haskell, however the function, randoms :: RandomGen g => g -> [a], is (and is probably more idomatic haskell anyway).

Yes. There are lots of ways to do things without global variables, that was never in doubt. However randomIO is a part of the haskell standard. Why is it not (efficiently) implementable in haskell? There is no particular reason it should not be. it should optimize to exactly about 5 instructions to run the linear congruence algorithm on a static location in memory.

...

The second example "Unique", can be implemented:

getUniqueSupply = do a <- newIORef 0 return (nextUnique a) where

nextUnqiue n = do x <- readIORef n writeIORef n (x+1) return x

Which should be just as fast as the global version, and all you do is pass the 'unique' supply around... you can even generate a lazy list of unqiues which can be used outside the IO monad. Again the "disadvantage" is that you can have multiple unique supplies and you could use the "wrong" one... (which is an advantage in my opinion, as it increases flexibility and reuse of the code).

Yes, this would be as fast as the global version*, but it implements something else. The entire point of Data.Unique is that one can consider the unique supply as part of the world, just like you consider the filesystem, the screen, the network, various OS routines, etc as part of the world. This should be implementable efficiently, after all, you can store the counter in a file in /tmp, or just create a stub C file to do it, so it is obviously not a bad thing to allow, it is already allowed, it just needs to be able to done efficiently or people will resort to unsafe hacks like unsafePerformIO which is a serious impediment to aggressive compiler optimizations and a plauge on the mathematical semantics of the intermediate language.

...

The same applies to the AtomHash, it can be implemented just as effieciently without globals... The only difference appears to be the supposed ability of globals stopping the programmer using an alternate Hash... but of course there is nothing stopping the programmer using the wrong global at all! (In other words it seems just as easy to access the wrong top-level name as to pass the wrong parameter).

No, because then it would not typecheck. the whole point of Atom.hs is that the only way to generate values of type 'Atom' is to go through the single unique hash table. Hence the static guarentee that there is always an isomorphism between everything of type 'Atom' and everything of type 'String' in the system. This is only made possible by the modules ability to hide access to routines which could be used to break the invarient (such as the raw global hash). This is obviously a very important invarient! Let us please not confuse the many philosophical issues against global variables in design which I wholeheartily agree with, with what the global variables proposal is meant to achieve. It is for use at the very lowest level of the libraries. i.e. not to be used by the average person. They are for Atom tables, memoization, anti-memoization, I have desires to move some of the runtime stable/weak pointer infrastructure out of being magic implemented by the runtime, to being implemented in haskell itself, this requires the global hash of stablepointers to be implementable directly. Ghc itself is getting rid of global variables AS SEEN BY THE PROGRAMMER but many libraries still NEED them inside to do their clever memoization tricks and fast strings which are required to make ghc usable at all. Really, you should not be opposed to them unless you are also opposed to the FFI. At some level, deep inside the libraries, this functionality is needed, just like the FFI. it is even needed to implement the type indexed execution context proposals. Exposing the fact there is global state will still be a bad idea, their usage will be hidden by pure interfaces by good programers, just like unsafePerformIO or uses of the ST monad are done now. A module which provides observable global state, but does not let you parameterize over it is bad form. For example randomIO has implicit global state, but you can use the parameterized versions such as randoms. unlike Random Atom.hs DOES NOT HAVE IMPLICIT GLOBAL STATE. A perfectly acceptable implementation would be toAtom = fromAtom = id. This is why Atom does not need to be parameterized over its global state. the fact it is used is completly abstracted away because it is an implementation detail. There is a real _fundamental difference_ here, please try to understand it before rebutting. I really dislike this argument because I find myself having to vocally disagree with things I actually fully agree with in their proper context :). They are needed to support the various tricks necessary to create the standard haskell libraries and large programs that need to do real work and are pushing the system to the limits like ghc, ginsu and darcs. At its base level, top level initializers are about providing a very low level tool which provides a generic and semantically correct way to implement necessary performance improving measures and higher level libraries such as George Russel's which provide a much nicer and higher level interface for general users to use. John * actually, under the mdo proposal, Data.Unique can even be implemented faster with some easy compiler transformations to lift static heap allocated cells to the bss. but this is not important for the discussion however I would be interested in implementing them if the mdo proposal is ever implemented. -- John Meacham - ⑆repetae.net⑆john⑈

Keean Schupke

10:53 a.m.

Hi John, I am not objecting to the top-level TWIs anymore - since I realised contexts can be provided by wrapping the MVar or IORef modules. I just thought the wiki misrepresented the calims of your examples (or maybe the claims are a little exaggerated)... As far as I can tell adding top level TWIs will change nothing as they provide no guarantees of uniqueness. As nothing changes (exept you don't have to pass them around) - I have nothing to object to in this proposal ... although option 2b from the wiki would be my favourate. John Meacham wrote:

...

Yes. There are lots of ways to do things without global variables, that was never in doubt. However randomIO is a part of the haskell standard. Why is it not (efficiently) implementable in haskell? There is no particular reason it should not be. it should optimize to exactly about 5 instructions to run the linear congruence algorithm on a static location in memory.

The comment was really about the 'introductory' line in the wiki, which seemed to me to be stating there are efficiecy reasons for using global variables (false, as the examples I gave show) and that they provide some static guarantees (false, as I can replace the MVar library and break the unique property - so it is not a static guarantee - It just makes it a little more convoluted to get around)... As for randomIO not being implementable in Haskell, this is true, but it is no more efficient than passing a random sequence generator: getRandomSource = do a <- newIORef 0 return (nextRandom a) where nextRandom n = do -- where g and f are the generator functions x <- readIORef n writeIORef n (g x) return (f x)

...

Yes, this would be as fast as the global version*, but it implements

something else. The entire point of Data.Unique is that one can consider the unique supply as part of the world, just like you consider the filesystem, the screen, the network, various OS routines, etc as part of the world.

Yes, but not necessarily unique. I may have more than one keyboard... Infact any assumption that a resource is unique is normally a bad one - for example windows only supporting one display - they probably had to rewrite a lot of code using globals when they wanted to support multi-headed machines.

...

This should be implementable efficiently, after all, you can store the counter in a file in /tmp, or just create a stub C file to do it, so it is obviously not a bad thing to allow, it is already allowed, it just needs to be able to done efficiently or people will resort to unsafe hacks like unsafePerformIO which is a serious impediment to aggressive compiler optimizations and a plauge on the mathematical semantics of the intermediate language.

I agree here - I can always change the filesystem with a OS call (like chroot) and I can swap the top-level TWI context with a wrapper module around the MVar/IORef module.

...

No, because then it would not typecheck. the whole point of Atom.hs is

that the only way to generate values of type 'Atom' is to go through the single unique hash table. Hence the static guarentee that there is always an isomorphism between everything of type 'Atom' and everything of type 'String' in the system. This is only made possible by the modules ability to hide access to routines which could be used to break the invarient (such as the raw global hash). This is obviously a very important invarient!

But this can be broken with a wrapper module around IORef that lets me change contexts... so it is the same in reality, just it requires a little more thought to get round the "guarantee".

...

Let us please not confuse the many philosophical issues against global variables in design which I wholeheartily agree with, with what the global variables proposal is meant to achieve. It is for use at the very lowest level of the libraries. i.e. not to be used by the average person. They are for Atom tables, memoization, anti-memoization, I have desires to move some of the runtime stable/weak pointer infrastructure out of being magic implemented by the runtime, to being implemented in haskell itself, this requires the global hash of stablepointers to be implementable directly. Ghc itself is getting rid of global variables AS SEEN BY THE PROGRAMMER but many libraries still NEED them inside to do their clever memoization tricks and fast strings which are required to make ghc usable at all. Really, you should not be opposed to them unless you are also opposed to the FFI. At some level, deep inside the libraries, this functionality is needed, just like the FFI. it is even needed to implement the type indexed execution context proposals.

No as I said, my objection was to the summery line which claimed globals necessary for speed or static guarantees - both claims are false (and I don't think you claimed as such in the examples - so it is only in regards to the wiki entry)

...

Exposing the fact there is global state will still be a bad idea, their usage will be hidden by pure interfaces by good programers,

Of course you cannot stop bad programmers having access to the same 'tools' once they are in the language. Keean.

Keean Schupke

12:01 p.m.

With regards to the following... "There are cases in which this parameterization costs convenience and gains nothing -- for example, the standard library function (randomIO :: Random a => IO a) cannot be implemented in Haskell without the unsafePerformIO hack, yet there's nothing semantically objectionable about it. Worse, we may weaken type checking by the translation, while gaining nothing: for example, the interface below cannot provide static guarantees like a == b <=> fromAtom a == fromAtom b if toAtom is parameterized by an "atom space"." Can we not make this guarantee with multiple Atom-spaces if we use the same local universal quantification trick used on the ST monad? IE add a type parameter 's' to the type of Atom? Keean.

John Meacham

12:18 p.m.

On Thu, Dec 02, 2004 at 10:53:57AM +0000, Keean Schupke wrote:

...

Hi John,

I am not objecting to the top-level TWIs anymore - since I realised contexts can be provided by wrapping the MVar or IORef modules. I just thought the wiki misrepresented the calims of your examples (or maybe the claims are a little exaggerated)...

Yeah, I apologize, my comments wern't directed at you in particular, I was sort of just responding to the whole thread in batch mode.

...

...
Yes. There are lots of ways to do things without global variables, that was never in doubt. However randomIO is a part of the haskell standard. Why is it not (efficiently) implementable in haskell? There is no particular reason it should not be. it should optimize to exactly about 5 instructions to run the linear congruence algorithm on a static location in memory.

The comment was really about the 'introductory' line in the wiki, which seemed to me to be stating there are efficiecy reasons for using global variables (false, as the examples I gave show)

That example solves a different problem, I was never claiming that there wern't efficient ways to solve the unique producer problem in general, just no efficient way to provide the interface as given in Data.Unique, a top level declaration of type 'IO Int'.

...

(false, as I can replace the MVar library and break the unique property - so it is not a static guarantee - It just makes it a little more convoluted to get around)...

If you are allowed to replace code at will then nothing anywhere is guarenteed in haskell :) I mean you could replace head with head = error "No head for you." and a lot of libraries would break. It would be the responsibility of anyone replacing key system libraries to guarentee they are observationally equivalant to what their users expect. For users of Atom, you still have strong static guarentees at least as strong as any other guarentees in haskell. So, an implementation of MVar which randomly cloned state would not break the static guarentees of Atom as much as it would break the guarentees of MVar and hence anything that relys on it.

...

As for randomIO not being implementable in Haskell, this is true, but it is no more efficient than passing a random sequence generator:

getRandomSource = do a <- newIORef 0 return (nextRandom a) where

nextRandom n = do -- where g and f are the generator functions x <- readIORef n writeIORef n (g x) return (f x)

Yes. but that has the same problem as your other example. it is a different interface than the one we want. If we want to pass around the state we can use many of the other routines in Random. If we want to extend the world with the standard generator (which is perfectly reasonable, and useful enough to make it into the standard haskell libraries), we are out of luck.

...

...
Yes, this would be as fast as the global version*, but it implements

something else. The entire point of Data.Unique is that one can consider the unique supply as part of the world, just like you consider the filesystem, the screen, the network, various OS routines, etc as part of the world.

Yes, but not necessarily unique. I may have more than one keyboard... Infact any assumption that a resource is unique is normally a bad one - for example windows only supporting one display - they probably had to rewrite a lot of code using globals when they wanted to support multi-headed machines.

Hrm? This is unrelated, I never claimed anything about uniqueness, just that the filesystem, screen[s], keyboard[s], etc.. are part of the world. TWIs are to extending the world in a more efficient way than writing to a file and a more safe way than using unsafePerformIO. unique identitfiers are one example of a use of them (assuming you have appropriate library functions to allow this use).

...

...
This should be implementable efficiently, after all, you can store the counter in a file in /tmp, or just create a stub C file to do it, so it is obviously not a bad thing to allow, it is already allowed, it just needs to be able to done efficiently or people will resort to unsafe hacks like unsafePerformIO which is a serious impediment to aggressive compiler optimizations and a plauge on the mathematical semantics of the intermediate language.

I agree here - I can always change the filesystem with a OS call (like chroot) and I can swap the top-level TWI context with a wrapper module around the MVar/IORef module.

Yeah, I imagine this functionality would be useful to implement debuggers like 'hat' where you need to collect 'meta-information' about a run of the program. again, I don't see the ability to replace library calls like IORef as a weakness in the guarentees provided by stuff that uses them, as it is up to the replacer of the libraries to do the right thing.

...

...
No, because then it would not typecheck. the whole point of Atom.hs is

that the only way to generate values of type 'Atom' is to go through the single unique hash table. Hence the static guarentee that there is always an isomorphism between everything of type 'Atom' and everything of type 'String' in the system. This is only made possible by the modules ability to hide access to routines which could be used to break the invarient (such as the raw global hash). This is obviously a very important invarient!

But this can be broken with a wrapper module around IORef that lets me change contexts... so it is the same in reality, just it requires a little more thought to get round the "guarantee".

see above. if you can rewrite libraries then all bets are off. (not that it isn't sometimes useful to do so). The guarentee is still as strong as one can hope to achieve in haskell.

...

...
Let us please not confuse the many philosophical issues against global variables in design which I wholeheartily agree with, with what the global variables proposal is meant to achieve. It is for use at the very lowest level of the libraries. i.e. not to be used by the average person. They are for Atom tables, memoization, anti-memoization, I have desires to move some of the runtime stable/weak pointer infrastructure out of being magic implemented by the runtime, to being implemented in haskell itself, this requires the global hash of stablepointers to be implementable directly. Ghc itself is getting rid of global variables AS SEEN BY THE PROGRAMMER but many libraries still NEED them inside to do their clever memoization tricks and fast strings which are required to make ghc usable at all. Really, you should not be opposed to them unless you are also opposed to the FFI. At some level, deep inside the libraries, this functionality is needed, just like the FFI. it is even needed to implement the type indexed execution context proposals.

No as I said, my objection was to the summery line which claimed globals necessary for speed or static guarantees - both claims are false (and I don't think you claimed as such in the examples - so it is only in regards to the wiki entry)

I belive that effectively they are true, because in general, you can't worry about people rewriting libraries out from under you in incompatable because then you could never write any correct programs :) and the efficiency claims were all relative to certain specific APIs. changing the APIs is another possibility in some cases. Quite an important one. But it is not universally applicable or desired. Especially in the cases where you have internal global state but no implicit observational mutable state. (like in my Atom example)

...

...
Exposing the fact there is global state will still be a bad idea, their usage will be hidden by pure interfaces by good programers,

Of course you cannot stop bad programmers having access to the same 'tools' once they are in the language.

True. but as far as these tools go, TWIs are pretty darn safe :) RULES pragmas let you break pretty much any property of rewriting like making f . g /= f (g x) without even rewriting the original module! quite dangerous. and unsafePerformIO lets you break typesafety leading to a segfault. The worst TWIs allow is bad coding practice :) Fortunatly, I don't think Haskell collects many bad programmers. Some temporarily misguided ones perhaps :) John -- John Meacham - ⑆repetae.net⑆john⑈

Keean Schupke

12:42 p.m.

Just a few minor nitpicks... mainly about the necessity of using certain APIs, however I think we are in general agreement... Keean. John Meacham wrote:

...

That example solves a different problem, I was never claiming that there wern't efficient ways to solve the unique producer problem in general, just no efficient way to provide the interface as given in Data.Unique, a top level declaration of type 'IO Int'.

My argument would be program in the idomatic style of the language, rather than change the language. In other words it is not necessary to implement that API, it is a preference.

...

If you are allowed to replace code at will then nothing anywhere is guarenteed in haskell :) I mean you could replace head with head = error "No head for you." and a lot of libraries would break.

I am only talking about substitutimg modules... you can simply feed ghc a different search path with -i, and insert a wrapper around the module. My point was it is a possible way to take a third party library that supports only one global context and get it to support many contexts without changing a single line of code in the library itself, and this is a good thing IMHO.

...

Yes. but that has the same problem as your other example. it is a

different interface than the one we want. If we want to pass around the state we can use many of the other routines in Random. If we want to extend the world with the standard generator (which is perfectly reasonable, and useful enough to make it into the standard haskell libraries), we are out of luck.

But you can use this different interface - it may not be the one you would like - but it does the job. My point was it is not necessary to have a generator with the API you gave, again it is a preference.

...

Yeah, I imagine this functionality would be useful to implement debuggers like 'hat' where you need to collect 'meta-information' about a run of the program. again, I don't see the ability to replace library calls like IORef as a weakness in the guarentees provided by stuff that uses them, as it is up to the replacer of the libraries to do the right thing.

Yes, I think it is a sign of good modularity. Keean.

Ben Rudiak-Gould

12:47 p.m.

Keean Schupke wrote:

...

Ben Rudiak-Gould wrote:

...
[...]

Just a small comment on the Wiki page... it says

"Several real-life examples of pure haskell code which needs fast global variables to either be implemented efficiently or statically guarantee their invariants are given in http://www.haskell.org//pipermail/haskell/2004-November/014929.html"

I don't know if this post was meant specifically for me, but in any case I didn't write the sentence quoted above. Other people have already added material to my original wiki page, and I encourage you to do the same if you disagree with what's there right now. To everyone who's posted in this thread: I think you're misunderstanding what I meant by the phrase "on the wiki". :-) My hope was/is that this whole debate /moves/ to the wiki exclusively. My impression is that the mailing-list debate has made no progress for some time (weeks) and almost all of the traffic now consists of weekly or daily repetitions of the same old points and counterpoints. This is a sign that the time has come to move to a discussion format that doesn't require reiteration. -- Ben

7517

Age (days ago)

7518

Last active (days ago)

List overview

Download

10 comments

6 participants

participants (6)

Adrian Hey
Ben Rudiak-Gould
John Meacham
Jules Bean
Keean Schupke
Keith Wansbrough