More ideas for controlled mutation

Laziness can be viewed as a form of controlled mutation, where we overwrite a thunk with its actual value, thus only running the code once and reaping great time benefits. I've been toying around with some ideas where we do alternative forms of controlled mutation. One such idea has to do with memoization.

One way we memoize functions is by building up a data structure that covers the entirety of the input domain of the function, with the value in each slot being a thunk for the corresponding function call. Now, this is all very nice and well, but for some types of inputs (like machine integers) we end up making a very big structure for a function for which, in practice, we'll only store and retrieve a few values of the domain. Hash tables take advantage of this fact by simply chaining together values in a linked list if they land in the same bucket. Could we have similarly bucketized memoization?

What we want here is for a *thunk to possibly evaluate to different values, but calls to the API to be observationally equivalent.* That is, if the only way I can inspect a dictionary list is to do a lookup, I don't care whether my representation is [(1,4),(2,2)] or [(2,2),(1,4)]. An obvious way to do this is to use unsafePerformIO to read out an IORef stating the value currently being looked up, and have the thunk evaluate to the pair of that key and the result. There are some synchronization concerns, of course: ideally we would only take out a lock on the thunk once we realize that the value doesn't already exist in the memo table, but I don't think there's a way in GHC Haskell to observe whether a value is a thunk or not (maybe such a mechanism would be useful?)

This seems related to lazy IO, where thunks are co-opted into performing input/output effects. After all, mutation is just a controlled form of IO. Is lazy IO evil or not? One point of consensus is that for operations that involve limited resources (file descriptors, etc.) lazy IO is too opaque for its own good. It seems we haven't cracked its performance problems either (for the relatively unobjectionable generation of an infinite stream of random numbers, Don Stewart notes: "There are real overheads here. Consider eagerly filling chunks and extracting elements piecewise."). But code that looks pure and can be reasoned about like pure code, yet has the efficiency benefits of mutation, is a very attractive thing indeed. I think it's worth some thought, though the programming world at large has a hard enough time reasoning about laziness even when everything is completely pure!

Cheers,
Edward

P.S. An obvious question is whether or not we could use this technique to implement splay heaps or hash tables with pure interfaces. My opinion is no, because these structures demand that you be able to *observe* the mutation.
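[For illustration, here is a rough, hypothetical sketch of the trick described above, ignoring the synchronization concerns; all names in it (currentKey, mkBucket, lookupBucket) are made up. The IORef announces which key is being looked up, and a bucket is a thunk that reads that IORef when it is first forced:

    import Control.Exception (evaluate)
    import Data.IORef
    import System.IO.Unsafe (unsafePerformIO)

    -- The key currently being looked up (a real version would want one
    -- such cell per memo table, plus some synchronization).
    currentKey :: IORef Int
    currentKey = unsafePerformIO (newIORef 0)
    {-# NOINLINE currentKey #-}

    -- A bucket is a thunk that, when forced, pairs whatever key is
    -- currently being looked up with the function's result for that key.
    mkBucket :: (Int -> a) -> (Int, a)
    mkBucket f = unsafePerformIO $ do
      k <- readIORef currentKey
      return (k, f k)
    {-# NOINLINE mkBucket #-}

    -- Lookup: announce the key, force the bucket while the announcement
    -- is still current, and fall back to recomputing (a real
    -- implementation would chain another bucket instead) if the bucket
    -- was already claimed by a different key.
    lookupBucket :: (Int -> a) -> (Int, a) -> Int -> a
    lookupBucket f bucket k = unsafePerformIO $ do
      writeIORef currentKey k
      (k', v) <- evaluate bucket
      return (if k' == k then v else f k)

With, say, bucket = mkBucket fib, the first call to lookupBucket fib bucket 30 fixes the bucket's contents; after that, laziness itself does the memoization, since later lookups for the same key just read the overwritten thunk.]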

On Sun, 24 Apr 2011, Edward Z. Yang wrote:
Laziness can be viewed as a form of controlled mutation, where we overwrite a thunk with its actual value, thus only running the code once and reaping great time benefits.
Reminds me of a discussion about 'blueprinting' in the past: http://www.haskell.org/haskellwiki/Blueprint

"Edward Z. Yang"
I've been toying around with some ideas where we do alternative forms of controlled mutation. One such idea has to do with memoization. [..] Hash tables take advantage of this fact by simply chaining together values in a linked list if they land in the same bucket. [...] An obvious way to do this is to use unsafePerformIO to read out an IORef stating the value currently being looked up, and have the thunk evaluate to the pair of that key and the result. There are some synchronization concerns, of course:
Seen this?

http://augustss.blogspot.com/2011/04/ugly-memoization-heres-problem-that-i.h...

-k
--
If I haven't seen further, it is by standing in the footprints of giants

Yep. It harkens back to my days of forcing impure, non-thread-safe C libraries into nice, pure Haskell FFI bindings. I suppose what I'd like to do here is work the unsafe IO in much more closely with GHC's existing facilities, so that we spend as much time as possible /not/ in unsafePerformIO. A kind of hybrid approach, if you will.

P.S. Don Stewart points out that Edward Kmett has a package that can access GHC's pointer tags (http://hackage.haskell.org/package/tag-bits), thus allowing us to approximate evaluated/not-evaluated. Maybe I'll hack up a prototype next time round (see the sketch after the quoted message below).

Excerpts from Ketil Malde's message of Sun Apr 24 17:41:23 -0400 2011:
"Edward Z. Yang"
writes: I've been toying around with some ideas where we do alternative forms of controlled mutation. One such idea has to do with memoization. [..] Hash tables take advantage of this fact by simply chaining together values in a linked list if they land in the same bucket. [...] An obvious way to do this is to use unsafePerformIO to read out an IORef stating the value currently being looked up, and have the thunk evaluate to the pair of that key and the result. There are some synchronization concerns, of course:
Seen this?
http://augustss.blogspot.com/2011/04/ugly-memoization-heres-problem-that-i.h...
-k
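[To illustrate the P.S. above: assuming tag-bits exposes something along the lines of unsafeIsEvaluated :: a -> Bool (check the package for its actual interface), one could skip synchronization whenever the pointer tag already says the entry is in WHNF. forceWithLock below is a hypothetical helper, not part of any library:

    import Control.Concurrent.MVar (MVar, withMVar)
    import Control.Exception (evaluate)
    import Data.TagBits (unsafeIsEvaluated)  -- assumed interface

    -- Only take the lock when the entry still looks like a thunk; entries
    -- whose pointer tag says "evaluated" are returned without
    -- synchronization.  The tag check is conservative: an untagged pointer
    -- may still point at an evaluated closure, in which case we merely pay
    -- for the lock unnecessarily.
    forceWithLock :: MVar () -> a -> IO a
    forceWithLock lock x
      | unsafeIsEvaluated x = return x
      | otherwise           = withMVar lock (\_ -> evaluate x)
]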

Edward Z. Yang wrote:
Laziness can be viewed as a form of controlled mutation, where we overwrite a thunk with its actual value, thus only running the code once and reaping great time benefits.
[..]
Hash tables take advantage of this fact by simply chaining together values in a linked list if they land in the same bucket. Could we have similarly bucketized memoization? What we want here is for a *thunk to possibly evaluate to different values, but calls to the API to be observationally equivalent.* That is, if the only way I can inspect a dictionary list is to do a lookup, I don't care whether my representation is [(1,4),(2,2)] or [(2,2),(1,4)]. An obvious way to do this is to use unsafePerformIO to read out an IORef stating the value currently being looked up, and have the thunk evaluate to the pair of that key and the result. There are some synchronization concerns, of course: ideally we would only take out a lock on the thunk once we realize that the value doesn't already exist in the memo table, but I don't think there's a way in GHC Haskell to observe whether a value is a thunk or not (maybe such a mechanism would be useful?)
The thing is that lazy evaluation is referentially transparent, while "I don't care about [(1,4),(2,2)] vs [(2,2),(1,4)]" is not. In the latter case, you have a proof obligation to the compiler that your API does not expose the difference between these two values. But in Haskell, you have no way of convincing the compiler that you fulfilled that proof obligation! (At least, I don't see any obvious one. Maybe a clever abuse of parametricity helps.) It might be an option in Agda, though.

In that light, it is entirely reasonable that you have to use unsafePerformIO.

Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com
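[For what it's worth, the usual Haskell approximation of discharging that proof obligation is a module boundary rather than the type system: keep the representation abstract and export only observers that cannot tell the two lists apart. A minimal sketch, with a hypothetical module and names:

    module Dict (Dict, fromList, lookupDict) where

    -- The constructor is not exported, so clients can only observe a Dict
    -- through lookupDict, which cannot distinguish [(1,4),(2,2)] from
    -- [(2,2),(1,4)].  Nothing checks this, of course: the compiler would
    -- happily accept an added export that leaks the ordering, which is
    -- exactly the point being made above.
    newtype Dict k v = Dict [(k, v)]

    fromList :: [(k, v)] -> Dict k v
    fromList = Dict

    lookupDict :: Eq k => k -> Dict k v -> Maybe v
    lookupDict k (Dict kvs) = lookup k kvs
]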

Excerpts from Heinrich Apfelmus's message of Mon Apr 25 04:01:03 -0400 2011:
The thing is that lazy evaluation is referentially transparent while "I don't care about [(1,4),(2,2)] vs [(2,2),(1,4)]" is not.
Perhaps more precisely, laziness's memoization properties rely on the referential transparency of thunk evaluation.
In the latter case, you have a proof obligation to the compiler that your API does not expose the difference between these two values. But in Haskell, you have no way of convincing the compiler that you fulfilled that proof obligation! (At least, I don't see any obvious one. Maybe a clever abuse of parametricity helps.) It might be an option in Agda, though.
In that light, it is entirely reasonable that you have to use unsafePerformIO.
Yes, of course. But as works like 'amb' demonstrate, we can build higher-level APIs that are unsafe under the hood but, when used as abstractions, fulfill referential transparency (or, in the case of 'amb', fulfill it as long as some not-as-onerous properties hold). So if you implement a reusable mechanism that does unsafe stuff under the hood but provides all the right guarantees as long as you don't peek inside, I think that's a good step.

Cheers,
Edward
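[For concreteness, the 'amb' mentioned above is the idea behind Conal Elliott's unamb package; assuming its unamb :: a -> a -> a, the function races its two arguments with unsafe concurrency internally, yet is referentially transparent provided the caller honours the proviso that both arguments agree whenever both are defined. A small usage sketch:

    import Data.Unamb (unamb)  -- assumed: Conal Elliott's unamb package

    -- A symmetric "parallel or": defined whenever either argument is True,
    -- even if the other one diverges.  The unamb proviso holds because
    -- (a || b) and (b || a) agree whenever both terminate.
    por :: Bool -> Bool -> Bool
    por a b = (a || b) `unamb` (b || a)
]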