
Oops, forgot to send this to the list... sorry, Sjoerd.
On Thu, Nov 11, 2010 at 11:54 AM, Sjoerd Visscher
You would lose many uses of equational reasoning in your programs. Have you ever substituted 'x * 2' for the expression 'x + x' in one of your programs, or vice versa? You can no longer do that, because someone may be serializing the function you're writing, checking how it's implemented, and relying on it.
Yes, but it would not break any existing code. It would only break code that knowingly did the wrong thing.
Or code that unknowingly depends transitively on code that does the wrong thing. In that regard it would be much like unsafePerformIO, and about as trustworthy. Better off just having any such "serialize" be safely in IO, and let people who want to live dangerously just use unsafePerformIO to get around it.
We already have a weak case of this, since (\x -> undefined x) can be distinguished from undefined using seq, but that can be hand-waved away by not worrying about bottoms so much. That isn't going to work for serialize.
Why not?
I'd venture it's because seq only behaves differently when one possible outcome is _|_. An unsafe serialize could distinguish between two non-bottom values, which means the sketchy behavior would be free to wreak havoc in code that's not crashing. For instance, assuming serialize can be applied to functions of any type, it would probably be trivial to write a function (isExpr :: a -> Bool) that reports whether an arbitrary term is a primitive value or the result of some expression, which then lets you write a function of type (forall a. a -> a) that is NOT equivalent to id, which could then be passed freely into any other piece of code you like. That sounds like buckets of fun, doesn't it? - C.
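The weaker seq-based distinction mentioned above is easy to demonstrate. A minimal sketch (the helper name whnfSucceeds is mine, not from the thread): undefined is bottom, but the eta-expanded (\x -> undefined x) is already a lambda in weak head normal form, so seq tells them apart.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- True if forcing the value to WHNF succeeds, False if it hits bottom
whnfSucceeds :: a -> IO Bool
whnfSucceeds v = do
  r <- try (evaluate (v `seq` ())) :: IO (Either SomeException ())
  return (either (const False) (const True) r)

main :: IO ()
main = do
  u <- whnfSucceeds (undefined :: Int -> Int)
  l <- whnfSucceeds (\x -> (undefined :: Int -> Int) x)
  -- undefined is bottom; the eta-expanded wrapper is already in WHNF
  print (u, l)
```

Running main prints (False,True): the two "equal" functions are observably different, but only at a bottom, which is the hand-wave the thread refers to.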

On 11 November 2010 18:01, C. McCann
For instance, assuming serialize can be applied to functions of any type, it would probably be trivial to write a function (isExpr :: a -> Bool) that reports whether an arbitrary term is a primitive value or the result of some expression [SNIP]
Persistent functional languages usually give serialized values, including closures, a dynamic type. So can you write isExpr :: Dynamic -> Bool? As Persistent Haskell and Clean (both pure functional languages) have already supported serializing closures / HOFs, I'm not sure it's really such a semantic can of worms as this thread suggests.

On Thu, Nov 11, 2010 at 1:57 PM, Stephen Tetley
On 11 November 2010 18:01, C. McCann wrote:
For instance, assuming serialize can be applied to functions of any type, it would probably be trivial to write a function (isExpr :: a -> Bool) that reports whether an arbitrary term is a primitive value or the result of some expression [SNIP]
Persistent functional languages usually give serialized values including closures a dynamic type. So can you write isExpr :: Dynamic -> Bool ?
But it's not the type of the serialized value that's at issue, it's the type of the serializable values. Anything that lets you convert an arbitrary closure into something with internals open to inspection will likely have dire consequences for parametricity and referential transparency. Remember, the problem isn't what you do with the serialized form itself, it's what you can learn via it about the original value it was serialized from. To retain sanity, either "types that can be serialized" must be marked explicitly (perhaps in the context, similar to having a Data.Typeable constraint) to indicate potential non-parametric shenanigans, or the result of serializing and inspecting a value must be quarantined off, such as with IO. Or some other mechanism, but those seem like the obvious choices. Having a full serialization function without some restriction along those lines would be like renaming unsafePerformIO to runIO, moving it to Control.Monad.IO, and telling people "hey, just don't misuse this and everything will be okay". - C.

On 11/11/10 12:07 PM, C. McCann wrote:
To retain sanity, either "types that can be serialized" must be marked explicitly (perhaps in the context, similar to having a Data.Typeable constraint) to indicate potential non-parametric shenanigans
You mean, like Data.Binary? Cheers, Greg
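Data.Binary's Binary constraint is indeed that kind of marker. A minimal base-only sketch of the same pattern (the Encode class and its instances are illustrative, not Data.Binary's actual API): serializability appears as an explicit constraint, so a fully parametric function of type (forall a. a -> a) simply cannot call encode on its argument.

```haskell
-- Serializability as an explicit class constraint: only types with an
-- instance can be encoded, and the constraint shows up in signatures.
class Encode a where
  encode :: a -> String

instance Encode Int where
  encode = show

instance Encode Bool where
  encode b = if b then "T" else "F"

instance Encode a => Encode [a] where
  encode xs = show (length xs) ++ ":" ++ concatMap encode xs

-- Any caller that serializes must carry the constraint along:
encodePair :: (Encode a, Encode b) => (a, b) -> String
encodePair (x, y) = encode x ++ "|" ++ encode y
```

For example, encode [True, False] yields "2:TF", and encodePair (1 :: Int, True) yields "1|T". The point is that the non-parametric power is visible in the type, unlike a hypothetical serialize :: a -> ByteString.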

But I don't see that you don't need introspection at user level for persistence, a dynamic type will do, thus the internals aren't open to inspection. Whatever introspection is necessary can be handled by the runtime system as in Clean and Persistent Haskell. You could look at the internals of a pickle with a binary editor but that's perhaps cheating.
From my reading of the paper, Persistent Haskell was suitably referentially transparent:
"This paper describes the first-ever implementation of orthogonal persistence for a compiled purely functional language, based on an existing St Andrews persistent object store." The conclusion notes in passing that OCaml's persistence isn't referentially transparent. If the Haskell version wasn't, I'd expect a mea culpa from the authors at this point. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.421

Apologies - an unfortunate typo in my first sentence (extra "don't"); it should have read:
But I don't see that you need introspection at user level for persistence, a dynamic type will do, thus the internals aren't open to inspection. Whatever introspection is necessary can be handled by the runtime system as in Clean and Persistent Haskell. You could look at the internals of a pickle with a binary editor but that's perhaps cheating.

On Thu, Nov 11, 2010 at 3:30 PM, Stephen Tetley
The conclusion notes in passing that OCaml's persistence isn't referentially transparent. If the Haskell version wasn't, I'd expect a mea culpa from the authors at this point.
From a quick glance at the paper, the Haskell version is referentially transparent in the standard, trivial sense: the persistence operations all return IO actions. This is of course perfectly fine. What started this thread, however, was the idea of a serialization function producing something like a pure ByteString, and why that, as opposed to (IO ByteString), would be extremely problematic.
What it boils down to is just that any pure "serialization" function would necessarily do nothing useful. Serializing closures from IO actions, on the other hand, I think is a great idea, though probably difficult to implement! - C.
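The equational-reasoning worry from the start of the thread can be made concrete. The two definitions below (names mine) are extensionally equal; substituting one for the other is safe precisely because no pure, parametricity-respecting observation, such as a hypothetical pure serialize, can tell them apart:

```haskell
double1, double2 :: Int -> Int
double1 x = x + x
double2 x = x * 2

-- Extensionally equal: they agree on every input, which is what
-- licenses substituting 'x * 2' for 'x + x' during reasoning.
agreeOn :: [Int] -> Bool
agreeOn = all (\x -> double1 x == double2 x)
```

A pure serialize returning different ByteStrings for double1 and double2 would break exactly this substitution step.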

On 11 November 2010 21:23, C. McCann
[Snip] What started this thread, however, was the idea of a serialization function producing something like a pure ByteString, and why that, as opposed to (IO ByteString), would be extremely problematic.
I think the original poster was intrigued by the possibilities of serializing functions, and their first guess at a type signature was a MacGuffin[*]. It's a lot of work to implement persistence. As far as I know it's only implemented for the Windows version of Clean. Napier 88's persistent store was a very substantial development effort for a programming language research project: multi-person, multi-year, EU-funded through the ESPRIT Basic Research programme. [*] In case anyone looks up MacGuffin on Wikipedia, I don't think the description there is strictly accurate. A MacGuffin doesn't drive the plot so much as throw the viewer off the scent.

On Nov 11, 2010, at 1:42 PM, Stephen Tetley wrote:
[*] In case anyone looks up MacGuffin on Wikipedia, I don't think the description there is strictly accurate. A MacGuffin doesn't drive the plot so much as throw the viewer off the scent.
I think Hitchcock might disagree with you. In any case, serializing functions is as easy as you want it to be. But there is a big caveat: you are basically embedding a compiler into your run-time. It can be pretty minimal, relying only on facts known about recursively enumerable functions:

    class Serialize a where
      serialize   :: a -> ByteString
      unSerialize :: ByteString -> Maybe a  -- Parsers can fail

    instance (Serialize a) => Serialize [a] where ...
    instance (Serialize a, Serialize b) => Serialize (a, b) where ...

    -- We can conclude that a and b must be enumerable from the requirement
    -- that f is recursively enumerable (Bounded is also needed, to use
    -- minBound and maxBound):
    instance (Serialize a, Enum a, Bounded a, Serialize b)
          => Serialize (a -> b) where
      serialize f =
        serialize (zip [minBound .. maxBound] (fmap f [minBound .. maxBound]))

    -- A Map instance could be better: we trade some serialization time for
    -- more deserialization time.
    instance (Serialize a, Serialize b) => Serialize (Map a b) where ...

    instance (Serialize a, Enum a, Bounded a, Ord a, Serialize b)
          => Serialize (a -> b) where
      serialize f =
        serialize . fromList $
          zip [minBound .. maxBound] (fmap f [minBound .. maxBound])
      deserialize bytes = \x -> lookup x (bytestring_decode_map bytes)
        where bytestring_decode_map = ...

There are potentially big problems with this approach:

(i) Many data types are not instances of Enum (though they really should be; deriving enumerations is not that hard. I am willing to take one for the team to get GHC to figure out how to derive Enum on arbitrary types, but I'm not sure where to start. Little help?)

(ii) Time and memory. We're not encoding a series of instructions for computing pure functions; we're precomputing the results and saving them all for later. This is at least O(size of the domain) * O(cost of the function on each element). Not big in theoretical terms, but something like that could easily cause a factor of [1, \infty) slowdown in real code.

(iii) Serialized objects must be pure. In particular, you can't serialize general IO actions.
I see this as a plus. It is still easy to write an algebra of serializable tokens and a non-serializable interpreter to generate IO actions from the tokens. We do this kind of thing all the time; we just don't usually serialize the tokens. I think (ii) is the biggest problem, and it is a big one. We basically need something like Template Haskell for runtime systems, in order to do quasi-quoting and compilation at run-time, so that we can avoid reifying the domain and its image under f. The only thing that can serialize an (IO a) is the Haskell runtime, and it does so by running the action (and so putting its sub-steps in a series).
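The runnable core of the "enumerate the domain" approach sketched above, using a plain list of pairs instead of a ByteString (the names tabulate and untabulate are mine):

```haskell
-- Represent a function over a finite, enumerable domain as its graph.
tabulate :: (Enum a, Bounded a) => (a -> b) -> [(a, b)]
tabulate f = [ (x, f x) | x <- [minBound .. maxBound] ]

-- Rebuild a function from its graph; the O(domain) lookup per call
-- is problem (ii) above in miniature.
untabulate :: Eq a => [(a, b)] -> a -> b
untabulate table x =
  case lookup x table of
    Just y  -> y
    Nothing -> error "untabulate: argument outside the serialized domain"
```

For example, tabulate not is [(False,True),(True,False)], and untabulate (tabulate not) behaves like not on both inputs. A real serializer would then encode the pair list itself, which is ordinary first-order data.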

On 11/11/2010 08:07 PM, C. McCann wrote:
Having a full serialization function without some restriction along those lines would be like renaming unsafePerformIO to runIO, moving it to Control.Monad.IO, and telling people "hey, just don't misuse this and everything will be okay".
There's been a lot of talk about "if serialisation existed, you could do X, which is bad". Well, you know what? unsafePerformIO exists. unsafeCoerce exists. And using the FFI, you can do utterly evil things. And...? Just today I was thinking about how useful it would be if you could send a block of code from one PC to another to execute it remotely. The fact that you can't do this is basically why there's no distributed Haskell yet, despite what an obviously good idea that would be. It would be really cool if we could do this. And yes, it should of course be an IO operation. Because, let's face it, any result it produces is inherently going to be pretty random. (Much like my favourite GHC function name, reallyUnsafePtrEquality#...)

On 12 November 2010 20:44, Andrew Coppin
Just today I was thinking about how useful it would be if you could send a block of code from one PC to another to execute it remotely. The fact that you can't do this is basically why there's no distributed Haskell yet, despite what an obviously good idea that would be. It would be really cool if we could do this.
There was Distributed Haskell a while back[*], search for work by Frank Huch. [*] Maybe it wasn't "migratory" which I think is what you are wanting - you'd have to check the tech reports.

On Nov 12, 2010, at 12:44 PM, Andrew Coppin wrote:
Just today I was thinking about how useful it would be if you could send a block of code from one PC to another to execute it remotely. The fact that you can't do this is basically why there's no distributed Haskell yet, despite what an obviously good idea that would be. It would be really cool if we could do this.
What kind of code are you thinking about? If you have a daemon waiting on a foreign machine, you can send it bytecode to deserialize. The issue is one of /encodings/. My "naive" approach (enumerating the function's domain, applying the function, and serializing a list of pairs of results) doesn't work so well, because it is slow and big. But it has all the properties we want, without having to go into the compiler and make enormous changes.

We need to find an efficient representation of a function that can be serialized. Haskell source is one option, but the compiler would need to pass it through to the run-time, so that the deserializer could run a compiler on it. Passing some kind of intermediate-level code through would work as well, but we still face the same problem: we need to compile the intermediate code into executable Haskell.

There is also the architecture-independent approach I have described previously (with a "Serialize" type class, enumerating the function's domain and applying the function to each element, etc.). This is architecture-independent in virtue of being implemented as plain old run-time Haskell: it turns Haskell expressions into ByteStrings and back. In short, the deserializer is an embedded compiler for "Haskell bytecode"; it interprets bytecode in terms of architecture-specific code. To be honest, I'm not even sure how architecture-independent Data.Binary and the like really are. I'm guessing some care in the serialization library could fix any issue, though.
And yes, it should of course be an IO operation. Because, let's face it, any result it produces is inherently going to be pretty random. (Much like my favourite GHC function name, reallyUnsafePtrEquality#...)
Serialization and transport are orthogonal. Serialization is pure. Transport is not.

Andrew Coppin wrote:
[...] if you could send a block of code from one PC to another to execute it remotely.
Eden can do this. http://www.mathematik.uni-marburg.de/~eden/

Eden is a distributed Haskell: you run multiple copies of the same binary on different machines, and you have the (#) operator to apply functions *on a remote machine*. So, for example,

    process (map f) # xs

will serialize (map f) and send it over to some other machine. At that other machine, (map f) is deserialized and evaluated. In addition, two channels between the machines are opened: one for streaming the xs to that remote machine where the map is executed, and one for streaming the results back.

The process and (#) operators have the following, rather harmless-looking types:

    process :: (a -> b) -> Process a b
    (#)     :: Process a b -> (a -> b)

So no IO around, even if there is serialization and network communication going on. If you feel uncomfortable about that, you can use instantiate instead:

    instantiate :: Process a b -> a -> IO b

And indeed, (#) is implemented in terms of instantiate and that unspeakable source of all false transparency:

    p # x = unsafePerformIO (instantiate p x)

Tillmann
participants (6)
- Alexander Solla
- Andrew Coppin
- C. McCann
- Gregory Crosswhite
- Stephen Tetley
- Tillmann Rendel