Serialization of (a -> b) and IO a

Jesse Schalken

11 Nov 2010 11 Nov '10

7:53 a.m.

Is it possible to serialize and deserialize a function to/from binary form, perhaps using Data.Binary, for example? What about an IO action? If so, is there a way the serialized representation could be architecture-independent? I have been shown how useful it can be to store functions inside data structures, and while looking at data serialization for the purpose of persistence I wondered "since functions are just values in Haskell, why can't we persist them, too?". To me the idea has interesting implications. For example, an arbitrary program could simply be represented by a serialization of `IO ()`. In fact, you could load any program into memory from a file and use Control.Concurrent.forkIO to run it, and later kill it, giving you the beginnings of an operating environment. If such a serialization is architecture independent then to my understanding you have the beginnings of a virtual machine. You could break your program into pieces and store them in a database and load them when needed, or even pull updates to each piece individually from down the web etc, enabling interesting methods of software distribution. It would make very cool stuff possible. I have had a look at hs-plugins, but it is unclear how to derive a simple pair of functions `(a -> b) -> ByteString` and `ByteString -> Either ParseError (a -> b)`, for example, from the functionality it provides, if it is possible at all. I guess such a thing requires thorough digging into the depths of GHC, (or maybe even LLVM if an architecture independent representation is sought, but I don't know enough to say.). Perhaps this is more a question for those interested and knowledgable in Haskell compilation (and, to some extent, decompilation). If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

Attachments:

attachment.html (text/html — 2.0 KB)

Show replies by date

Luke Palmer

11 Nov 11 Nov

8:36 a.m.

On Thu, Nov 11, 2010 at 12:53 AM, Jesse Schalken wrote:

...

I have had a look at hs-plugins, but it is unclear how to derive a simple pair of functions `(a -> b) -> ByteString` and `ByteString -> Either ParseError (a -> b)`, for example, from the functionality it provides, if it is possible at all. I guess such a thing requires thorough digging into the depths of GHC, (or maybe even LLVM if an architecture independent representation is sought, but I don't know enough to say.). Perhaps this is more a question for those interested and knowledgable in Haskell compilation (and, to some extent, decompilation). If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

As far as I know, GHC has no support for this. There are issues with the idea that will come out pretty fast, such as: (1) Those cannot be pure functions, because it differentiate denotationally equal functions. So it would have to be at least (a -> b) -> IO ByteString. (2) What if you tried to serialize a filehandle or an FFI Ptr? But my answers are "Ok" and "Then you get a runtime error", respectively. It is by no means impossible, IIRC Clean does it. I think it's pretty dumb that we don't have support for this yet. Purely functional languages have a unique disposition to be very good at this. But oh well, there aren't enough tuits for everything. A more elaborate answer to (2) is "you get a runtime error when you try to *use* the thing that was impossible to serialize". This makes sure that you don't get an error if you are trying to serialize \x -> const x a_filehandle_or_something, which is just the identity function. Luke

Thomas Davie

9:25 a.m.

On 11 Nov 2010, at 08:36, Luke Palmer wrote:

...

On Thu, Nov 11, 2010 at 12:53 AM, Jesse Schalken wrote:

...
I have had a look at hs-plugins, but it is unclear how to derive a simple pair of functions `(a -> b) -> ByteString` and `ByteString -> Either ParseError (a -> b)`, for example, from the functionality it provides, if it is possible at all. I guess such a thing requires thorough digging into the depths of GHC, (or maybe even LLVM if an architecture independent representation is sought, but I don't know enough to say.). Perhaps this is more a question for those interested and knowledgable in Haskell compilation (and, to some extent, decompilation). If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

As far as I know, GHC has no support for this. There are issues with the idea that will come out pretty fast, such as:

(1) Those cannot be pure functions, because it differentiate denotationally equal functions. So it would have to be at least (a -> b) -> IO ByteString.

I don't think I agree, I didn't see a rule f == g => serialise f == serialise g anywhere. Bob

Max Rabkin

9:52 a.m.

On Thu, Nov 11, 2010 at 11:25, Bob wrote:

...

I don't think I agree, I didn't see a rule f == g => serialise f == serialise g anywhere.

The rule a == b => f a == f b is called referential transparency (for denotational equality, not Eq) and is (perhaps the most important) part of what is meant by "purely functional". --Max

Dan Doel

10:34 a.m.

On Thursday 11 November 2010 4:25:46 am Thomas Davie wrote:

...

I don't think I agree, I didn't see a rule f == g => serialise f == serialise g anywhere.

That equal things can be substituted for one another is one of the fundamental ideas of equality (the other is that anything is equal to itself). In fact, in second order logic, equality can be *defined* as (roughly): (x = y) means (forall P. P x -> P y) That is, x is equal to y if all predicates satisfied by x are also satisfied by y. Using this, one can derive other typical laws for equality. Transitivity is pretty easy, but surprisingly, even symmetry can be gotten: If P z is z = x, using substitution we get x = x -> y = x, and x = x is trivially true. And of course, we also get congruence: Given a function f, let P z be f x = f z, substitution yields f x = f x -> f x = f y, where f x = f x is again trivial. The equality that people typically expect to hold for Haskell expressions is that two such expressions are equal if they denote the same thing, as Max said. Expressions with function type denote mathematical functions, and so if we have something like: serialize :: (Integer -> Integer) -> String it must be a mathematical function. Further, its arguments will denote functions, to, and equality on mathematical functions can be given point-wise: f = g iff forall x. f x = g x Now, here are two expressions with type (Integer -> Integer) that denote equal functions: \x -> x + x \x -> 2 * x So, for all this to work out, serialize must produce the same String for both of those. But in general, it isn't possible to decide if two functions are point-wise equal, let alone select a canonical representative for each equivalence class of expressions that denote a particular function. So there's no feasible way to implement serialize without breaking Haskell's semantics. How making serialize :: (Integer -> Integer) -> IO String solves this? Well, that's another story. -- Dan

Sjoerd Visscher

11:22 a.m.

...

The equality that people typically expect to hold for Haskell expressions is that two such expressions are equal if they denote the same thing, as Max said. Expressions with function type denote mathematical functions, and so if we have something like:

serialize :: (Integer -> Integer) -> String

it must be a mathematical function. Further, its arguments will denote functions, to, and equality on mathematical functions can be given point-wise:

f = g iff forall x. f x = g x

Now, here are two expressions with type (Integer -> Integer) that denote equal functions:

\x -> x + x \x -> 2 * x

So, for all this to work out, serialize must produce the same String for both of those.

What I'm wondering is if it would actually break things if serialize would not produce the same String for these functions. The reasoning above is used regularly to shoot down some really useful functionality. So what would go wrong if we chose to take the practical path, and leave aside the theoretical issues? Also, the above two functions might not be exactly denotationally equal if the type is (Float -> Float), or the speed or memory use might be different. Could it not be that requiring them to be equal could just as well break things? greetings, Sjoerd Visscher

Gábor Lehel

12:23 p.m.

On Thu, Nov 11, 2010 at 12:22 PM, Sjoerd Visscher wrote:

...

...
The equality that people typically expect to hold for Haskell expressions is that two such expressions are equal if they denote the same thing, as Max said. Expressions with function type denote mathematical functions, and so if we have something like:

serialize :: (Integer -> Integer) -> String

it must be a mathematical function. Further, its arguments will denote functions, to, and equality on mathematical functions can be given point-wise:

f = g iff forall x. f x = g x

Now, here are two expressions with type (Integer -> Integer) that denote equal functions:

\x -> x + x \x -> 2 * x

So, for all this to work out, serialize must produce the same String for both of those.

What I'm wondering is if it would actually break things if serialize would not produce the same String for these functions. The reasoning above is used regularly to shoot down some really useful functionality. So what would go wrong if we chose to take the practical path, and leave aside the theoretical issues?

Yeah, my sense -- but correct my if I'm reading the original post incorrectly -- is that the whole thing with function equality is a distraction and not really relevant here. It is true that (a) per Luke Palmer, if we could serialize equal functions to equal representations then we could we could decide whether two pure functions were equal, which (if not done in the IO monad) would(?) break purity; and (b) per Dan Doel, if we wanted to implement our serialization in a way so that equal functions get equal representations, we couldn't do it, because it's an impossible problem. But these sort of cancel each other out, because (a) it's an impossible problem, and (b) we don't want to do it. A function which does "x+x" would simply be serialized as doing x+x, and a function which does "x*2" would be serialized as doing x*2, and when deserialized the resulting functions would continue to do those things, and it would be completely agnostic and indifferent as to whether or not they are in fact equal. Obviously there are questions here with regards to the functions which the to-be-serialized function makes use of -- should they be serialized along with it? Required to be present when it is deserialized? Is it OK for the function to do something different when it is loaded compared to when it was stored if its environment is different, or not OK?

...

Also, the above two functions might not be exactly denotationally equal if the type is (Float -> Float), or the speed or memory use might be different. Could it not be that requiring them to be equal could just as well break things?

greetings, Sjoerd Visscher

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Work is punishment for failing to procrastinate effectively.

Jesse Schalken

1:15 p.m.

2010/11/11 Gábor Lehel

...

Obviously there are questions here with regards to the functions which the to-be-serialized function makes use of -- should they be serialized along with it? Required to be present when it is deserialized? Is it OK for the function to do something different when it is loaded compared to when it was stored if its environment is different, or not OK?

I would have say Yes, No, No. At the moment, when you serialise data structure A which references data structure B which references data structure C, using Data.Binary for example, the whole lot (A, B, and C) gets serialised, so that the resulting deserialization of A is denotationally equivalent to the original, regardless of the environment. I don't see why this shouldn't be the case for functions also. So a serialized function should include all its direct and indirect callees. This might result in potentially simple functions ending up enormous when serialized, simply because the call graph, including all it's libraries and their libraries etc, is that size, but such would be pure function serialization. This raises the question of what is left. The assembled machine code? For the architecture of the serializer or of the deserializer? Or LLVM IR for architecture independence? C--? Core? I don't know, but it would be awesome for the serialized representation to be both low-level and architecture independent, then having it JIT compiled when it is deserialized. To me, this means a virtual machine, which I guess is what you need when you want fast mobile code, but I'm just musing here as I know little about programming language implementation.

Gábor Lehel

1:20 p.m.

On Thu, Nov 11, 2010 at 2:15 PM, Jesse Schalken wrote:

...

2010/11/11 Gábor Lehel

...
Obviously there are questions here with regards to the functions which the to-be-serialized function makes use of -- should they be serialized along with it? Required to be present when it is deserialized? Is it OK for the function to do something different when it is loaded compared to when it was stored if its environment is different, or not OK?

I would have say Yes, No, No. At the moment, when you serialise data structure A which references data structure B which references data structure C, using Data.Binary for example, the whole lot (A, B, and C) gets serialised, so that the resulting deserialization of A is denotationally equivalent to the original, regardless of the environment. I don't see why this shouldn't be the case for functions also. So a serialized function should include all its direct and indirect callees. This might result in potentially simple functions ending up enormous when serialized, simply because the call graph, including all it's libraries and their libraries etc, is that size, but such would be pure function serialization. This raises the question of what is left. The assembled machine code? For the architecture of the serializer or of the deserializer? Or LLVM IR for architecture independence? C--? Core? I don't know, but it would be awesome for the serialized representation to be both low-level and architecture independent, then having it JIT compiled when it is deserialized. To me, this means a virtual machine, which I guess is what you need when you want fast mobile code, but I'm just musing here as I know little about programming language implementation.

I'm not an expert here either, I'll just note that LLVM is only platform independent to a degree. Or rather, I believe the situation is that it *is* architecture independent, but it doesn't abstract anything else besides the architecture -- so if you have any other differences, it's not going to work. LLVM IR doesn't do #ifdefs.

...

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Work is punishment for failing to procrastinate effectively.

Malcolm Wallace

1:29 p.m.

...

I'll just note that LLVM is only platform independent to a degree. Or rather, I believe the situation is that it *is* architecture independent, but it doesn't abstract anything else besides the architecture

In particular, imagine how you might serialise a Haskell function which is an FFI binding to some external platform-specific library (e.g. Posix, Win32, Gtk+, WPF), such that you could save it on a Windows machine, copy to Linux or Mac, and start it running again. Regards, Malcolm

Gábor Lehel

1:42 p.m.

On Thu, Nov 11, 2010 at 2:29 PM, Malcolm Wallace wrote:

...

...
I'll just note that LLVM is only platform independent to a degree. Or rather, I believe the situation is that it *is* architecture independent, but it doesn't abstract anything else besides the architecture

In particular, imagine how you might serialise a Haskell function which is an FFI binding to some external platform-specific library (e.g. Posix, Win32, Gtk+, WPF), such that you could save it on a Windows machine, copy to Linux or Mac, and start it running again.

...and FFI imports are something you definitely can't serialize, so that's one case where you either have it say "call the linked-in function with the name 'whatever'" and hope it's the same one, or just disallow it entirely.

...

Regards, Malcolm _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Work is punishment for failing to procrastinate effectively.

Dan Doel

2:36 p.m.

On Thursday 11 November 2010 6:22:06 am Sjoerd Visscher wrote:

...

What I'm wondering is if it would actually break things if serialize would not produce the same String for these functions.

Yes. It would break the (usual) mathematical underpinnings of Haskell.

...

The reasoning above is used regularly to shoot down some really useful functionality. So what would go wrong if we chose to take the practical path, and leave aside the theoretical issues?

You would lose many uses of equational reasoning in your programs. Have you every substituted 'x * 2' for the expression 'x + x' in one of your programs, or vice versa? You can no longer do that, because someone may be serializing the function you're writing, checking how it's implemented, and relying it. You'd lose the whole notion of 'the category of haskell types and functions' goodbye, too. Does f . id = f? Not if the former serializes as "f . id". We already have a weak case of this, since (\x -> undefined x) can be distinguished from undefined using seq, but that can be hand-waved away by not worrying about bottoms so much. That isn't going to work for serialize. There does exist practical Haskell stuff from category theory inspired people. As a digression... When you get into things like dependent type theory, equality is actually incorporated into the language in a much more direct way. And there are various sorts of equality you can add to your type theory. Two common choices are: intensional equality: two values are provably equal if they evaluate to the same normal form extensional equality: this incorporates non-computational rules, like the point-wise equality of functions. Now, in a type theory where equality is intensional, I cannot prove: (forall x. f x = g x) -> f = g However, both these equalities (and others in between and on either side) are *compatible*, in that I cannot disprove the above theorem in an intensional theory. What seq and serialize do is break from extensional equality, and allow us to disprove the above (perhaps not for seq within a hypothetical theory, since the invalidating case involves non-termination, but certainly for serialize). And that's a big deal, because extensional equality is handy for the above reasoning about programs.

...

Also, the above two functions might not be exactly denotationally equal if the type is (Float -> Float), or the speed or memory use might be different. Could it not be that requiring them to be equal could just as well break things?

I would not be at all surprised if they are unequal for Float -> Float, considering how ugly floating point types are. That doesn't justify breaking equality for every other type. Speed and memory use are typically disregarded for equality of (expressions denoting) functions. Merge sort and bubble sort are two algorithms for computing the same function, even though they vary wildly in these two characteristics. If you want a calculus for precisely reasoning about resource usage, then you would not necessarily be able to consider these equal. But, Haskell is simply not such a calculus, and I don't think, "what if it were," is a good argument for actively breaking extensional equality in such a dramatic way (since the breakage would have nothing to do with precise reasoning about resource usage). Gábor Lehel wrote:

...

It is true that (a) per Luke Palmer, if we could serialize equal functions to equal representations then we could we could decide whether two pure functions were equal, which (if not done in the IO monad) would(?) break purity; and (b) per Dan Doel, if we wanted to implement our serialization in a way so that equal functions get equal representations, we couldn't do it, because it's an impossible problem.

But these sort of cancel each other out, because (a) it's an impossible problem, and (b) we don't want to do it.

They do not cancel each other out. Rather, it goes like this: a) Serializing functions gives you a decision procedure for function equality. b) It is impossible to decide extensional equality of functions. Together, these mean that any serialization procedure you define is *wrong*, in that it violates the semantics of pure Haskell expressions (the resulting decision procedure will answer 'False' for some pair of extensionally equal expressions f and g). The way IO can get around this is that when you have: serialize (\x -> x + x :: Integer) :: IO String serialize (\x -> 2 * x :: Integer) :: IO String you can say that the two IO actions are equal in the following sense: 1) There is an equivalence class of expressions denoting the function in question. This equivalence class contains (\x -> x + x) and (\x -> 2 * x) 2) The IO action produced by serialize selects a string representation for one of these expressions at random*. *) You probably won't ever observe "\x -> x + x" being the output of (serialize (\x -> 2 * x)), but that's just a coincidence. This explanation is not available if serialize has type (Integer -> Integer) -> String, however. Those have to be mathematical functions (of a sort) that select a unique representation for each equivalence class, or else we have to get a completely different (and much less nice) denotational semantics for Haskell. ---- Perhaps I missed it, but: has anyone explained why: (a -> b) -> IO String is an unacceptable type? Why does it need to be just String? Folks often ask, "what benefit is purity?" Well, what is the benefit of making this function impure? GHC has a lot of normally impure functionality incorporated in a way that is mathematically sound (though not ideal). What are the things that have been shot down? Or does IO, ST, etc. qualify as getting shot down? -- Dan

Ketil Malde

3:53 p.m.

Dan Doel writes:

...

You'd lose the whole notion of 'the category of haskell types and functions' goodbye, too. Does f . id = f? Not if the former serializes as "f . id".

..and you are able to tell the difference. Am I wrong in thinking that this could be made to work if serialization was to/from an opaque type instead of (Byte)String, so that the *only* operations would be serialization and deserialization (and possibly storing to/from file)? -k -- If I haven't seen further, it is by standing in the footprints of giants

C. McCann

4:34 p.m.

On Thu, Nov 11, 2010 at 10:53 AM, Ketil Malde wrote:

...

..and you are able to tell the difference. Am I wrong in thinking that this could be made to work if serialization was to/from an opaque type instead of (Byte)String, so that the *only* operations would be serialization and deserialization (and possibly storing to/from file)?

This was my first thought as well! However, reading to/from a file would of course be in IO, at which point you'd be free to read the file back in through normal means to get at the representation. So in that respect, this is equivalent to (a -> b) -> IO String. Outside of IO, it would pretty much have to be limited to serializing and deserializing. You'd be able to create opaque tokens representing functions, pass them around, and/or extract the function in order to apply it. Conveniently, it turns out that Haskell already has support for this, you can implement it as follows:

...

module Serialize.Pure (OpaqueFunction, serialize, deserialize) where

newtype OpaqueFunction a b = Opaque { deserialize :: a -> b }

serialize = Opaque

Toss in some existential types as desired, if you want to hide the function's actual type. I suppose one could object that this isn't actually serializing anything at all; to which I would respond that, in pure code, how do you expect to tell the difference? - C.

Ketil Malde

12 Nov 12 Nov

9:24 a.m.

"C. McCann" writes:

...

This was my first thought as well! However, reading to/from a file would of course be in IO, at which point you'd be free to read the file back in through normal means to get at the representation. So in that respect, this is equivalent to (a -> b) -> IO String.

IMO, it's morally different, you're now operating on a file, and you shouldn't rely on the contents being predictable. You can make the sin-bin argument that IO can do anything, but I think there's a moral distinction between serialize :: a -> IO ByteString x <- serialize f and serialize :: a -> Opaque store :: Opaque -> FilePath -> IO () do x <- serialize f store x n B.readFile n You could probably already snarf chunks of the heap and dump them to file.

...

I suppose one could object that this isn't actually serializing anything at all; to which I would respond that, in pure code, how do you expect to tell the difference?

Nice one :-) I guess the real question is what are the useful, pure operations on an opaque type that can contain arbitrary functions. -k -- If I haven't seen further, it is by standing in the footprints of giants

C. McCann

6:11 p.m.

On Fri, Nov 12, 2010 at 4:24 AM, Ketil Malde wrote:

...

IMO, it's morally different, you're now operating on a file, and you shouldn't rely on the contents being predictable. You can make the sin-bin argument that IO can do anything, but I think there's a moral distinction between

serialize :: a -> IO ByteString x <- serialize f

and

serialize :: a -> Opaque store :: Opaque -> FilePath -> IO ()

Any distinction here is mostly at the level of API design and informal semantics; I'm inclined to agree with your preference, but as far as impacts on the formal semantics of pure code go, these are essentially equivalent.

...

You could probably already snarf chunks of the heap and dump them to file.

Yep, and this is pretty much the reason, taken to its logical conclusion, why almost all bets are off about what IO computations can potentially do.

...

...
I suppose one could object that this isn't actually serializing anything at all; to which I would respond that, in pure code, how do you expect to tell the difference?

Nice one :-)

I guess the real question is what are the useful, pure operations on an opaque type that can contain arbitrary functions.

I would be very surprised if there were any that couldn't just as well be done on the function directly. Even extracting type information could be problematic if done incorrectly! Consider a function (inspectType :: Opaque -> Serialization.Type), where the Show instance for Type produces an approximation of the type signature the function was declared with. Reasonable? Nope, we just broke everything. Imagine a function taking an argument of type ((a, a) -> a). If by serializing the argument it can recover the declared type signature it could distinguish between fst and ((\(x, _) -> x) :: forall a. (a, a) -> a), which again opens the door to non-parametric behavior. On the other hand, (inspectType :: Opaque -> Data.Typeable.TypeRep) would probably be safe, because it supports only monomorphic types. - C.

Sjoerd Visscher

11 Nov 11 Nov

4:54 p.m.

On Nov 11, 2010, at 3:36 PM, Dan Doel wrote:

...

On Thursday 11 November 2010 6:22:06 am Sjoerd Visscher wrote:

...
The reasoning above is used regularly to shoot down some really useful functionality. So what would go wrong if we chose to take the practical path, and leave aside the theoretical issues?

You would lose many uses of equational reasoning in your programs. Have you every substituted 'x * 2' for the expression 'x + x' in one of your programs, or vice versa? You can no longer do that, because someone may be serializing the function you're writing, checking how it's implemented, and relying it.

Yes, but it would not break any existing code. It would only break code that knowingly did the wrong thing.

...

We already have a weak case of this, since (\x -> undefined x) can be distinguished from undefined using seq, but that can be hand-waved away by not worrying about bottoms so much. That isn't going to work for serialize.

Why not? -- Sjoerd Visscher http://w3future.com

Dan Doel

12 Nov 12 Nov

1:16 a.m.

On Thursday 11 November 2010 11:54:56 am Sjoerd Visscher wrote:

...

Yes, but it would not break any existing code. It would only break code that knowingly did the wrong thing.

Well, if we added a function that randomly scrambled GHC's heap memory, it wouldn't break any existing code, because none would use it. :)

...

...
We already have a weak case of this, since (\x -> undefined x) can be distinguished from undefined using seq, but that can be hand-waved away by not worrying about bottoms so much. That isn't going to work for serialize.

Why not?

Because, there is an argument to be made in the seq case that no one really cares about the differences it introduces. I don't usually care how my code works on bottoms, except inasmuch as it determines various performance characteristics of the code*, or whether it works on infinite values. I try, generally speaking, to write total functions, and run them on well-defined inputs. So when I reason about the programs, it is this aspect that I care most about, not about what happens when I feed in undefineds. There are even folks that have worked on making this perspective rigorous. See, Fast and Loose Reasoning is Morally Correct. serialize is not at all the same in this regard. There is no class of functions that is immune to its inspection powers, presumably, because that's its whole point. But that also means that there is no class of functions for which we are justified in reasoning equationally using the standard extensional equality. The only way that would be justified is saying, "serialize doesn't exist."

...

Then don't do that. Being able to serialize functions is just as dangerous as having unsafePerformIO. If you don't use it, you don't have problems.

And unsafePerformIO's very name suggests that you're breaking things when you use it. It comes with lots of caveats, and the Haskell community will generally heap scorn on you if you use it for trivialities (or even non- trivialities). I don't understand why it would be desirable for a serialize function to be branded, "don't ever use this unless you're an expert who knows what he's doing." unsafePerformIO is, for many uses, a concession to low-level compiler extensibility. You can implement a fair amount of stuff in a library using unsafePerformIO that would otherwise require some kind of compiler support. What is the analogous motivation for functions to be turned into pure strings containing their code? Of course, if you want, "everything is unsafe, be careful," there are many, many languages out there that already do that. For instance, ML**, Lisp, C, Java, Python, .... But this is Haskell, and we try to do a little better than that. -- Dan [*] Which is irrelevant to the reasoning in question. In fact, if you follow Bird's methodology, you first write a naive, obviously correct program, and then you transform it into a more efficient version via transformations you've shown to preserve semantics. This approach doesn't work if there are no correctness-preserving transformations, though. [**] In fact, Alice ML (I think that's the right one) already has very fancy serialization stuff. You can store and load and send entire modules over an internet connection, I believe.

Luke Palmer

2:23 a.m.

On Thu, Nov 11, 2010 at 6:16 PM, Dan Doel wrote:

...

serialize is not at all the same in this regard. There is no class of functions that is immune to its inspection powers, presumably, because that's its whole point. But that also means that there is no class of functions for which we are justified in reasoning equationally using the standard extensional equality. The only way that would be justified is saying, "serialize doesn't exist."

Admittedly, the class of reasoning I usually use in my Haskell programs, and the one that you talked about using earlier this message, is essentially "seq doesn't exist". However, I prefer to use this class of reasoning because I would prefer if seq actually didn't exist (er, I think the implication goes the other way). Not so for serialize: I would like a serialize function, but I don't want the semantic burden it brings. If only there were a way to... oh yeah. serialize :: (a -> b) -> IO String I still don't really get what we're arguing about. Luke

Dan Doel

9:04 a.m.

On Thursday 11 November 2010 9:23:13 pm Luke Palmer wrote:

...

Admittedly, the class of reasoning I usually use in my Haskell programs, and the one that you talked about using earlier this message, is essentially "seq doesn't exist". However, I prefer to use this class of reasoning because I would prefer if seq actually didn't exist (er, I think the implication goes the other way).

seq can still exist, I think. And I still want it (well, I could leave it for functions, really, I think). What doesn't exist, loosely speaking, is bottom, which means: forall x y. x `seq` y = y And so seq = flip const. That makes things like: foo ... = ... (x `seq` y) ... appear useless, unless we remember that denotational semantics aren't the end- all and be-all, in which case we can recognize that seq is used as an operational hint to the compiler, same as par and pseq. It just happens to be the case that in Haskell's ordinary semantics, merely giving the denotational semantics of seq is sufficient to induce the right operational behavior, provided the compiler isn't bone headed (and further, is lenient enough to allow sufficiently smart compilers to disregard our naive 'evaluate x before y' reading of seq if it's more efficient to do so).

...

Not so for serialize: I would like a serialize function, but I don't want the semantic burden it brings. If only there were a way to...

oh yeah.

serialize :: (a -> b) -> IO String

I still don't really get what we're arguing about.

I don't know. -- Dan

roconnor＠theorem.ca

6:40 p.m.

On Thu, 11 Nov 2010, Dan Doel wrote:

...

intensional equality: two values are provably equal if they evaluate to the same normal form

extensional equality: this incorporates non-computational rules, like the point-wise equality of functions.

Now, in a type theory where equality is intensional, I cannot prove:

(forall x. f x = g x) -> f = g

However, both these equalities (and others in between and on either side) are *compatible*, in that I cannot disprove the above theorem in an intensional theory.

What seq and serialize do is break from extensional equality, and allow us to disprove the above (perhaps not for seq within a hypothetical theory, since the invalidating case involves non-termination, but certainly for serialize). And that's a big deal, because extensional equality is handy for the above reasoning about programs.

As you are well aware in Coq, and in Agda we don't have an extensionality axiom; however this is not a problem because we simply use setoid equality to capture extenional reasoning and prove that in every specific case where we want to use extensioanl reasoning it is sound to do so. Now suppose I add the following consitent axiom to Coq: Axiom Church-Turing : forall f:Nat -> Nat, exists e:Nat, forall n:Nat, {e}(n) = f(n) This well-studied axiom is effectively what serialize is realizing[1]. Now, have I broken my old Coq proofs by adding this axiom? No, of course not, because it is a consistent axioms and my proofs didn't use it. My proofs were alreay explicity proving that extentional substitution was sound in those cases I was using it. The same will be true for reasoning in Haskell. Before serialization we knew that extensional substitution was sound, but after adding serialization we are now obligated to prove in the individual cases that extensional subsitution is sound and/or add extentionally preconditions to our proofs. So no big deal; people have already been doing this in Coq and Agda for years. [1]Actaully the realizer for serialize is *weaker* that this axioms. The realizer for serialize would be (Nat -> Nat) -> IO Nat instead of (Nat -> Nat) -> Nat, so should have less impact that the Church-Turing axiom. -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

Alexander Solla

6:53 p.m.

On Nov 12, 2010, at 10:40 AM, roconnor@theorem.ca wrote:

...

[1]Actaully the realizer for serialize is *weaker* that this axioms. The realizer for serialize would be (Nat -> Nat) -> IO Nat instead of (Nat -> Nat) -> Nat, so should have less impact that the Church-Turing axiom.

I don't see where IO comes in if you're dealing with pure functions. Serializing pure structures is really really easy and can be done entirely purely. The tricky part is finding a good encoding. As I said in my original message on this topic, you are embedding a compiler and interpreter into your runtime. That can be as easy as what is below, or as tricky as embedding GHC in your runtime, or anywhere in between. But you simply cannot serialize IO actions without the GHC runtime. The GHC runtime sequences IO actions in ways that depend on the unserializable environment. (Imposed) Sequencing + Concrete representation = Serialization. class Serialize a where serialize :: a -> ByteString unSerialize :: ByteString -> Maybe a -- Parsers can fail instance (Serialize a) => Serialize [a] where ... instance (Serialize a, Serialize b) => Serialize (a, b) where ... -- We can conclude that a and b must be enumerable from the requirement that -- f is recursively enumerable: instance (Serialize a, Enum a, Serialize b, Enum b) => Serialize (a -> b) where serialize f = serialize $ ( zip [minBound..maxBound] (fmap f [minBound..maxBound]) ) -- A map instance might be better: we trade some serialization time for more -- deserialization time. instance (Serialize a, Serialize b) => Serialize (Map a b) where ... instance (Enum a, Enum b, Serialize a, Serialize b) => Serialize (a -> b) where serialize f = serialize . fromList $ ( zip [minBound..maxBound] (fmap f [minBound..maxBound]) ) deserialize map = \x -> lookup x (bytestring_decode_map map) where bytestring_decode_map = ...

Dan Doel

13 Nov 13 Nov

2:05 a.m.

It took me a bit to decide whether this was an adequate counter to my objection, but ultimately, I don't think it is. I'll try to explain as well as possible. On Friday 12 November 2010 1:40:10 pm roconnor@theorem.ca wrote:

...

As you are well aware in Coq, and in Agda we don't have an extensionality axiom; however this is not a problem because we simply use setoid equality to capture extenional reasoning and prove that in every specific case where we want to use extensional reasoning it is sound to do so.

If we are talking about writing programs in a closed world, and proving things about said world, this is fine. But this is not always what we are doing in Haskell. For instance, if I am writing a library, am I justified in changing: f (g x) (g x) to: let y = g x in f y y without bumping the "semantics breaking change" version number? In a closed world, where I have proved that all the relevant code I have written admits extensionality, the answer is, "yes." But in an open world, where people are free to use serialize to inspect my implementations in pure code, the answer is, "no."

...

Now suppose I add the following consistent axiom to Coq:

Axiom Church-Turing : forall f:Nat -> Nat, exists e:Nat, forall n:Nat, {e}(n) = f(n)

So, your argument is, if I'm not mistaken, that intensional type theory can be consistently extended with this axiom. I believe this. In fact, I'd even be willing to accept that purely as an axiom, it might be consistent to add it to extensional type theory. However, this is not the whole story. One question I have to ask is: how does this compute? In Agda, and I suspect Coq, axioms simply do not compute (and this is the reason I'd be willing to believe Church-Turing is consistent with an extensional theory). However, serialize *will* compute, and I expect it to compute like this: forall e:Nat. serialize {e} => e And I can believe that even this is consistent with intensional type theory. In an intensional theory, I can imagine (Nat -> Nat) being interpreted not as a type of functions, but as a type of algorithms. There may be many algorithms that compute the same function, and Nat -> Nat contains them all. serialize/Church-Turing returns the source code of each one. When we assert extensionality, though, Nat -> Nat can no longer be inhabited by mere algorithms, though. In an extensional theory, Nat -> Nat is a quotient of the set of algorithms. And then we have only a few options: 1) serialize/Church-Turing violates the quotient 2) serialize/Church-Turing computes extensional equality of algorithms, and chooses a single 'source code' representation for each equivalence class 3) It is impossible for two different pieces of source code to compute the same function (so serialize {e} => e is valid because there is no e' /= e such that forall x. {e} x = {e'} x) *) ... 2 is, I think, impossible, and 3 is simply false, so the choice is 1, meaning that ITT + a Church-Turing that actually computes cannot be consistently extended with extensionality. And it is this that I care about, and what I was referring to in the mail you replied to: - Intensional type theory can be consistently extended to extensional type theory. - serialize is anti-extensional (similar to the way impredicative Set and injective type constructors are anti-classical). And it is this consistent extension that I care about. And, going back to my library example, the reason to care about this is abstraction. I want clients of my library to program to the abstraction of my code as functions, not as algorithms/source code, because that allows me leeway to tweak the algorithms as I see fit, so long as they compute the same function. And, for that matter, the compiler can do this kind of mucking around with algorithms. I think it's a big deal if, to enable optimization of a function, we have to prove that all code in our program treats it extensionally (I don't believe the compiler can do it, currently, barring serialize simply not being used; and say goodbye to separate compilation).

...

[1]Actaully the realizer for serialize is *weaker* that this axioms. The realizer for serialize would be (Nat -> Nat) -> IO Nat instead of (Nat -> Nat) -> Nat, so should have less impact that the Church-Turing axiom.

I have no problem with serialize :: (Nat -> Nat) -> IO Nat. The problem is with (Nat -> Nat) -> Nat. The former can respect the quotient in question via IO hand waving. Perhaps this distinction is frivolous, but it seems wrong to make the language in general anti-extensional when it could instead be put inside the sin bin. My quarrel is more with, "let's dump out the sin bin and just be careful." -- Dan [*] I have seen a paper about a type theory with quotients that has a function: choose : T / R -> T such that: choose (squish x) => x but it was meticulously designed to allow that in ways that I don't really remember, so I'm going to pretend it doesn't exist at the moment.

Alberto G. Corona

11 Nov 11 Nov

8:57 a.m.

There are some straighforward tricks using the package eval (or hint)l. This is more or less the idea in pseudocode: type FuncExpr= String data F a = F FuncExpr a apply (F _ f) x= f x instance Show (F a) where show (F str _)= str instance Read (F a) where read (F str f)= eval f >>= F str 2010/11/11 Jesse Schalken

...

Is it possible to serialize and deserialize a function to/from binary form, perhaps using Data.Binary, for example? What about an IO action? If so, is there a way the serialized representation could be architecture-independent?

I have been shown how useful it can be to store functions inside data structures, and while looking at data serialization for the purpose of persistence I wondered "since functions are just values in Haskell, why can't we persist them, too?".

To me the idea has interesting implications. For example, an arbitrary program could simply be represented by a serialization of `IO ()`. In fact, you could load any program into memory from a file and use Control.Concurrent.forkIO to run it, and later kill it, giving you the beginnings of an operating environment. If such a serialization is architecture independent then to my understanding you have the beginnings of a virtual machine. You could break your program into pieces and store them in a database and load them when needed, or even pull updates to each piece individually from down the web etc, enabling interesting methods of software distribution. It would make very cool stuff possible.

I have had a look at hs-plugins, but it is unclear how to derive a simple pair of functions `(a -> b) -> ByteString` and `ByteString -> Either ParseError (a -> b)`, for example, from the functionality it provides, if it is possible at all. I guess such a thing requires thorough digging into the depths of GHC, (or maybe even LLVM if an architecture independent representation is sought, but I don't know enough to say.). Perhaps this is more a question for those interested and knowledgable in Haskell compilation (and, to some extent, decompilation).

If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Alberto G. Corona

9 a.m.

nstance Read (F a) where read str= eval str >>= F str sorry 2010/11/11 Alberto G. Corona

...

There are some straighforward tricks using the package eval (or hint)l.

This is more or less the idea in pseudocode:

type FuncExpr= String

data F a = F FuncExpr a

apply (F _ f) x= f x

instance Show (F a) where show (F str _)= str

instance Read (F a) where read (F str f)= eval f >>= F str

2010/11/11 Jesse Schalken

...
Is it possible to serialize and deserialize a function to/from binary form, perhaps using Data.Binary, for example? What about an IO action? If so, is there a way the serialized representation could be architecture-independent?

I have been shown how useful it can be to store functions inside data structures, and while looking at data serialization for the purpose of persistence I wondered "since functions are just values in Haskell, why can't we persist them, too?".

To me the idea has interesting implications. For example, an arbitrary program could simply be represented by a serialization of `IO ()`. In fact, you could load any program into memory from a file and use Control.Concurrent.forkIO to run it, and later kill it, giving you the beginnings of an operating environment. If such a serialization is architecture independent then to my understanding you have the beginnings of a virtual machine. You could break your program into pieces and store them in a database and load them when needed, or even pull updates to each piece individually from down the web etc, enabling interesting methods of software distribution. It would make very cool stuff possible.

I have had a look at hs-plugins, but it is unclear how to derive a simple pair of functions `(a -> b) -> ByteString` and `ByteString -> Either ParseError (a -> b)`, for example, from the functionality it provides, if it is possible at all. I guess such a thing requires thorough digging into the depths of GHC, (or maybe even LLVM if an architecture independent representation is sought, but I don't know enough to say.). Perhaps this is more a question for those interested and knowledgable in Haskell compilation (and, to some extent, decompilation).

If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Stephen Tetley

9:24 a.m.

...

If not Haskell, are there any languages which provide a simple serialization and deserialization of functions?

Napier88 was a persistent language that also had higher-order functions. I've no experience other than reading about it but as its persistence was "orthogonal persistence" I'd expect HOFs to be persistent. The implementation of Napier88 produced a substantial runtime / persistent store that was used for other languages - I think one was Persistent Haskell, certainly one was Staple which was a higher order language. Tycoon2 was a similar persistent language - it was heavily influenced by ML so potentially it had HOFs. PolyML has a persistent store, though this may have been just for the top-level to freeze bindings I've no idea whether it supported serializing HOFs. Clean supports serialized HOFs as does Oz, see the paper below. Kali Scheme supported migration of running code between networked computers - as it was a Scheme I'd expect it to be higher order (the migration would mandate serialization). http://www-systems.cs.st-andrews.ac.uk/wiki/Napier88 http://www.polyml.org/FAQ.html http://www.st.cs.ru.nl/papers/2003/verm2003-LazyDynamicIO.pdf

5351

Age (days ago)

5353

Last active (days ago)

List overview

Download

25 comments

14 participants

participants (14)

Alberto G. Corona
Alexander Solla
C. McCann
Dan Doel
Gábor Lehel
Jesse Schalken
Ketil Malde
Luke Palmer
Malcolm Wallace
Max Rabkin
roconnor＠theorem.ca
Sjoerd Visscher
Stephen Tetley
Thomas Davie