
Anyways,

1) You should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a userland library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something! But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help with dynamic code (un)loading; stress test that! There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.

2) I have a work in progress on speccing out a proper (and sound :) ) static values type extension for GHC that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".

As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
Here's a really, really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste, https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :). I think that's a hint it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally tagless style API approach (i.e. have a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).

On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all. We thought long and hard about a path that would ultimately make an extension either unnecessary, or at any rate very small. At this point, the only thing that we are proposing to add to the compiler is the syntactic form "static e". Contrary to the presentation in the paper, the 'unstatic' function can be implemented entirely as library code and does not need to be a primop. Moreover, we do not need to piece together any kind of global remote table at compile time or link time, because we're piggybacking on the table already constructed by the system linker.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user. In particular, it can detect situations where symbolic references cannot be generated because e.g. the imported packages were not compiled as dynamically linked libraries, or seamlessly support calling `static f` on an identifier `f` that is not exported by the module.
1) what does this accomplish that cannot be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and I'll admit my opinions have been shaped by those more expert than me) that sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol that's hard coded hopefully into both programs in a way that it means the same thing".
I very much subscribe to the idea of defining small DSL's for exchanging code between nodes. And this proposal is compatible with that idea.
One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one whose grammar only contains linker symbol names.
As it happens, distributed-static today already supports two such DSL's: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together. There is a patch lying around by Edsko proposing to add a third "DSL": one that allows nodes to trade arbitrary Haskell strings that are then eval'ed on the other end by the 'plugins' package.
As Facundo explains at the end of his email, the notion of a "static" value ought to be a more general one than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.
This is why we propose to make Static into a class. Each DSL is generated by one datatype. Each such datatype has a Static instance. If you would like to ship an AST around the cluster, you can make the datatype for that AST an instance of Static, with 'unstatic' being defined as an interpreter for your AST.
Concretely:
data HsExpr = ...
instance Static HsExpr where unstatic e = Hs.interpret e
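
To make the shape of this concrete, here is a minimal sketch with a toy arithmetic AST standing in for HsExpr. Everything in it is an assumption made for illustration: the associated type `Result`, the `Expr` datatype, `interpret`, and the use of Show/Read as a stand-in wire format are hypothetical, and the class actually proposed for distributed-static may well look different.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

import Data.Char (isSpace)

-- Hypothetical stand-in for the proposed class: each DSL datatype says
-- what it denotes and how to turn a term back into that denotation.
class Static d where
  type Result d
  unstatic :: d -> Result d

-- A toy DSL (assumption): closed arithmetic expressions.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Read, Eq)

-- The interpreter that the receiving node runs.
interpret :: Expr -> Int
interpret (Lit n)   = n
interpret (Add a b) = interpret a + interpret b
interpret (Mul a b) = interpret a * interpret b

instance Static Expr where
  type Result Expr = Int
  unstatic = interpret

-- Show/Read used as a placeholder wire format; a real system would
-- more likely use a binary encoding.
encode :: Expr -> String
encode = show

decode :: String -> Maybe Expr
decode s = case reads s of
  [(e, rest)] | all isSpace rest -> Just e
  _                              -> Nothing

main :: IO ()
main = do
  let sent = Add (Lit 1) (Mul (Lit 2) (Lit 3))
  -- Simulate sending the term and interpreting it on the other side.
  print (fmap unstatic (decode (encode sent)))  -- prints: Just 7
```

The point of the sketch is only that the sender ships data, and the meaning of that data is fixed by the interpreter each node links in, rather than by a symbol name.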
I've had many educational conversations with
... ?
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable I think. One potential problem comes from managing dynamically typed values in the remote table, which must be coerced to the right type and use the right decoders if you don't use TH. With the approach we propose, there is no remote table, so I guess this should help eliminate a source of bugs.
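
The kind of bug meant here can be sketched with a deliberately simplified model of a remote table. This is not the distributed-static API: `RemoteTable`, `remoteTable` and `resolve` are hypothetical names, and the table is a plain association list of labels to `Dynamic` values from base.

```haskell
module Main where

import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)

-- Simplified model (assumption): labels mapped to dynamically typed values.
type RemoteTable = [(String, Dynamic)]

remoteTable :: RemoteTable
remoteTable =
  [ ("double", toDyn ((* 2) :: Int -> Int))
  , ("greet",  toDyn (("hello, " ++) :: String -> String))
  ]

-- Look a label up and coerce the entry to the type the caller expects.
-- Both a missing label and a type mismatch surface only at run time.
resolve :: Typeable a => RemoteTable -> String -> Maybe a
resolve table label = lookup label table >>= fromDynamic

main :: IO ()
main = do
  -- Label present, type matches.
  print (fmap ($ 21) (resolve remoteTable "double" :: Maybe (Int -> Int)))  -- Just 42
  -- Label present, but requested at the wrong type.
  print (fmap ($ 21) (resolve remoteTable "greet" :: Maybe (Int -> Int)))   -- Nothing
  -- Label was never added to the table.
  print (fmap ($ 21) (resolve remoteTable "triple" :: Maybe (Int -> Int)))  -- Nothing
```

In this model the compiler sees nothing wrong with the last two lookups; both failures are only discovered when the receiving node tries to resolve the label.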
3) this proposal requires changes to linking etc that would really make it useful only on systems and deployments that only have Template Haskell AND Dynamic linking. (and also rules out any context where it'd be nice to deploy a static app or say, use CH in ios! )
I don't know about iOS. And it's very likely that there are contexts in which this extension doesn't work. But as I said above, you are always free to define your own DSL's that cover the particular use case that you have in mind. The nice thing with this particular DSL is that it requires little to no TH to generate label names, which can always be a source of bugs, especially when you forget to include them in the global remote table (which is something that TH doesn't and can't help you with).
Furthermore, it was my understanding that GHC is heading towards a world of "dynamic linkable by default", and it is by now something that is supported on most platforms by GHC. See e.g.
https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms
There are fairly good solutions to deploy self contained dynamically linked apps these days, e.g. Docker. And in any case, with a few extra flags we can still do away with the dynamic linking requirement on some (all?) platforms.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one current CH doesn't take (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process which we can discuss in a separate thread to reduce the syntactic overhead even further, to practically nothing.
Best,
Mathieu