anyways

1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a userland library.
 If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!

But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help with dynamic code (un)loading; stress test that!

 There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.

2) I've a work in progress on speccing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that need it to be usable in Haskell. BUT, much of the same work 

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".

As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound (http://hackage.haskell.org/package/bound) is one nice one.

Here's a really, really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste, https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :).
I think that's a hint it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize, using a finally tagless style API approach (i.e. have a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).
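
To make the finally tagless idea concrete, here's a minimal sketch with a toy arithmetic language. Every name in it (ExprSym, Eval, Wire, fromWire, encode, decode) is made up for illustration and is not taken from the gist or from bound. I've also left out the typing to keep it short, and Show/Read stand in for a real wire format. The point is only that one term can be run locally or turned into a first-order AST that both sides agree on.

```haskell
-- Hypothetical toy DSL; none of these names come from the gist or from bound.

-- The operations of the DSL, abstracted over the representation.
class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  neg :: repr -> repr

-- Interpretation 1: run the computation locally.
newtype Eval = Eval { runEval :: Int }

instance ExprSym Eval where
  lit n                 = Eval n
  add (Eval a) (Eval b) = Eval (a + b)
  neg (Eval a)          = Eval (negate a)

-- Interpretation 2: build a first-order AST that can go over the wire.
data Wire = Lit Int | Add Wire Wire | Neg Wire
  deriving (Show, Read, Eq)

instance ExprSym Wire where
  lit = Lit
  add = Add
  neg = Neg

-- The receiving side re-interprets the AST under whichever instance it wants.
fromWire :: ExprSym repr => Wire -> repr
fromWire (Lit n)   = lit n
fromWire (Add a b) = add (fromWire a) (fromWire b)
fromWire (Neg a)   = neg (fromWire a)

-- Show/Read are an assumption standing in for a proper binary codec.
encode :: Wire -> String
encode = show

decode :: String -> Wire
decode = read

-- One term, usable at either interpretation.
example :: ExprSym repr => repr
example = add (lit 8) (neg (add (lit 1) (lit 2)))
```

In GHCi, `runEval example` evaluates the term directly, while `runEval (fromWire (decode (encode example)))` takes the same term through the serialized AST first. Both sides only have to agree on the Wire data type, not on linker symbols.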




On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote:
[Sorry for the multiple reposts - couldn't quite figure out which
email address doesn't get refused by the list..]


Hi Carter,

thank you for the good points you raise. I'll try and address each of
them as best I can below.

> 0) I think you could actually implement this proposal as a userland library,
> at least as you've described it. Have you tried doing so?

Indeed, this could be done without touching the compiler at all. We
thought long and hard about a path that would ultimately make an
extension either unnecessary, or at any rate very small. At this
point, the only thing that we are proposing to add to the compiler is
the syntactic form "static e". Contrary to the presentation in the
paper, the 'unstatic' function can be implemented entirely as library
code and does not need to be a primop. Moreover, we do not need to
piece together any kind of global remote table at compile time or link
time, because we're piggybacking on the one already constructed by the
system linker.

The `static e` form could as well be a piece of Template Haskell, but
making it a proper extension means that the compiler can enforce more
invariants and be a bit more helpful to the user. In particular,
detecting situations where symbolic references cannot be generated
because e.g. the imported packages were not compiled as dynamic linked
libraries. Or seamlessly supporting calling `static f` on an identifier
`f` that is not exported by the module.

> 1) what does this accomplish that can not be accomplished by having various
> nodes agree on a DSL, and sending ASTs to each other?
>      1a) in fact, I'd argue (and some others agree, and i'll admit my
> opinions have been shaped by those more expert than me) that the sending a
> wee AST you can interpret on the other side is much SAFER than "sending a
> function symbol thats hard coded hopefully into both programs in a way that
> it means the same thing".

I very much subscribe to the idea of defining small DSLs for
exchanging code between nodes. And this proposal is compatible with
that idea.

One thing that might not have been so clear in the original email is
that we are proposing here to introduce just *one such DSL*. It's just
that it's a trivial one whose grammar only contains linker symbol
names.

As it happens, distributed-static today already supports two such
DSLs: a DSL of labels, which are arbitrary string names for
functions, and a small language for composing Static values together.
There is a patch lying around by Edsko proposing to add a third "DSL":
one that allows nodes to trade arbitrary Haskell strings that are then
eval'ed on the other end by the 'plugins' package.

As Facundo explains at the end of his email, the notion of a "static"
value ought to be a more general one than was first envisioned in the
paper: a static value is any closed denotation, denoted in any of a
choice of multiple small languages, some of which ship standard with
distributed-static. The user can define his own DSL for shipping code
around.

This is why we propose to make Static into a class. Each DSL is
generated by one datatype. Each such datatype has a Static instance.
If you would like to ship an AST around the cluster, you can make the
datatype for that AST an instance of Static, with 'unstatic' being
defined as an interpreter for your AST.

Concretely:

data HsExpr = ...

instance Static HsExpr where
  unstatic e = Hs.interpret e
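
For a version that can be loaded as is, here is a rough sketch with a toy AST in place of HsExpr. The shape of the class (an associated Result type) and every name below are assumptions made for illustration only, not the signature being proposed. It shows two small DSLs, each generated by one datatype with its own Static instance.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Assumed class shape, for illustration; the real signature may differ.
class Static s where
  type Result s
  unstatic :: s -> Result s

-- DSL 1 (hypothetical): a tiny arithmetic AST. 'unstatic' is its interpreter.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Show, Read, Eq)

instance Static Expr where
  type Result Expr = Int
  unstatic (Lit n)   = n
  unstatic (Add a b) = unstatic a + unstatic b
  unstatic (Mul a b) = unstatic a * unstatic b

-- DSL 2 (hypothetical): plain labels naming functions both nodes know about.
newtype Label = Label String
  deriving (Show, Read, Eq)

-- Stand-in for whatever table or linker lookup resolves a label.
knownFunctions :: [(String, Int -> Int)]
knownFunctions = [("succ", (+ 1)), ("double", (* 2))]

instance Static Label where
  type Result Label = Maybe (Int -> Int)
  -- An unknown label resolves to Nothing rather than crashing.
  unstatic (Label name) = lookup name knownFunctions
```

Both Expr and Label are ordinary first-order data, so they can be serialized with whatever codec the application already uses, and the receiving node only needs the matching instance to turn them back into values.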

> I've had many educational conversations with

... ?

> 2) how does it provide more type safety than the current TH based approach?
> (I've seen Tim and others hit very very gnarly bugs in cloud haskell based
> upon the "magic static values" approach).

The type safety of the current TH approach is reasonable I think. One
potential problem comes from managing dynamically typed values in the
remote table, which must be coerced to the right type and use the
right decoders if you don't use TH. With the approach we propose,
there is no remote table, so I guess this should help eliminate a
source of bugs.
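
To illustrate the kind of bug meant here, a minimal sketch of a hand-maintained table of dynamically typed values follows. It uses Data.Dynamic from base and is not the actual distributed-static API; RemoteTable, table and resolve are hypothetical names. Both a missing entry and a lookup at the wrong type only show up at run time.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)

-- Hypothetical stand-in for a remote table: labels mapped to untyped values.
type RemoteTable = [(String, Dynamic)]

table :: RemoteTable
table =
  [ ("double", toDyn ((* 2) :: Int -> Int))
  , ("greet",  toDyn (("hello, " ++) :: String -> String))
  ]

-- Look up a label and coerce the entry to the type the caller expects.
-- Forgetting to register a label, or asking for the wrong type, is only
-- detected here, at run time.
resolve :: Typeable a => RemoteTable -> String -> Either String a
resolve tbl label =
  case lookup label tbl of
    Nothing  -> Left ("no entry for label " ++ show label)
    Just dyn ->
      case fromDynamic dyn of
        Nothing -> Left ("wrong type for label " ++ show label)
        Just v  -> Right v
```

For example, `resolve table "double" :: Either String (Int -> Int)` succeeds, whereas requesting the same label at type `String -> String` yields a Left.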

> 3) this proposal requires changes to linking etc that would really make it
> useful only on systems and deployments that only have Template Haskell AND
> Dynamic linking.  (and also rules out any context where it'd be nice to
> deploy a static app or say, use CH in ios! )

I don't know about iOS. And it's very likely that there are contexts
in which this extension doesn't work. But as I said above, you are
always free to define your own DSLs that cover the particular use
case that you have in mind. The nice thing with this particular DSL is
that it requires little to no TH to generate label names, which can
always be a source of bugs, especially when you forget to include them
in the global remote table (which is something that TH doesn't and
can't help you with).

Furthermore, it was my understanding that GHC is heading towards a
world of "dynamic linkable by default", and it is by now something
that is supported on most platforms by GHC. See e.g.

https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms

There are fairly good solutions to deploy self contained dynamically
linked apps these days, e.g. Docker. And in any case, with a few extra
flags we can still do away with the dynamic linking requirement on
some (all?) platforms.

> to repeat: have you considered defining an AST type + interpreter for the
> computations you want to send around, and doing that? I think its a much
> simpler, safer, easier, flexible and PORTABLE approach, though one current
> CH doesn't do (though the folks working on CH seem to be receptive to
> switching to such a strategy if someone validates it)

We have, and it's an option with different tradeoffs. Both solutions
could gainfully live side by side and are in fact complementary. I
contend that the solution described by Facundo has the advantage of
eliminating much of the syntactic overhead associated with sending
references to (higher-order) values across the cluster. We have more
ideas specific to distributed-process which we can discuss in a
separate thread to reduce the syntactic overhead even further, to
practically nothing.

Best,

Mathieu