Static values language extension proposal

Hello,

With the support of Tweag I/O, Mathieu and I have been assembling a design proposal for a language extension for static values that will take Cloud Haskell a big step forward in usability. Please find the proposal inlined below. We look forward to discussing its feasibility and features with the community.

Best,
Facundo

--

In these notes we discuss a design of the language extension proposed in [1] for Cloud Haskell: that is, support from the compiler for producing labels that can be used to identify Haskell top-level bindings across processes in a network.

Static values
=============

Following [1], the extension consists of a new syntactic form `static e`, along with a type constructor `StaticRef` and a function

    unstatic :: StaticRef a -> a

The idea is that a value of type `StaticRef a` uniquely identifies a value that can be referred to by a global name rather than serialized over the network. This works between processes that are instances of a single binary, because all such processes share the same top-level bindings.

Generating static references
============================

We start by introducing global names. A `GlobalName` is a symbol bound in the top-level environment. It is much like global names in Template Haskell, except that `GlobalName`s always refer to terms, and they include a package version.

    data GlobalName = GlobalName PkgName PkgVersion ModName OccName

`GlobalName`s can be used as references to static values.

    newtype StaticRef a = StaticRef GlobalName

`StaticRef a` is to `GlobalName` what `Ptr a` is to `Addr#`: a wrapper with a phantom type parameter that keeps track of the type of the referenced value.

The special form

    static e

is an expression of type `StaticRef a`, where `e :: a` is a closed expression (meaning any free variables in `e` are bound in the top-level environment). If `e` is an identifier, `static e` just refers to it.
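As a sketch of the pieces introduced so far, with the field types and the example reference assumed for illustration (the proposal does not fix `PkgName` etc., and the real `static` form would be compiler-generated):

```haskell
-- Sketch of the proposal's types; the String aliases are assumptions.
type PkgName    = String
type PkgVersion = String
type ModName    = String
type OccName    = String

data GlobalName = GlobalName PkgName PkgVersion ModName OccName
  deriving (Eq, Show)

-- Phantom type parameter tracks the type of the referenced value.
newtype StaticRef a = StaticRef GlobalName
  deriving (Eq, Show)

-- What `static fromJust` might elaborate to; the package version
-- here is purely illustrative.
fromJustRef :: StaticRef (Maybe Int -> Int)
fromJustRef = StaticRef (GlobalName "base" "4.7.0.0" "Data.Maybe" "fromJust")
```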
Otherwise (when `e` is not a plain identifier), the compiler introduces a new top-level binding with a fresh name and `e` as its right-hand side, and the static reference points to this new binding instead.

Looking up static references
============================

`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, raising an exception when it cannot. It crucially relies on the system's dynamic linker, so out of the box it only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:

* Determine the name of the shared library from the package name and the package version.
* Determine the symbol of the value by Z-encoding the package name, the module name and the value name.
* Use the system's dynamic linker interface to obtain the address of the symbol.
* Convert the symbol to a Haskell value with `GHC.Prim.addrToAny#`.

In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in statically linked modules if GHC is given the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.

GHC only defines symbols for the exported definitions of a module, so `unstatic` won't be able to find a module's private bindings. For this reason, the implementation of `static` should additionally ensure that the bindings it produces appear in the symbol table even when they are not exported by their defining modules.

Template Haskell support
========================

The `static` keyword needs to be made available in Template Haskell so the distributed-static package can benefit from this language extension.

Rationale
=========

We want the language extension to meet the following requirements:

1. It must be a practical alternative to the remoteTable functions in the distributed-static package.
2.
It must not change the build scheme used for Haskell programs. A collection of .o files produced from Haskell source code should still be linkable with the system linking tools.
3. It must not restrict all communicating processes using the extension to being launched from the same binary.
4. It must not significantly increase binary size.

(1) is addressed by replacing remote tables with the symbol tables produced by the compiler. Additionally, Template Haskell support is included so that the existing distributed-static package can be adapted and extended to include this extension.

(2) is addressed by choosing a scheme which does not require the linker to perform any extension-specific procedure to collect the static values in the various modules. There is a trade-off here, though: symbols in statically linked modules cannot be accessed unless -optl-Wl,--export-dynamic is supplied during linking.

(3) is addressed by allowing programs to exchange static values for any bindings found in the modules they share.

(4) is addressed by reusing the symbol tables the compiler already produces in object files, rather than creating separate remote tables.

About the need for using different binaries
===========================================

While using distributed-process we found several use cases for communicating closures between multiple binaries.

One of these use cases involved a distributed application and a monitoring tool. The monitoring tool needed to link in some graphics libraries to display information on the screen, none of which were required by the monitored application. Conversely, the monitored application linked in some modules that the monitoring application didn't need. Crucially, both applications are fairly loosely coupled, even though they need to exchange static values referring to bindings in the modules they share.
An analogous use case involved the distributed application and a control application used to change dynamic settings of the former.

Further Work
============

As the application depends on shared libraries, a tool would be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application in a cluster. We won't delve further into this problem here.

Another possible line of work is extending this approach so a process can pull shared objects from a remote peer, when that peer sends a static value defined in a shared object not available to the process.

Integration with distributed-static
===================================

The package distributed-static could either adopt this extension as the only implementation of static values, or it could support many notions of static references, say by using a type class to overload `unstatic`:

    class Static st s | s -> st where
      unstatic :: st -> s a -> Either String a

where the class parameter `st` is provided for backwards compatibility with the existing scheme, to supply context-dependent information. The extension we present here does not depend on this parameter, so `()` could be used for the `StaticRef` instance:

    instance Static () StaticRef where ...

References
==========

[1] Jeff Epstein, Andrew P. Black, and Simon Peyton Jones. Towards Haskell in the cloud. SIGPLAN Not., 46(12):118-129, September 2011. ISSN 0362-1340.

Hey Facundo, thanks for sharing this proposal. Several questions:

0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?

1) What does this accomplish that cannot be accomplished by having various nodes agree on a DSL, and sending ASTs to each other?

1a) In fact, I'd argue (and some others agree, and I'll admit my opinions have been shaped by those more expert than me) that sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol that's hard-coded hopefully into both programs in a way that it means the same thing". I've had many educational conversations with

2) How does it provide more type safety than the current TH-based approach? (I've seen Tim and others hit very, very gnarly bugs in Cloud Haskell based upon the "magic static values" approach.)

3) This proposal requires changes to linking etc. that would really make it useful only on systems and deployments that have Template Haskell AND dynamic linking. (It also rules out any context where it'd be nice to deploy a static app or, say, use CH on iOS!)

To repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one that current CH doesn't use (though the folks working on CH seem receptive to switching to such a strategy if someone validates it).

Cheers,
-Carter

On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez <facundo.dominguez@tweag.io> wrote:

On 24 Jan 2014, at 17:59, Carter Schonwald wrote:
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
I didn't pick up on that at all - how would we be able to do that?
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing". I've had many educational conversations with
I've still not seen a convincing example of how to do this though. It would help if someone explained what this would look like, running over two (or more) separate binaries and still shipping code. It's just that, afaict, that AST wouldn't be so "wee" once it had to represent any arbitrary expression. One could, of course, just ship source (or some intermediate representation), but that would also require compiler infrastructure to be installed on the target.
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
This is definitely true, but I see it as a problem related to our use of TH rather than our current use of closures and 'Static' per se. Having said that, it can be toe-curlingly difficult to work with closure/static sometimes, so *anything* that makes this easier sounds good to me.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think its a much simpler, safer, easier, flexible and PORTABLE approach, though one current CH doesn't do (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
I/we are, I think, amenable to doing whatever makes the most sense. This could include doing more than one thing when it comes to dealing with 'statics'. Personally I think the proposal sounds interesting, though as I mentioned in my previous mail, I haven't had time to sit down and look at it in detail yet.

Cheers,
Tim

[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list.]

Hi Carter,

Thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all. We thought long and hard about a path that would ultimately make an extension either unnecessary, or at any rate very small. At this point, the only thing that we are proposing to add to the compiler is the syntactic form "static e". Contrary to the presentation in the paper, the 'unstatic' function can be implemented entirely as library code and does not need to be a primop. Moreover, we do not need to piece together any kind of global remote table at compile time or link time, because we're piggybacking on the one already constructed by the system linker.

The `static e` form could just as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user: in particular, detecting situations where symbolic references cannot be generated because, e.g., the imported packages were not compiled as dynamically linked libraries, or seamlessly supporting a call of `static f` on an identifier `f` that is not exported by its module.
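The symbol name such a library implementation of 'unstatic' would look up can be made concrete. GHC's Z-encoding escapes characters that are not valid in linker symbols (for example '.' becomes "zi", '-' becomes "zm", and 'z'/'Z' are doubled). A sketch covering the common cases follows; `closureSymbol` is a hypothetical helper, and the exact symbol format varies across GHC versions:

```haskell
-- Minimal Z-encoder; GHC's full table also covers parentheses,
-- operator characters, tuples, and so on.
zEncode :: String -> String
zEncode = concatMap enc
  where
    enc 'z'  = "zz"
    enc 'Z'  = "ZZ"
    enc '.'  = "zi"
    enc '-'  = "zm"
    enc '_'  = "zu"
    enc '\'' = "zq"
    enc c    = [c]

-- Hypothetical helper assembling the closure symbol that unstatic
-- would ask the dynamic linker for.
closureSymbol :: String -> String -> String -> String -> String
closureSymbol pkg ver modname occ =
  zEncode (pkg ++ "-" ++ ver) ++ "_" ++ zEncode modname
    ++ "_" ++ zEncode occ ++ "_closure"
```

For example, `closureSymbol "base" "4.7.0.0" "Data.Maybe" "fromJust"` yields a symbol the dynamic linker interface could then resolve to an address for `GHC.Prim.addrToAny#`.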
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing".
I very much subscribe to the idea of defining small DSLs for exchanging code between nodes. And this proposal is compatible with that idea.

One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one, whose grammar only contains linker symbol names.

As it happens, distributed-static today already supports two such DSLs: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together. There is a patch lying around by Edsko proposing to add a third "DSL": one that allows nodes to trade arbitrary Haskell strings that are then eval'ed on the other end by the 'plugins' package.

As Facundo explains at the end of his email, the notion of a "static" value ought to be more general than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.

This is why we propose to make Static into a class. Each DSL is generated by one datatype, and each such datatype has a Static instance. If you would like to ship an AST around the cluster, you can make the datatype for that AST an instance of Static, with 'unstatic' being defined as an interpreter for your AST. Concretely:

    data HsExpr = ...

    instance Static HsExpr where
      unstatic e = Hs.interpret e
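A self-contained sketch of this scheme, using the two-parameter class from the proposal and an invented toy AST in place of HsExpr:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, GADTs #-}

-- The overloaded interface from the proposal: st carries
-- context-dependent information, s is the DSL's datatype.
class Static st s | s -> st where
  unstatic :: st -> s a -> Either String a

-- A toy typed AST standing in for a user-defined DSL (invented here).
data Arith a where
  Lit :: Int -> Arith Int
  Add :: Arith Int -> Arith Int -> Arith Int

-- This DSL needs no context, so the st parameter is ().
-- unstatic is simply an interpreter for the AST.
instance Static () Arith where
  unstatic () (Lit n)   = Right n
  unstatic () (Add x y) = (+) <$> unstatic () x <*> unstatic () y
```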
I've had many educational conversations with
... ?
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable, I think. One potential problem comes from managing the dynamically typed values in the remote table, which must be coerced to the right type and paired with the right decoders if you don't use TH. With the approach we propose there is no remote table, so this should help eliminate a source of bugs.
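The hazard described here can be sketched with Data.Dynamic: a remote-table-style lookup stores values untyped and recovers them with a runtime coercion, which fails only at runtime when the requested type is wrong (the table and names below are invented for illustration):

```haskell
import Data.Dynamic (Dynamic, Typeable, toDyn, fromDynamic)

-- A remote-table-style registry: values are stored untyped.
table :: [(String, Dynamic)]
table = [("double", toDyn ((* 2) :: Int -> Int))]

-- Recovering an entry requires guessing its type correctly;
-- a mismatch is only detected at runtime, as a Nothing.
lookupStatic :: Typeable a => String -> Maybe a
lookupStatic name = lookup name table >>= fromDynamic
```

`lookupStatic "double" :: Maybe (Int -> Int)` succeeds, while asking for `Maybe (Bool -> Bool)` quietly returns Nothing: exactly the kind of mismatch a remote table cannot rule out at compile time.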
3) this proposal requires changes to linking etc that would really make it useful only on systems and deployments that only have Template Haskell AND Dynamic linking. (and also rules out any context where it'd be nice to deploy a static app or say, use CH in ios! )
I don't know about iOS. And it's very likely that there are contexts in which this extension doesn't work. But as I said above, you are always free to define your own DSLs that cover the particular use case you have in mind. The nice thing about this particular DSL is that it requires little to no TH to generate label names, which can always be a source of bugs, especially when you forget to include them in the global remote table (which is something that TH doesn't and can't help you with).

Furthermore, it is my understanding that GHC is heading towards a world of "dynamically linked by default", and dynamic linking is by now supported by GHC on most platforms. See e.g. https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms

There are fairly good solutions for deploying self-contained dynamically linked apps these days, e.g. Docker. And in any case, with a few extra flags we can still do away with the dynamic linking requirement on some (all?) platforms.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think its a much simpler, safer, easier, flexible and PORTABLE approach, though one current CH doesn't do (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process, which we can discuss in a separate thread, to reduce the syntactic overhead even further, to practically nothing.

Best,
Mathieu

Anyways:

1) You should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a userland library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something! But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help with dynamic code (un)loading, so stress test that! There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed-systems context, sans CH.

2) I have a work in progress on speccing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".

As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
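A minimal finally-tagless sketch of such a typed, serializable DSL: the class is the grammar and each instance is an interpreter, so one term can be run locally or rendered for the wire (all names here are invented; this is not bound's actual API):

```haskell
-- A finally-tagless DSL: the class describes the language.
class RemoteSym r where
  lit :: Int -> r Int
  add :: r Int -> r Int -> r Int

-- Interpreter 1: evaluate the computation locally.
newtype Eval a = Eval { runEval :: a }

instance RemoteSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)

-- Interpreter 2: render the computation for sending over the wire.
newtype Serial a = Serial { serialize :: String }

instance RemoteSym Serial where
  lit n   = Serial (show n)
  add x y = Serial ("(" ++ serialize x ++ " + " ++ serialize y ++ ")")

-- One term, two meanings, picked by the type at the use site.
expr :: RemoteSym r => r Int
expr = add (lit 1) (add (lit 2) (lit 3))
```

Here `runEval expr` evaluates the term locally, while `serialize expr` produces a string form of the same term that could be shipped to a peer and re-interpreted there.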
Here's a really, really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste: https://gist.github.com/cartazio/5727196. It's notable that the AST data type there is called "Remote" :) - I think that's a hint that it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally-tagless style API (i.e. a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).

On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all. We thought long and hard about a path that would ultimately make an extension either unnecessary, or at any rate very small. At this point, the only thing that we are proposing to add to the compiler is the syntactic form "static e". Contrary to the presentation in the paper, the 'unstatic' function can be implemented entirely as library code and does not need to be a primop. Moreover, we do not need to piece together any kind of global remote table at compile time or link time, because we're piggy backing on that already constructed by the system linker.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user. In particular, detecting situations where symbolic references cannot be generated because e.g. the imported packages were not compiled as dynamic linked libraries. Or seamlessly supporting calling `static f` on an idenfier `f` that is not exported by the module.
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing".
I very much subscribe to the idea of defining small DSL's for exchanging code between nodes. And this proposal is compatible with that idea.
One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one whose grammar only contains linker symbol names.
As it happens, distributed-static today already supports two such DSL's: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together. There is a patch lying around by Edsko proposing to add a third "DSL": one that allows nodes to trade arbitrary Haskell strings that are then eval'ed on the other end by the 'plugins' package.
As Facundo explains at the end of his email, the notion of a "static" value ought to be a more general one than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.
This is why we propose to make Static into a class. Each DSL is generated by one datatype. Each such datatype has a Static instance. If you would like to ship an AST around the cluster, you can make the datatype for that AST an instance of Static, with 'unstatic' being defined as an interpreter for your AST.
Concretely:
data HsExpr = ...
instance Static HsExpr where unstatic e = Hs.interpret e
I've had many educational conversations with
... ?
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable I think. One potential problem comes from managing dynamically typed values in the remote table, which must be coerced to the right type and use the right decoders if you don't use TH. With the approach we propose, there is no remote table, so I guess this should help eliminate a source of bugs.
3) this proposal requires changes to linking etc that would really make it useful only on systems and deployments that only have Template Haskell AND Dynamic linking. (and also rules out any context where it'd be nice to deploy a static app or say, use CH in ios! )
I don't know about iOS. And it's very likely that there are contexts in which this extension doesn't work. But as I said above, you are always free to define your own DSL's that cover the particular use case that you have in mind. The nice thing with this particular DSL is that it requires little to no TH to generate label names, which can always be a source of bugs, especially when you forget to include them in the global remote table (which is something that TH doesn't and can't help you with).
Furthermore, it was my understanding that GHC is heading towards a world of "dynamic linkable by default", and it is by now something that is supported on most platforms by GHC. See e.g.
https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms
There are fairly good solutions to deploy self contained dynamically linked apps these days, e.g. Docker. And in any case, with a few extra flags we can still do away with the dynamic linking requirement on some (all?) platforms.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one the current CH doesn't take (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process which we can discuss in a separate thread to reduce the syntactic overhead even further, to practically nothing.
Best,
Mathieu

On 25 Jan 2014, at 18:12, Carter Schonwald wrote:
1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Is that something you'll consider looking at, Mathieu?
There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.
We have a pull request from Edsko that melds hs-plugins support with static, as per the original proposal's notes, so this seems like a corollary issue to me.
2) I've a work in progress on specing out a proper (and sound :) ) static values type extension for ghc, that will perhaps be usable in your case (though by dint of being sound, will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work
Timescales? There are commercial users of Cloud Haskell clamouring for improvements to the way we handle this situation, and I'm keen to combine getting broader community agreement about "the right thing to do" with facilitating our users' real needs. If there are other options pertaining to "static" support, I'd like to know more!
@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".
With Cloud Haskell in its current guise, I can "Closure up" pretty much any thunk I like and spawn it on a remote node. If the nodes are both running the same executable, we're fine. If they're not, we're potentially in trouble. In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
Here's a really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound
And there's a generalization that supports strong typing that I've copied from an hpaste https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :). I think that's a hint that it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally-tagless style API approach (i.e. have a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).
These are all lovely, but aren't we talking about either (a) putting together an AST to represent whatever valid Haskell program someone wants to send, or (b) forcing every application developer to write an AST to cover all their remote computations? Both of those sound like a lot more work than the proposal below. They may be the right approach for some domains, but there is a fair bit of "developer overhead" involved from what I can see.
On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote: The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user. In particular, detecting situations where symbolic references cannot be generated because e.g. the imported packages were not compiled as dynamically linked libraries. Or seamlessly supporting calling `static f` on an identifier `f` that is not exported by the module.
All of which sound like a usability improvement to me.
I very much subscribe to the idea of defining small DSL's for exchanging code between nodes. And this proposal is compatible with that idea.
One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one whose grammar only contains linker symbol names.
That triviality is a rather important point as well, because...
As it happens, distributed-static today already supports two such DSL's: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together.
And whilst those two DSL's are rather simple, it can still be tricky to get things right.
As Facundo explains at the end of his email, the notion of a "static" value ought to be a more general one than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.
Indeed - there's never been anything preventing users from doing so. In fact, sending messages that are "interpreted" by a remote process in order to apply some specific processing is pretty much the MO of all Cloud Haskell code. The "plugins" based support will add to the options there.
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable I think. One potential problem comes from managing dynamically typed values in the remote table, which must be coerced to the right type and use the right decoders if you don't use TH. With the approach we propose, there is no remote table, so I guess this should help eliminate a source of bugs.
And remove a slightly awkward programming model.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one the current CH doesn't take (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process which we can discuss in a separate thread to reduce the syntactic overhead even further, to practically nothing.
I agree that the proposal sounds beneficial. It's a good thing that both approaches can live side by side. I'd like to hear more about these other ideas too. I'd also like to hear more from the rest of the community - especially Cloud Haskell users. I know a few others besides Parallel Scientific are using Cloud Haskell in commercial applications - I'd very much like to hear from you all on this proposal too. Cheers, Tim

On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson
In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a natively compiled language without encoding a virtual machine yourself (that is, the AST-based mechanisms). Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM? Or perhaps an interesting alternative would be a BEAM backend for ghc.

-- brandon s allbery kf8nh | sine nomine associates | allbery.b@gmail.com ballbery@sinenomine.net | unix, openafs, kerberos, infrastructure, xmonad | http://sinenomine.net

Hi Brandon, On 26 Jan 2014, at 19:01, Brandon Allbery wrote:
On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson
wrote: In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a native compiled language, without encoding a virtual machine yourself (that is, the AST-based mechanisms).
Yeah, I do realise this. Of course we're not trying to reproduce the BEAM really, but what we /do/ want is to be able to exchange messages between nodes that are not running the same executable. The proposal does appear to address this requirement, at least to some extent. There may be complementary (or better) approaches. I believe Carter is going to provide some additional details about his work in this area at some point.

Anything that reduces the amount of Template Haskell required to work with Cloud Haskell is a "good thing (tm)" IMO. Not that I mind using TH, but the programming model is currently quite awkward from the caller's perspective, since you've got to (a) create a Static/Closure out of potentially complex chunks of code, which often involves creating numerous top-level wrapper APIs, and (b) fiddle around with the remote-table (both in the code that defines remote-able thunks *and* in the code that starts a node wishing to operate on them).

Also note that this problem isn't limited to sending code around the network. Just sending arbitrary *data* between nodes is currently discouraged (though not disallowed) because the receiving program *might* not understand the types you're sending it. This is very restrictive, and the proposal does, at the very least, allow us to safely serialise, send and receive types that both programs "know about" by virtue of having been linked to the same library/libraries.

But yes - there are certainly constraints and edge cases aplenty here. I'm not entirely sure whether or not we'd need to potentially change the (binary) encoding of raw messages in distributed-process, for example, in response to this change. Currently we serialise a pointer (i.e., the pointer to the fingerprint for the type that's being sent), and I can imagine that not working properly across different nodes running on different architectures etc.
Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM?
After discussing this with Simon M, we concluded there was little point in doing so. The GHC RTS is practically a VM anyway, and there's probably not that much value to be gained by shipping bytecode around. Besides, as you put it, the AST-based mechanisms allow for this anyway (albeit with some coding required on the part of the application developer) and Carter (and others) assure me that the mechanisms required to do this kind of thing already exist. We just need to find the right way to take advantage of them.
Or perhaps an interesting alternative would be a BEAM backend for ghc.
I've talked to a couple of people that want to try this. I'm intrigued, but have other things to focus on. :) Cheers, Tim

To address the concerns about static linking and portability, there is also the alternative of using the RTS linker on those platforms that need it.

In many respects, neither linker makes a big difference to us. We are going with the system's dynamic linker mainly because the GHC team has expressed the desire to get rid of the RTS linker.

Using the RTS linker would require addressing some additional technical issues, none of which appear to be show-stoppers. It would just be more work.
Best,
Facundo

Hi Carter, Tim,
On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
anyways
1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Signals what?
On Sun, Jan 26, 2014 at 7:43 PM, Tim Watson
Is that something you'll consider looking at, Mathieu?
We would prefer to do it that way, to be honest. As explained in my previous email, we identified two problems with this approach:

1) User friendliness. It's important for us that Cloud Haskell be pretty much as user friendly and easy to use as Erlang is.

a) I don't know that it's possible from Template Haskell to detect and warn the user when dependent modules have not been compiled into dynamic object code, or into static code with the right flags.

b) It's very convenient in practice to be able to send not just `f` if `f` is a global identifier, but in general `e`, where `e` is any closed expression mentioning only global names. That can easily be done by having the compiler float the expression `e` to the top level and give it a global name. I don't see how to do that in TH in a user friendly way.

2) A technical issue: you ought to be able to send unexported functions across the wire, just as you can pass unexported functions as arguments to higher-order functions. Yet GHC does not create linker symbols for unexported identifiers, so our approach would break down. Worse, I don't think that it's even possible to detect in TH whether an identifier is exported or not, in order to warn the user. One could imagine a compiler flag to force the creation of linker symbols for all top-level bindings, exported or unexported. But that seems wasteful, and potentially not very user friendly.

If the above can be solved, all the better! If not: we don't always want to touch the compiler, but when we do, ideally it should be in an unintrusive way. I contend our proposal fits that criterion. And our cursory implementation efforts seem to confirm that so far.
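Point (1b) — floating the closed expression under `static` to the top level — can be illustrated by hand. The fresh name `static_1` below is invented for illustration; the real compiler-generated name would be fresh and hidden:

```haskell
module Main where

-- What the user would write (requires the extension, shown as a comment):
--   ref = static (\xs -> sum xs + 1)
--
-- What the compiler would conceptually generate: a fresh top-level
-- binding for the closed expression...
static_1 :: [Int] -> Int
static_1 xs = sum xs + 1
-- ...so that `ref` can be a StaticRef wrapping the GlobalName that
-- points at the linker symbol for `static_1`.

main :: IO ()
main = print (static_1 [1, 2, 3])  -- prints 7
```

Because the binding now has a top-level name, any process running the same binary (or linking the same library) can resolve it by symbol instead of receiving the code itself.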
But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help improve doing dynamic code (un)loading, stress test that!
We could use either the system linker or rts linker. Not sure that it makes any difference at the application level.
2) I've a work in progress on specing out a proper (and sound :) ) static values type extension for ghc, that will perhaps be usable in your case (though by dint of being sound, will preclude some of the things you think you want).
I look forward to hearing more about that. How is the existing proposal not (type?) sound?
BUT, any type system changes need to actually provide safety.
To be clear, this proposal doesn't touch the type checker in any way.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
Here's a really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound
These are nice encodings for ASTs. But they don't address how to minimize the amount of code to ship around the cluster. If you have no agreement about what functions are commonly available, then the AST needs to include the code for the function you are sending, plus any functions it depends on, plus any of their dependencies, and so on transitively.

Tim, perhaps the following also answers some of your questions. This is where the current proposal comes in: if you choose to ship around ASTs, you can minimize their size by having them mention shared linker symbol names. Mind, that's already possible today, by means of the global RemoteTable, but it's building that remote table safely, conveniently, in a modular way, and with static checking that no symbols from any of the modules that were linked at build time were missed, that is difficult. By avoiding a RemoteTable entirely, we avoid having to solve that difficult problem. :)

Best,

-- Mathieu Boespflug, Founder at http://tweag.io
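The "small AST whose leaves are symbol names" idea above can be sketched in miniature. Everything here is invented for illustration: the `Closure` GADT, `resolve`, and the toy association list standing in for the linker / remote table on the receiving node.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)

-- The AST stays small because its leaves are symbol names, not code;
-- only the structure travels over the wire.
data Closure a where
  Symbol :: Typeable a => String -> Closure a
  Apply  :: Closure (b -> a) -> Closure b -> Closure a

-- On the receiving side, leaves are looked up in a shared environment
-- (here a toy table; in the proposal, the linker's symbol table).
resolve :: [(String, Dynamic)] -> Closure a -> Maybe a
resolve env (Symbol s)  = lookup s env >>= fromDynamic
resolve env (Apply f x) = resolve env f <*> resolve env x

env :: [(String, Dynamic)]
env = [ ("succ", toDyn (succ :: Int -> Int))
      , ("ten",  toDyn (10 :: Int)) ]

main :: IO ()
main = print (resolve env (Apply (Symbol "succ" :: Closure (Int -> Int))
                                 (Symbol "ten")))  -- prints Just 11
```

The difficulty Mathieu describes is precisely keeping a table like `env` complete and well-typed by hand; resolving leaves through linker symbols removes that obligation.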

On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug
On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
wrote: 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Signals what?
That there is a shortcoming in ghc and/or the rts that needs to be addressed.

-- brandon s allbery kf8nh | sine nomine associates | allbery.b@gmail.com ballbery@sinenomine.net | unix, openafs, kerberos, infrastructure, xmonad | http://sinenomine.net

There's actually a missing piece of information in this thread: what are the example computations that are being sent?

My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!

Point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro", you hit some gnarly problems doing this for arbitrary closures! It's for a very simple (and fun) reason: address randomization!

Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply RTS-twiddling bit of hackery, so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too, I think).

On the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html) should actually give enough basic tooling to make this possible as a userland library. Mind you, unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has at Facebook. Point being, the GHC 7.8 version of the ObjLink API should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land.

The meat of my point is 1) "yes I want this too" but also 2) one thing I really have come to appreciate about how GHC is engineered is that a lot of work is done to provide the "right" primitives so that really great tools can be built in user land. I think that the goal of this proposal can be accomplished quite nicely with the ObjLink module, unless I'm not understanding something. In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side: just try to load it at a given type, and if it fails, it wasn't there!

The point being, linkers are a thing, and ghc exposes an API for linking; have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html

Hello Carter,
Thanks for the links. IIUC the ObjLink module contains an interface to the RTS linker. The points raised by Mathieu in his last email as (1a), (1b) and (2) still hold.
Here's a use case for (2):
module Communicate(run)
import Control.Distributed.Process
f :: Int -> Int
f = id
runSend :: Process ()
runSend = send someone (static f)
runExpect :: Int -> Process Int
runExpect n = fmap (($ n) . unstatic) expect
If any program tries to use runExpect, it would fail at runtime because it would fail to find `f`: `f` is not exported, and therefore a symbol for it would not appear in the object files.

The solution that modifies the compiler is superior to all the library-level workarounds for this problem that we could think of. Any suggestions?
Best,
Facundo

Excuse me, the module export list was meant to be
module Communicate(runExpect, runSend) where
Facundo
On Tue, Jan 28, 2014 at 5:13 PM, Facundo Domínguez
Hello Carter, Thanks for the links. IIUC the ObjLink module contains an interface to the RTS linker. The points raised by Mathieu in his last email as (1a), (1b) and (2) still hold.
Here's a use case for (2):
module Communicate(run)
import Control.Distributed.Process
f :: Int -> Int f = id
runSend :: Process () runSend = send someone (static f)
runExpect :: Int -> Process Int runExpect n = fmap (($ n) . unstatic) expect
If any program tries to use runExpect, it would fail at runtime because it would fail to find `f`, because `f` is not exported and therefore a symbol for it would not appear in object files.
The solution that modifies the compiler is superior to all workarounds we could think of to workaround this problem with a library. Any suggestions?
Best, Facundo
On Tue, Jan 28, 2014 at 3:03 PM, Carter Schonwald
wrote: Theres actually a missing piece of information in this thread: what are the example computations that are being sent? My understanding is that erlang has not way to send file handles, shared variables, Tvars, Mvars, memory mapped binary files, GPU code / memory pointers , and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in haskell!
point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro " you hit some gnarly problems doing this for arbitrary closures! Its for a very simple (and fun) reason: address randomization!
Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply rts twiddling bit of hackery so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too i think)
on the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html ) should actually give enough basic tooling to make this possible as a userland library, mind you unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has in facebook. Point being the GHC 7.8 version of the ObjLink api should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land. I
The meat of my point is 1) "yes I want this too" but also 2) one thing I really have come to appreciate about how GHC is engineered is that a lot of work is done to provide the "right" primitives so that really, really great tools can be built in user land. I think that the goal of this proposal can be accomplished quite nicely with the ObjLink module, unless I'm not understanding something. In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side; just try to load it at a given type, and if it fails it wasn't there!
The point being, linkers are a thing, and GHC exposes an API for linking. Have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html
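Concretely, the kind of userland lookup being suggested could be sketched in a few lines. This is only a sketch, assuming the GHC-7.8-era `ObjLink` module from the `ghc` package (`initObjLinker`, `lookupSymbol`) and `addrToAny#` from `GHC.Exts`; the caller is assumed to supply an already z-encoded symbol name:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only: assumes the ObjLink module from the ghc package
-- (GHC-7.8-era API). The symbol name is assumed to be z-encoded
-- already, e.g. "Foo_bar_closure" for Foo.bar.
import ObjLink (initObjLinker, lookupSymbol)
import GHC.Exts (addrToAny#)
import GHC.Ptr (Ptr(..))

lookupClosure :: String -> IO (Maybe a)
lookupClosure sym = do
  initObjLinker
  mptr <- lookupSymbol sym
  case mptr of
    Nothing          -> return Nothing
    -- Convert the resolved address into a live Haskell value.
    Just (Ptr addr#) -> case addrToAny# addr# of
      (# x #) -> return (Just x)
```

As Carter notes, the caller can then attempt a cast at the expected type and treat failure as "not there".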
On Tue, Jan 28, 2014 at 10:21 AM, Brandon Allbery
wrote: On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug
wrote: On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
wrote: 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user land library. If you can't make it work as a library and can demonstrate why (or how even though it works its not quite satisfactory), thats signals something!
Signals what?
That there is a shortcoming in ghc and/or the rts that needs to be addressed.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On 01/28/2014 06:03 PM, Carter Schonwald wrote:
There's actually a missing piece of information in this thread: what are the example computations that are being sent? My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!
[...]"exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro " [...]
Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply rts twiddling bit of hackery so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too i think)
When mentioning Nathan's approach (based on foreign primops), let me point to a more complete, RTS-backed implementation; work done by myself, and itself based on long-standing runtime support for a parallel Haskell on distributed memory systems. The latest instance of this RTS-based serialisation was reported at the Haskell Implementors' Workshop 2013 ( www.haskell.org/wikiupload/2/28/HIW2013PackingAPI.pdf ); code is on github (https://github.com/jberthold/rts-serialisation). Some technical remarks:
- Nathan's primop approach is awesome, but it is not easy to get its interplay with garbage collection right. It is on my list to take a look at this code again and see how far we can push the envelope.
- About address randomisation: the RTS-based serialisation uses relative locations from a known offset to handle it. A more concerning detail is that CAFs must be reverted rather than discarded during GC (currently they are just retained, which is not satisfactory for long-running code).
- About sending arbitrary closures: indeed it does not make any sense to transfer MVars and IORefs (file handles, StablePtrs, etc). My approach is to solve this dynamically by exception handling. I can imagine that there is a sensible combination of RTS support with a suitable type class framework (Static, for one), but lazy evaluation, especially lazy I/O, complicates matters.
/ Jost Berthold

Hi Carter,
On Tue, Jan 28, 2014 at 6:03 PM, Carter Schonwald
There's actually a missing piece of information in this thread: what are the example computations that are being sent?
Quite simply, the same as those considered in the original Cloud Haskell paper, which already advocates the extension that Facundo's first email merely fleshed out a tiny bit. Here's the link once again: "Towards Haskell in the Cloud", Jeff Epstein, Andrew P. Black, and Simon Peyton Jones (2011). http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote... We are emphatically not considering "arbitrary closures" as you say below, any more than the original paper does. As such...
My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!
... the above is completely impossible. The original paper explains why this is so (see Sections 2.3 and 5.1). Here's the gist:
1. you can only send remotely serializable values, i.e. those that have an instance of class Serializable.
2. none of the above have a Serializable instance, and are hence not "send"-able.
When it comes to sending closures capturing any of the above types of values, the reasoning goes like this:
3. a closure in the sense of CH is a pair of a static value and an environment,
4. a closure can only be sent if it is serializable,
5. a closure is serializable only if its environment can be serialized,
6. its environment can be serialized only if all free variables of the closure can,
7. none of the above have a Serializable instance,
8. hence any closure capturing file handles, MVars, memory pointers, etc. cannot be sent.
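The shape of closure this reasoning relies on can be written down in a few lines. This is a simplified sketch of the types used by distributed-static (names here are illustrative, not the package's exact definitions):

```haskell
-- Simplified sketch of the Cloud Haskell closure encoding; the real
-- types live in the distributed-static package.
import Data.ByteString.Lazy (ByteString)

-- A static value is only a label: no code travels over the wire.
newtype Static a = Static String

-- A closure pairs a static decoder with a serialized environment.
-- Every free variable captured in the environment must itself be
-- serializable, which is exactly why MVars, Handles, GPU pointers,
-- etc. cannot be captured and sent.
data Closure a = Closure (Static (ByteString -> a)) ByteString
```

Sending a Closure therefore ships only a label plus a ByteString; the receiving node resolves the label locally and applies the decoder to the environment.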
Point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro" you hit some gnarly problems doing this for arbitrary closures! It's for a very simple (and fun) reason: address randomization!
Which is why neither we nor the original paper considered using addresses as labels for static values. We use linker labels, which are stable.
on the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately) http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html should actually give enough basic tooling to make this possible as a userland library. Mind you, unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has at Facebook. Point being, the GHC 7.8 version of the ObjLink API should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land.
For the third time: we can of course use any linker API that the system or the compiler happens to provide, so long as it allows resolving linker symbols to Haskell values. The (small) extension under consideration does not replace or add to any existing linker API. It just transparently floats closed expressions to the top level, makes sure linker symbols will exist at runtime (they currently don't always), and does some basic sanity checks so the user doesn't lose his mind. I listed problems labeled 1a), 1b) and 2) in my previous email. You still haven't shown us how to address those in pure TH userland.
In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side; just try to load it at a given type, and if it fails it wasn't there!
The point being, linkers are a thing, and GHC exposes an API for linking. Have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html
Yes we have. But I don't see how using it or not using it makes any difference to the user interface of the proposed compiler extension. It's an implementation detail with tradeoffs that Facundo could explain in detail in GHC ticket #8711 if you hadn't rudely closed it as a "duplicate" of some future and unspecified work of yours. Best, Mathieu

Hi Mathieu, On 28 Jan 2014, at 12:53, Mathieu Boespflug wrote:
We would prefer to do it that way, to be honest. As explained in my previous email, we identified two problems with this approach:
1) User friendliness. It's important for us that Cloud Haskell be pretty much as user friendly and easy to use as Erlang is.
Exactly!
a) I don't know that it's possible from Template Haskell to detect and warn the user when dependent modules have not been compiled into dynamic object code or into static code with the right flags.
I don't think that it is, from what I've seen, though I'm by no means an expert.
b) It's very convenient in practice to be able to send not just `f` if `f` is a global identifier, but in general `e` where `e` is any closed expression mentioning only global names. That can easily be done by having the compiler float the expression `e` to the top-level and give it a global name. I don't see how to do that in TH in a user friendly way.
Agreed.
2) A technical issue: you ought to be able to send unexported functions across the wire, just as you can pass unexported functions as arguments to higher-order functions. Yet GHC does not create linker symbols for unexported identifiers, so our approach would break down. Worse, I don't think it's even possible to detect in TH whether an identifier is exported or not, in order to warn the user. One could imagine a compiler flag to force the creation of linker symbols for all top-level bindings, exported or unexported. But that seems wasteful, and potentially not very user friendly.
Interesting.
If the above can be solved, all the better!
If not: we don't always want to touch the compiler, but when we do, ideally it should be in an unintrusive way. I contend our proposal fits that criterion. And our cursory implementation efforts seem to confirm that so far.
Good!
But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help improve doing dynamic code (un)loading, stress test that!
We could use either the system linker or rts linker. Not sure that it makes any difference at the application level.
No indeed.
2) I've a work in progress on spec'ing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want).
I look forward to hearing more about that.
+1
How is the existing proposal not (type?) sound?
I'd like to hear more about the concerns too.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
here's a really, really good School of Haskell exposition https://www.fpcomplete.com/user/edwardk/bound
These are nice encodings for ASTs. But they don't address how to minimize the amount of code to ship around the cluster. If you have no agreement about what functions are commonly available, then the AST needs to include the code for the function you are sending, plus any functions it depends on, plus any of their dependencies, and so on transitively.
That was precisely my concern with the idea of shipping *something* AST-like around. It's a lot of overhead for every application you want to develop, or a *massive* overhead to cover all bases.
Tim, perhaps the following also answers some of your questions. This is where the current proposal comes in: if you choose to ship around AST's, you can minimize their size by having them mention shared linker symbol names.
Indeed, that does seem to simplify things.
Mind, that's already possible today, by means of the global RemoteTable, but it's building that remote table safely, conveniently, in a modular way, and with static checking that no symbols from any of the modules that were linked at build time were missed, that is difficult.
Yep. It's awkward, and when you get it wrong, you're either fighting with TH-obscured compiler errors or, worse, the damn thing just doesn't work (because you can't decode properly on the remote node and things just crash, or worse still, just hang waiting for the *correct* input types, which never arrive because they're not "known" to the RTS).
By avoiding a RemoteTable entirely, we avoid having to solve that difficult problem. :)
Not having a RemoteTable sounds like a plus to me. Cheers, Tim

Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion. Quite honestly, I think that should have been the end of the discussion. The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea. You really should try doing this as a library outside of GHC, and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library. Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

Hi Eric,
On Wed, Jan 29, 2014 at 3:20 AM, Erik de Castro Lopo
Mathieu Boespflug wrote:
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The response you quote above comes in context, which includes the sentence you also quote below. In another email, the problems we face with a pure TH implementation are labeled as 1a), 1b), 2). We'd be very happy if you could show us how to solve those problems using TH alone in a way that does not impact user friendliness and static checking of invariants in any way.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
I for one very much agree with all the principles stated above. But the wider context of the discussion is that we already have such a TH userland solution today, implemented in the packages distributed-static and distributed-process. We already have several users, including in industry (to my knowledge Parallel Scientific for over a year, Tweag I/O for a couple of months, probably others...). The proposal to go ahead and implement an idea that was first presented in the original Cloud Haskell paper was borne out of frustration with the existing approach based on remote tables, which are very error prone in practice, and out of the operational experience that I, Facundo, Tim and others have had, showing that making the semantics of distributed computation depend on *all* modules across several packages being compiled with the right incantation of compiler flags, without any kind of static checking, is a problem, especially for beginners. Is there something in the proposed extension that leads you to believe that it is neither small nor specific, or that it would not be easily testable, or maintainable? If so, we could amend it accordingly. Best, Mathieu

indeed! Thanks erik!
On the parallel list, edsko shares with us a single commit that adds all
the requested features as a user land lib
https://github.com/haskell-distributed/distributed-static/commit/d2bd2ebca5a...
@tweag folks, please do not write personal attacks on the issue tracker, if
you find yourself frustrated, I probably am too! please keep a positive
constructive tone in all future communications.
On Tue, Jan 28, 2014 at 9:20 PM, Erik de Castro Lopo
Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

For interested fellows, discussion also continues in [1] and [2].
Best,
Facundo
[1] https://ghc.haskell.org/trac/ghc/ticket/7015
[2] https://groups.google.com/d/topic/parallel-haskell/b-x7VmjlEOw/discussion
On Thu, Jan 30, 2014 at 4:47 PM, Carter Schonwald
indeed! Thanks erik!
On the parallel list, edsko shares with us a single commit that adds all the requested features as a user land lib
https://github.com/haskell-distributed/distributed-static/commit/d2bd2ebca5a...
@tweag folks, please do not write personal attacks on the issue tracker, if you find yourself frustrated, I probably am too! please keep a positive constructive tone in all future communications.
On Tue, Jan 28, 2014 at 9:20 PM, Erik de Castro Lopo
wrote: Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

I don't have time to weigh in on this proposal right now, but I have several comments... On 24 Jan 2014, at 17:19, Facundo Domínguez wrote:
Rationale =======
We want the language extension to meet the following requirements:
1. It must be a practical alternative to the remoteTable functions in the distributed-static package.
Agreed - this is vital!
2. It must not change the build scheme used for Haskell programs. It should still be possible to link a collection of .o files produced from Haskell source code with the system linking tools.
Also vital.
3. It must not require that all communicating processes using the extension be launched from the same binary.
I personally think this is very valuable.
About the need for using different binaries ==============================
While using distributed-process we found some use cases that call for communicating closures between multiple binaries.
One of these use cases involved a distributed application and a monitoring tool. The monitoring tool would need to link in some graphics libraries to display information on the screen, none of which were required by the monitored application. Conversely, the monitored application would link in some modules that the monitoring application didn't need. Crucially, both applications are fairly loosely coupled, even though they need to exchange static values referring to bindings in some modules they share.
Indeed - this is an almost canonical use-case, as are administrative (e.g., remote management) tools.
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
Great idea.
Another possible line of work is extending this approach so a process can pull shared objects from a remote peer, when this remote peer sends a static value that is defined in a shared object not available to the process.
This would go a long way towards answering our questions about 'hot code upgrade' and be useful in many other areas too.

On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez < facundo.dominguez@tweag.io> wrote:
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A
This strikes me as highly platform-specific to the Linux and possibly FreeBSD implementations of ELF; it likely will not work with Solaris ELF, which handles dynamic symbols differently (or at least used to), will not work with non-ELF platforms (OS X, Windows), and probably won't work with a non-GNU ld such as is used on Solaris and OS X. -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Hello,
I'd just like to say I haven't gone over every discussion in this
thread and had time to digest it all - I thought I would just
highlight a minor technicality.
On Fri, Jan 24, 2014 at 11:19 AM, Facundo Domínguez
Looking up static references ====================
`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, otherwise it raises an exception. It crucially relies on leveraging the system’s dynamic linker, so out-of-the-box only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:
* Determines the name of the shared library from the package name and the package version.
* Determines the symbol of the value by Z-Encoding the package name, the module name and the value name.
* Uses the system’s dynamic linker interface to obtain the address of the symbol.
* Converts the symbol to a haskell value with `GHC.Prim.addrToAny#`
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.
GHC only defines symbols for exported definitions in modules. So unstatic won’t be able to find the private bindings of a module. For this sake, the implementation of static should in addition ensure that the bindings it gets will appear in the symbol table when they are not exported by their defining modules.
Regarding -optl-Wl,--export-dynamic for static builds, and all that jazz - if I am understanding you right, note that Windows is a bit particular here, because there is a hard limit on the number of symbols allowed in a DLL. That means forcefully exporting *everything* could quickly get you to the symbol limit with the dynamic linker for large-ish applications (one exported function or data type may result in a handful of exported symbols created.) If you want to see the pain this has caused GHC itself, please see GHC bug #5987[1], which makes dynamic support on Windows difficult - it's currently disabled anyway.
Furthermore, dynamic DLLs on Windows are a bit tricky anyway, as the loader is fundamentally different from your typical ld.so (there are ways around this[2], but they're a bit nasty as you have to hack the COFF file.) Windows unfortunately isn't in an easy position here, but it's improving, and it would be unfortunate to neglect it.
This restriction does not exist with the static linker inside the RTS, so my suggestion, I guess, is that I'm inclined to want this to work for *both* static/dynamic configurations out of the box without hackery, if at all possible, which would be great for Windows users especially until the dynamic story is back up to scratch.
[1] https://ghc.haskell.org/trac/ghc/ticket/5987 [2] http://blog.omega-prime.co.uk/?p=138
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
And for any people interested in this - on Linux, a tool like patchelf[3] would help immensely for moving executables+their dependencies around in a 'bundle' style way. [3] http://nixos.org/patchelf.html -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
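For reference, the four-step unstatic procedure quoted above could be sketched roughly as follows. This is a POSIX-only sketch, assuming `System.Posix.DynamicLinker` from the unix package; `zEncode` below is a simplification of GHC's real z-encoding (it only handles `.` and `z`), and the library and symbol naming scheme is illustrative:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- POSIX-only sketch of unstatic. The shared-library and symbol
-- naming below is illustrative, not GHC's exact scheme.
import System.Posix.DynamicLinker (dlopen, dlsym, RTLDFlags(RTLD_LAZY))
import Foreign.Ptr (castFunPtrToPtr)
import GHC.Ptr (Ptr(..))
import GHC.Exts (addrToAny#)

-- Simplified stand-in for GHC's z-encoding.
zEncode :: String -> String
zEncode = concatMap enc
  where enc '.' = "zi"
        enc 'z' = "zz"
        enc c   = [c]

unstatic :: String -> String -> String -> String -> IO a
unstatic pkg version modname occname = do
  -- 1. Shared library name from the package name and version.
  dl <- dlopen ("libHS" ++ pkg ++ "-" ++ version ++ ".so") [RTLD_LAZY]
  -- 2. Z-encode package, module and value names into a linker symbol.
  let sym = zEncode pkg ++ "_" ++ zEncode modname ++ "_"
                        ++ zEncode occname ++ "_closure"
  -- 3. Resolve the symbol address (dlsym raises an IO exception
  --    if the symbol cannot be found, as the proposal requires).
  fptr <- dlsym dl sym
  -- 4. Convert the address to a Haskell value.
  case castFunPtrToPtr fptr of
    Ptr addr# -> case addrToAny# addr# of
      (# x #) -> return x
```

On platforms where the system linker is unavailable (e.g. the Windows situation Austin describes), step 3 would go through the RTS linker instead.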

Hi Austin,
this is very useful information, thanks. So it seems that the rts
linker is here to stay for a while longer still, at least because
there is no good alternative on Windows as of yet.
If I understand you correctly, on Windows dynamic linking is not an
option in part because of the number-of-exported-symbols limit, and
when linking statically one hits the same limit if stuff like
-optl-Wl,--export-dynamic is used. So at least on Windows, the only
way out is the rts linker. Supporting both linkers is certainly an
option. If I remember correctly, the issue Facundo found with the rts
linker is that to use it for looking up symbol addresses, you
apparently need to have an object file loaded twice, effectively (once
statically linked at build time, the second time through the rts
linker at runtime for doing the lookups). Maybe there's a way around
that, or that could be added? In any case for platforms with no
alternative, like Windows, I guess double loading is a tolerable price
to pay.
Best,
Mathieu
On Wed, Jan 29, 2014 at 12:11 PM, Austin Seipp
Hello,
I'd just like to say I haven't gone over every discussion in this thread and had time to digest it all - I thought I would just highlight a minor technicality.
On Fri, Jan 24, 2014 at 11:19 AM, Facundo Domínguez
wrote: Looking up static references ====================
`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, otherwise it raises an exception. It crucially relies on leveraging the system's dynamic linker, so out-of-the-box only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:
* Determines the name of the shared library from the package name and the package version.
* Determines the symbol of the value by Z-Encoding the package name, the module name and the value name.
* Uses the system's dynamic linker interface to obtain the address of the symbol.
* Converts the symbol to a haskell value with `GHC.Prim.addrToAny#`
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.
GHC only defines symbols for exported definitions in modules. So unstatic won't be able to find the private bindings of a module. For this sake, the implementation of static should in addition ensure that the bindings it gets will appear in the symbol table when they are not exported by their defining modules.
Regarding -optl-Wl,--export-dynamic for static builds, and all that jazz - if I am understanding you right, note that Windows is a bit particular here, because there is a hard limit on the number of symbols allowed in a DLL. That means forcefully exporting *everything* could quickly get you to the symbol limit with the dynamic linker with large-ish applications (one exported function or data type may result in a handful of exported symbols created.) If you want to see the pain this has caused GHC itself, please see GHC bug #5987[1], which makes dynamic support on windows difficult - it's currently disabled now anyway.
Furthermore, dynamic DLLs on Windows are a bit tricky anyway as the loader is fundamentally different from your typical ld.so (which there are ways around[2], but a bit nasty as you have to hack the COFF file.) Windows unfortunately isn't in an easy position here, but it's improving and it would be unfortunate to neglect it.
This restriction does not exist with the static linker inside the RTS, so my suggestion, I guess, is that I'm inclined to want this to work for *both* static/dynamic configurations out of the box without hackery, if at all possible, which would be great for Windows users especially until the dynamic story is back up to scratch.
[1] https://ghc.haskell.org/trac/ghc/ticket/5987 [2] http://blog.omega-prime.co.uk/?p=138
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
And for any people interested in this - on Linux, a tool like patchelf[3] would help immensely for moving executables+their dependencies around in a 'bundle' style way.
[3] http://nixos.org/patchelf.html
-- Regards,
Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
-- Mathieu Boespflug Founder at http://tweag.io.
participants (9)
- Austin Seipp
- Brandon Allbery
- Carter Schonwald
- Erik de Castro Lopo
- Facundo Domínguez
- Jost Berthold
- Mathieu Boespflug
- Mathieu Boespflug
- Tim Watson