
Hello,

With the support of Tweag I/O, Mathieu and I have been assembling a design proposal for the language extension for static values that will take Cloud Haskell a big step forward in usability. Please find the proposal inlined below. We look forward to discussing its feasibility and features with the community.

Best,
Facundo

--

In these notes we discuss a design of the language extension proposed in [1] for Cloud Haskell, that is, compiler support for producing labels that can identify Haskell top-level bindings across processes in a network.

Static values
=============

Following [1], the extension consists of a new syntactic form `static e`, along with a type constructor `StaticRef` and a function

    unstatic :: StaticRef a -> a

The idea is that a value of type `StaticRef a` uniquely identifies a value by a global name. Processes that are instances of a single binary can then exchange that name instead of serializing the value itself over the network, because all such processes share the same top-level bindings.

Generating static references
============================

We start by introducing global names. A `GlobalName` is a symbol bound in the top-level environment. It is much like a global name in Template Haskell, except that a `GlobalName` always refers to a term and includes a package version.

    data GlobalName = GlobalName PkgName PkgVersion ModName OccName

A `GlobalName` can be used as a reference to a static value:

    newtype StaticRef a = StaticRef GlobalName

`StaticRef a` is to `GlobalName` what `Ptr a` is to `Addr#`: a wrapper with a phantom type parameter that keeps track of the type of the referenced value.

The special form

    static e

is an expression of type `StaticRef a`, where `e :: a` is a closed expression (meaning that any free variables in `e` are bound in the top-level environment). If `e` is an identifier, `static e` just refers to it.
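
The definitions above can be modeled in plain Haskell. This is a minimal sketch, not part of the proposal. It assumes that `PkgName`, `PkgVersion`, `ModName` and `OccName` are all `String` synonyms. It also uses a hypothetical helper `staticRefTo` and a hypothetical package `my-app`, version `0.1.0`, to spell out by hand the reference that `static double` would denote for a top-level identifier `double`. Resolving the reference at run time, the job of `unstatic`, is not modeled here.

```haskell
-- Assumption for this sketch: the four name components are plain strings.
type PkgName    = String
type PkgVersion = String
type ModName    = String
type OccName    = String

data GlobalName = GlobalName PkgName PkgVersion ModName OccName
  deriving (Eq, Show)

-- The phantom parameter 'a' records the type of the referenced binding;
-- at run time only the GlobalName is present.
newtype StaticRef a = StaticRef GlobalName
  deriving (Eq, Show)

-- Hypothetical helper standing in for the compiler: it builds a reference
-- to a top-level binding of the hypothetical package my-app-0.1.0.
staticRefTo :: ModName -> OccName -> StaticRef a
staticRefTo m n = StaticRef (GlobalName "my-app" "0.1.0" m n)

-- A top-level binding; it has no free local variables, so it is closed.
double :: Int -> Int
double x = 2 * x

-- What 'static double' would denote if this module were Main of my-app.
doubleRef :: StaticRef (Int -> Int)
doubleRef = staticRefTo "Main" "double"

-- Only the GlobalName would be sent over the network.
refName :: StaticRef a -> GlobalName
refName (StaticRef g) = g

-- Not closed, so it would be rejected: 'y' is a local variable.
--   addY y = static (\x -> x + y)
```

Because `staticRefTo` is polymorphic in `a`, nothing in this sketch stops it from producing a `StaticRef String` for `double`. Under the proposal, that cannot happen, because only the compiler constructs references, through `static`, and it picks `a` from the type of `e`.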

When `e` is not an identifier, the compiler needs to introduce a new top-level binding with a fresh name and `e` as its right-hand side. The static reference then points to this new binding instead.

Looking up static references
============================

`unstatic` is implemented as a function that finds a top-level value from its `GlobalName` and raises an exception if no such value is found. It crucially relies on the system's dynamic linker, so out of the box it only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:

* It determines the name of the shared library from the package name and the package version.
* It determines the symbol of the value by Z-encoding the package name, the module name and the value name.
* It uses the system's dynamic linker interface to obtain the address of the symbol.
* It converts that address to a Haskell value with `GHC.Prim.addrToAny#`.

In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in statically linked modules if GHC is given the option

    -optl-Wl,--export-dynamic

A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.

GHC only defines symbols for the exported definitions of a module, so `unstatic` won't be able to find the private bindings of a module. For this reason, the implementation of `static` should also ensure that the bindings it refers to appear in the symbol table even when their defining modules do not export them.

Template Haskell support
========================

The `static` keyword needs to be made available in Template Haskell so that the distributed-static package can benefit from this language extension.

Rationale
=========

We want the language extension to meet the following requirements:

1. It must be a practical alternative to the remoteTable functions in the distributed-static package.
2. It must not change the build scheme used for Haskell programs. It should still be possible to link a collection of .o files produced from Haskell source code with the system linking tools.
3. It must not require all communicating processes that use the extension to be launched from the same binary.
4. It must not significantly increase the binary size.

(1) is addressed by replacing remote tables with the symbol tables produced by the compiler. Additionally, Template Haskell support is included so that the existing distributed-static package can be adapted and extended to support this extension.

(2) is addressed by choosing a scheme that does not require the linker to perform any extension-specific procedure to collect the static values in various modules. There is a trade-off here, though: symbols in statically linked modules cannot be accessed unless `-optl-Wl,--export-dynamic` is supplied during linking.

(3) is addressed by allowing programs to exchange static values for any bindings found in the modules they share.

(4) is addressed by reusing the symbol tables that the compiler already produces in object files, rather than creating separate remote tables.

About the need for using different binaries
===========================================

While using distributed-process, we found some use cases that require communicating closures between multiple binaries. One of these use cases involved a distributed application and a monitoring tool. The monitoring tool would need to link in some graphics libraries to display information on the screen, none of which the monitored application required. Conversely, the monitored application would link in some modules that the monitoring tool didn't need. Crucially, the two applications are fairly loosely coupled, even though both need to exchange static values referring to bindings in some modules they share.

An analogous use case involved the distributed application and a control application used to change the dynamic settings of the former.

Further Work
============

Because the application depends on shared libraries, a tool would now be needed to collect these libraries so that they can be distributed together with the executable when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.

Another possible line of work is to extend this approach so that a process can pull shared objects from a remote peer when that peer sends a static value defined in a shared object that is not available to the process.

Integration with distributed-static
===================================

The distributed-static package could either adopt this extension as its only implementation of static values, or it could support several notions of static references, say by using a type class to overload `unstatic`:

    class Static st s | s -> st where
      unstatic :: st -> s a -> Either String a

Here the class parameter `st` is provided for backwards compatibility with the existing scheme, in which lookups depend on context-dependent information such as a remote table. The extension we present here does not depend on this parameter, so `()` could be used for the `StaticRef` instance:

    instance Static () StaticRef where
      ...

References
==========

[1] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards Haskell in the cloud. SIGPLAN Not., 46(12):118–129, September 2011. ISSN 0362-1340.