Static values language extension proposal

Hello,

With the support of Tweag I/O, Mathieu and I have been assembling a design proposal for a language extension for static values that will take Cloud Haskell a big step forward in usability. Please find the proposal inlined below. We look forward to discussing its feasibility and features with the community.

Best,
Facundo

--

In these notes we discuss a design of the language extension proposed in [1] for Cloud Haskell: that is, support from the compiler for producing labels that can be used to identify Haskell top-level bindings across processes in a network.

Static values
=============

Following [1], the extension consists of a new syntactic form `static e`, along with a type constructor `StaticRef` and a function

    unstatic :: StaticRef a -> a

The idea is that a value of type `StaticRef a` uniquely identifies a value that can be referred to by a global name rather than serialized over the network. This works between processes that are instances of a single binary, because all such processes share the same top-level bindings.

Generating static references
============================

We start by introducing global names. A `GlobalName` is a symbol bound in the top-level environment. It is much like global names in Template Haskell, except that `GlobalName`s always refer to terms, and they include a package version.

    data GlobalName = GlobalName PkgName PkgVersion ModName OccName

`GlobalName`s can be used as references to static values.

    newtype StaticRef a = StaticRef GlobalName

`StaticRef a` is to `GlobalName` what `Ptr a` is to `Addr#`: a wrapper with a phantom type parameter that keeps track of the type of the referenced value.

The special form

    static e

is an expression of type `StaticRef a`, where `e :: a` is a closed expression (meaning any free variables in `e` are bound in the top-level environment). If `e` is an identifier, `static e` just refers to it.
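As a sketch of the pieces introduced so far, with the field types and the example reference assumed for illustration (the proposal does not fix `PkgName` etc., and the real `static` form would be compiler-generated):

```haskell
-- Sketch of the proposal's types; the String aliases are assumptions.
type PkgName    = String
type PkgVersion = String
type ModName    = String
type OccName    = String

data GlobalName = GlobalName PkgName PkgVersion ModName OccName
  deriving (Eq, Show)

-- Phantom type parameter tracks the type of the referenced value.
newtype StaticRef a = StaticRef GlobalName
  deriving (Eq, Show)

-- What `static fromJust` might elaborate to; the package version
-- here is purely illustrative.
fromJustRef :: StaticRef (Maybe Int -> Int)
fromJustRef = StaticRef (GlobalName "base" "4.7.0.0" "Data.Maybe" "fromJust")
```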
Otherwise (when `e` is not a plain identifier), the compiler introduces a new top-level binding with a fresh name and `e` as its right-hand side, and the static reference points to this new binding instead.

Looking up static references
============================

`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, raising an exception when it cannot. It crucially relies on the system's dynamic linker, so out of the box it only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:

* Determine the name of the shared library from the package name and the package version.
* Determine the symbol of the value by Z-encoding the package name, the module name and the value name.
* Use the system's dynamic linker interface to obtain the address of the symbol.
* Convert the symbol to a Haskell value with `GHC.Prim.addrToAny#`.

In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in statically linked modules if GHC is given the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.

GHC only defines symbols for the exported definitions of a module, so `unstatic` won't be able to find a module's private bindings. For this reason, the implementation of `static` should additionally ensure that the bindings it produces appear in the symbol table even when they are not exported by their defining modules.

Template Haskell support
========================

The `static` keyword needs to be made available in Template Haskell so the distributed-static package can benefit from this language extension.

Rationale
=========

We want the language extension to meet the following requirements:

1. It must be a practical alternative to the remoteTable functions in the distributed-static package.
2.
It must not change the build scheme used for Haskell programs. A collection of .o files produced from Haskell source code should still be linkable with the system linking tools.
3. It must not restrict all communicating processes using the extension to being launched from the same binary.
4. It must not significantly increase binary size.

(1) is addressed by replacing remote tables with the symbol tables produced by the compiler. Additionally, Template Haskell support is included so that the existing distributed-static package can be adapted and extended to include this extension.

(2) is addressed by choosing a scheme which does not require the linker to perform any extension-specific procedure to collect the static values in the various modules. There is a trade-off here, though: symbols in statically linked modules cannot be accessed unless -optl-Wl,--export-dynamic is supplied during linking.

(3) is addressed by allowing programs to exchange static values for any bindings found in the modules they share.

(4) is addressed by reusing the symbol tables the compiler already produces in object files, rather than creating separate remote tables.

About the need for using different binaries
===========================================

While using distributed-process we found several use cases for communicating closures between multiple binaries.

One of these use cases involved a distributed application and a monitoring tool. The monitoring tool needed to link in some graphics libraries to display information on the screen, none of which were required by the monitored application. Conversely, the monitored application linked in some modules that the monitoring application didn't need. Crucially, both applications are fairly loosely coupled, even though they need to exchange static values referring to bindings in the modules they share.
An analogous use case involved the distributed application and a control application used to change dynamic settings of the former.

Further Work
============

As the application depends on shared libraries, a tool would be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application in a cluster. We won't delve further into this problem here.

Another possible line of work is extending this approach so a process can pull shared objects from a remote peer, when that peer sends a static value defined in a shared object not available to the process.

Integration with distributed-static
===================================

The package distributed-static could either adopt this extension as the only implementation of static values, or it could support many notions of static references, say by using a type class to overload `unstatic`:

    class Static st s | s -> st where
      unstatic :: st -> s a -> Either String a

where the class parameter `st` is provided for backwards compatibility with the existing scheme, to supply context-dependent information. The extension we present here does not depend on this parameter, so `()` could be used for the `StaticRef` instance:

    instance Static () StaticRef where ...

References
==========

[1] Jeff Epstein, Andrew P. Black, and Simon Peyton Jones. Towards Haskell in the cloud. SIGPLAN Not., 46(12):118-129, September 2011. ISSN 0362-1340.

Hey Facundo, thanks for sharing this proposal. Several questions:

0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?

1) What does this accomplish that cannot be accomplished by having various nodes agree on a DSL, and sending ASTs to each other?

1a) In fact, I'd argue (and some others agree, and I'll admit my opinions have been shaped by those more expert than me) that sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol that's hard-coded hopefully into both programs in a way that it means the same thing". I've had many educational conversations with

2) How does it provide more type safety than the current TH-based approach? (I've seen Tim and others hit very, very gnarly bugs in Cloud Haskell based upon the "magic static values" approach.)

3) This proposal requires changes to linking etc. that would really make it useful only on systems and deployments that have Template Haskell AND dynamic linking. (It also rules out any context where it'd be nice to deploy a static app or, say, use CH on iOS!)

To repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one that current CH doesn't use (though the folks working on CH seem receptive to switching to such a strategy if someone validates it).

Cheers,
-Carter

On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez <facundo.dominguez@tweag.io> wrote:

On 24 Jan 2014, at 17:59, Carter Schonwald wrote:
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
I didn't pick up on that at all - how would we be able to do that?
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing". I've had many educational conversations with
I've still not seen a convincing example of how to do this though. It would help if someone explained what this would look like, running over two (or more) separate binaries and still shipping code. It's just that, afaict, that AST wouldn't be so "wee" once it had to represent any arbitrary expression. One could, of course, just ship source (or some intermediate representation), but that would also require compiler infrastructure to be installed on the target.
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
This is definitely true, but I see it as a problem related to our use of TH rather than our current use of closures and 'Static' per se. Having said that, it can be toe-curlingly difficult to work with closure/static sometimes, so *anything* that makes this easier sounds good to me.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think its a much simpler, safer, easier, flexible and PORTABLE approach, though one current CH doesn't do (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
I/we are, I think, amenable to doing whatever makes the most sense. This could include doing more than one thing when it comes to dealing with 'statics'. Personally I think the proposal sounds interesting, though as I mentioned in my previous mail, I haven't had time to sit down and look at it in detail yet.

Cheers,
Tim

[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list.]

Hi Carter,

Thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all. We thought long and hard about a path that would ultimately make an extension either unnecessary, or at any rate very small. At this point, the only thing that we are proposing to add to the compiler is the syntactic form "static e". Contrary to the presentation in the paper, the 'unstatic' function can be implemented entirely as library code and does not need to be a primop. Moreover, we do not need to piece together any kind of global remote table at compile time or link time, because we're piggybacking on the one already constructed by the system linker.

The `static e` form could just as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user: in particular, detecting situations where symbolic references cannot be generated because, e.g., the imported packages were not compiled as dynamically linked libraries, or seamlessly supporting a call of `static f` on an identifier `f` that is not exported by its module.
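The symbol name such a library implementation of 'unstatic' would look up can be made concrete. GHC's Z-encoding escapes characters that are not valid in linker symbols (for example '.' becomes "zi", '-' becomes "zm", and 'z'/'Z' are doubled). A sketch covering the common cases follows; `closureSymbol` is a hypothetical helper, and the exact symbol format varies across GHC versions:

```haskell
-- Minimal Z-encoder; GHC's full table also covers parentheses,
-- operator characters, tuples, and so on.
zEncode :: String -> String
zEncode = concatMap enc
  where
    enc 'z'  = "zz"
    enc 'Z'  = "ZZ"
    enc '.'  = "zi"
    enc '-'  = "zm"
    enc '_'  = "zu"
    enc '\'' = "zq"
    enc c    = [c]

-- Hypothetical helper assembling the closure symbol that unstatic
-- would ask the dynamic linker for.
closureSymbol :: String -> String -> String -> String -> String
closureSymbol pkg ver modname occ =
  zEncode (pkg ++ "-" ++ ver) ++ "_" ++ zEncode modname
    ++ "_" ++ zEncode occ ++ "_closure"
```

For example, `closureSymbol "base" "4.7.0.0" "Data.Maybe" "fromJust"` yields a symbol the dynamic linker interface could then resolve to an address for `GHC.Prim.addrToAny#`.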
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing".
I very much subscribe to the idea of defining small DSLs for exchanging code between nodes. And this proposal is compatible with that idea.

One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one, whose grammar only contains linker symbol names.

As it happens, distributed-static today already supports two such DSLs: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together. There is a patch lying around by Edsko proposing to add a third "DSL": one that allows nodes to trade arbitrary Haskell strings that are then eval'ed on the other end by the 'plugins' package.

As Facundo explains at the end of his email, the notion of a "static" value ought to be more general than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.

This is why we propose to make Static into a class. Each DSL is generated by one datatype, and each such datatype has a Static instance. If you would like to ship an AST around the cluster, you can make the datatype for that AST an instance of Static, with 'unstatic' being defined as an interpreter for your AST. Concretely:

    data HsExpr = ...

    instance Static HsExpr where
      unstatic e = Hs.interpret e
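A self-contained sketch of this scheme, using the two-parameter class from the proposal and an invented toy AST in place of HsExpr:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, GADTs #-}

-- The overloaded interface from the proposal: st carries
-- context-dependent information, s is the DSL's datatype.
class Static st s | s -> st where
  unstatic :: st -> s a -> Either String a

-- A toy typed AST standing in for a user-defined DSL (invented here).
data Arith a where
  Lit :: Int -> Arith Int
  Add :: Arith Int -> Arith Int -> Arith Int

-- This DSL needs no context, so the st parameter is ().
-- unstatic is simply an interpreter for the AST.
instance Static () Arith where
  unstatic () (Lit n)   = Right n
  unstatic () (Add x y) = (+) <$> unstatic () x <*> unstatic () y
```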
I've had many educational conversations with
... ?
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable, I think. One potential problem comes from managing the dynamically typed values in the remote table, which must be coerced to the right type and paired with the right decoders if you don't use TH. With the approach we propose there is no remote table, so this should help eliminate a source of bugs.
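The hazard described here can be sketched with Data.Dynamic: a remote-table-style lookup stores values untyped and recovers them with a runtime coercion, which fails only at runtime when the requested type is wrong (the table and names below are invented for illustration):

```haskell
import Data.Dynamic (Dynamic, Typeable, toDyn, fromDynamic)

-- A remote-table-style registry: values are stored untyped.
table :: [(String, Dynamic)]
table = [("double", toDyn ((* 2) :: Int -> Int))]

-- Recovering an entry requires guessing its type correctly;
-- a mismatch is only detected at runtime, as a Nothing.
lookupStatic :: Typeable a => String -> Maybe a
lookupStatic name = lookup name table >>= fromDynamic
```

`lookupStatic "double" :: Maybe (Int -> Int)` succeeds, while asking for `Maybe (Bool -> Bool)` quietly returns Nothing: exactly the kind of mismatch a remote table cannot rule out at compile time.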
3) this proposal requires changes to linking etc that would really make it useful only on systems and deployments that only have Template Haskell AND Dynamic linking. (and also rules out any context where it'd be nice to deploy a static app or say, use CH in ios! )
I don't know about iOS. And it's very likely that there are contexts in which this extension doesn't work. But as I said above, you are always free to define your own DSLs that cover the particular use case you have in mind. The nice thing about this particular DSL is that it requires little to no TH to generate label names, which can always be a source of bugs, especially when you forget to include them in the global remote table (which is something that TH doesn't and can't help you with).

Furthermore, it is my understanding that GHC is heading towards a world of "dynamically linked by default", and dynamic linking is by now supported by GHC on most platforms. See e.g. https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms

There are fairly good solutions for deploying self-contained dynamically linked apps these days, e.g. Docker. And in any case, with a few extra flags we can still do away with the dynamic linking requirement on some (all?) platforms.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think its a much simpler, safer, easier, flexible and PORTABLE approach, though one current CH doesn't do (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process, which we can discuss in a separate thread, to reduce the syntactic overhead even further, to practically nothing.

Best,
Mathieu

Anyways:

1) You should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a userland library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something! But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help with dynamic code (un)loading, so stress test that! There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed-systems context, sans CH.

2) I have a work in progress on speccing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".

As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
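A minimal finally-tagless sketch of such a typed, serializable DSL: the class is the grammar and each instance is an interpreter, so one term can be run locally or rendered for the wire (all names here are invented; this is not bound's actual API):

```haskell
-- A finally-tagless DSL: the class describes the language.
class RemoteSym r where
  lit :: Int -> r Int
  add :: r Int -> r Int -> r Int

-- Interpreter 1: evaluate the computation locally.
newtype Eval a = Eval { runEval :: a }

instance RemoteSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)

-- Interpreter 2: render the computation for sending over the wire.
newtype Serial a = Serial { serialize :: String }

instance RemoteSym Serial where
  lit n   = Serial (show n)
  add x y = Serial ("(" ++ serialize x ++ " + " ++ serialize y ++ ")")

-- One term, two meanings, picked by the type at the use site.
expr :: RemoteSym r => r Int
expr = add (lit 1) (add (lit 2) (lit 3))
```

Here `runEval expr` evaluates the term locally, while `serialize expr` produces a string form of the same term that could be shipped to a peer and re-interpreted there.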
Here's a really, really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste: https://gist.github.com/cartazio/5727196. It's notable that the AST data type there is called "Remote" :) - I think that's a hint that it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally-tagless style API (i.e. a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).

On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all. We thought long and hard about a path that would ultimately make an extension either unnecessary, or at any rate very small. At this point, the only thing that we are proposing to add to the compiler is the syntactic form "static e". Contrary to the presentation in the paper, the 'unstatic' function can be implemented entirely as library code and does not need to be a primop. Moreover, we do not need to piece together any kind of global remote table at compile time or link time, because we're piggy backing on that already constructed by the system linker.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user. In particular, detecting situations where symbolic references cannot be generated because e.g. the imported packages were not compiled as dynamic linked libraries. Or seamlessly supporting calling `static f` on an idenfier `f` that is not exported by the module.
1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other? 1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing".
I very much subscribe to the idea of defining small DSL's for exchanging code between nodes. And this proposal is compatible with that idea.
One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one whose grammar only contains linker symbol names.
As it happens, distributed-static today already supports two such DSL's: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together. There is a patch lying around by Edsko proposing to add a third "DSL": one that allows nodes to trade arbitrary Haskell strings that are then eval'ed on the other end by the 'plugins' package.
As Facundo explains at the end of his email, the notion of a "static" value ought to be a more general one than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.
This is why we propose to make Static into a class. Each DSL is generated by one datatype. Each such datatype has a Static instance. If you would like to ship an AST around the cluster, you can make the datatype for that AST an instance of Static, with 'unstatic' being defined as an interpreter for your AST.
Concretely:
data HsExpr = ...
instance Static HsExpr where unstatic e = Hs.interpret e
I've had many educational conversations with
... ?
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable I think. One potential problem comes from managing dynamically typed values in the remote table, which must be coerced to the right type and use the right decoders if you don't use TH. With the approach we propose, there is no remote table, so I guess this should help eliminate a source of bugs.
3) this proposal requires changes to linking etc that would really make it useful only on systems and deployments that only have Template Haskell AND Dynamic linking. (and also rules out any context where it'd be nice to deploy a static app or say, use CH in ios! )
I don't know about iOS. And it's very likely that there are contexts in which this extension doesn't work. But as I said above, you are always free to define your own DSL's that cover the particular use case that you have in mind. The nice thing with this particular DSL is that it requires little to no TH to generate label names, which can always be a source of bugs, especially when you forget to include them in the global remote table (which is something that TH doesn't and can't help you with).
Furthermore, it was my understanding that GHC is heading towards a world of "dynamic linkable by default", and it is by now something that is supported on most platforms by GHC. See e.g.
https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms
There are fairly good solutions to deploy self contained dynamically linked apps these days, e.g. Docker. And in any case, with a few extra flags we can still do away with the dynamic linking requirement on some (all?) platforms.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one the current CH doesn't take (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process which we can discuss in a separate thread to reduce the syntactic overhead even further, to practically nothing.
Best,
Mathieu

On 25 Jan 2014, at 18:12, Carter Schonwald wrote:
1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Is that something you'll consider looking at, Mathieu?
There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.
We have a pull request from Edsko that melds hs-plugins support with static, as per the original proposal's notes, so this seems like a corollary issue to me.
2) I've a work in progress on specing out a proper (and sound :) ) static values type extension for ghc, that will perhaps be usable in your case (though by dint of being sound, will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work
Timescales? There are commercial users of Cloud Haskell clamouring for improvements to the way we handle this situation, and I'm keen to combine getting broader community agreement about "the right thing to do" with facilitating our users' real needs. If there are other options pertaining to "static" support, I'd like to know more!
@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".
With Cloud Haskell in its current guise, I can "Closure up" pretty much any thunk I like and spawn it on a remote node. If the nodes are both running the same executable, we're fine. If they're not, we're potentially in trouble. In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
Here's a really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound
And there's a generalization that supports strong typing that I've copied from an hpaste https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :). I think that's a hint that it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally-tagless style API approach (i.e. have a set of type class instances / operations that you use to run the computation and/or construct the AST you can send over the wire).
These are all lovely, but aren't we talking about either (a) putting together an AST to represent whatever valid Haskell program someone wants to send, or (b) forcing every application developer to write an AST to cover all their remote computations? Both of those sound like a lot more work than the proposal below. They may be the right approach for some domains, but there is a fair bit of "developer overhead" involved from what I can see.
On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <0xbadcode@gmail.com> wrote: The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user. In particular, detecting situations where symbolic references cannot be generated because e.g. the imported packages were not compiled as dynamically linked libraries. Or seamlessly supporting calling `static f` on an identifier `f` that is not exported by the module.
All of which sound like a usability improvement to me.
I very much subscribe to the idea of defining small DSL's for exchanging code between nodes. And this proposal is compatible with that idea.
One thing that might not have been so clear in the original email is that we are proposing here to introduce just *one such DSL*. It's just that it's a trivial one whose grammar only contains linker symbol names.
That triviality is a rather important point as well, because...
As it happens, distributed-static today already supports two such DSL's: a DSL of labels, which are arbitrary string names for functions, and a small language for composing Static values together.
And whilst those two DSL's are rather simple, it can still be tricky to get things right.
As Facundo explains at the end of his email, the notion of a "static" value ought to be a more general one than was first envisioned in the paper: a static value is any closed denotation, denoted in any of a choice of multiple small languages, some of which ship standard with distributed-static. The user can define his own DSL for shipping code around.
Indeed - there's never been anything preventing users from doing so. In fact, sending messages that are "interpreted" by a remote process in order to apply some specific processing is pretty much the MO of all Cloud Haskell code. The "plugins" based support will add to the options there.
2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
The type safety of the current TH approach is reasonable I think. One potential problem comes from managing dynamically typed values in the remote table, which must be coerced to the right type and use the right decoders if you don't use TH. With the approach we propose, there is no remote table, so I guess this should help eliminate a source of bugs.
And remove a slightly awkward programming model.
to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though one the current CH doesn't take (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
We have, and it's an option with different tradeoffs. Both solutions could gainfully live side by side and are in fact complementary. I contend that the solution described by Facundo has the advantage of eliminating much of the syntactic overhead associated with sending references to (higher-order) values across the cluster. We have more ideas specific to distributed-process which we can discuss in a separate thread to reduce the syntactic overhead even further, to practically nothing.
I agree that the proposal sounds beneficial. It's a good thing that both approaches can live side by side. I'd like to hear more about these other ideas too. I'd also like to hear more from the rest of the community - especially Cloud Haskell users. I know a few others besides Parallel Scientific are using Cloud Haskell in commercial applications - I'd very much like to hear from you all on this proposal too. Cheers, Tim

On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson
In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a natively compiled language without encoding a virtual machine yourself (that is, the AST-based mechanisms). Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM? Or perhaps an interesting alternative would be a BEAM backend for ghc.

-- brandon s allbery kf8nh | sine nomine associates | allbery.b@gmail.com ballbery@sinenomine.net | unix, openafs, kerberos, infrastructure, xmonad | http://sinenomine.net

Hi Brandon, On 26 Jan 2014, at 19:01, Brandon Allbery wrote:
On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson
wrote: In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.
Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a native compiled language, without encoding a virtual machine yourself (that is, the AST-based mechanisms).
Yeah, I do realise this. Of course we're not trying to reproduce the BEAM really, but what we /do/ want is to be able to exchange messages between nodes that are not running the same executable. The proposal does appear to address this requirement, at least to some extent. There may be complementary (or better) approaches. I believe Carter is going to provide some additional details about his work in this area at some point.

Anything that reduces the amount of Template Haskell required to work with Cloud Haskell is a "good thing (tm)" IMO. Not that I mind using TH, but the programming model is currently quite awkward from the caller's perspective, since you've got to (a) create a Static/Closure out of potentially complex chunks of code, which often involves creating numerous top-level wrapper APIs, and (b) fiddle around with the remote-table (both in the code that defines remote-able thunks *and* in the code that starts a node wishing to operate on them).

Also note that this problem isn't limited to sending code around the network. Just sending arbitrary *data* between nodes is currently discouraged (though not disallowed) because the receiving program *might* not understand the types you're sending it. This is very restrictive, and the proposal does, at the very least, allow us to safely serialise, send and receive types that both programs "know about" by virtue of having been linked to the same library/libraries.

But yes - there are certainly constraints and edge cases aplenty here. I'm not entirely sure whether or not we'd need to potentially change the (binary) encoding of raw messages in distributed-process, for example, in response to this change. Currently we serialise a pointer (i.e., the pointer to the fingerprint for the type that's being sent), and I can imagine that not working properly across different nodes running on different architectures etc.
Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM?
After discussing this with Simon M, we concluded there was little point in doing so. The GHC RTS is practically a VM anyway, and there's probably not that much value to be gained by shipping bytecode around. Besides, as you put it, the AST-based mechanisms allow for this anyway (albeit with some coding required on the part of the application developer) and Carter (and others) assure me that the mechanisms required to do this kind of thing already exist. We just need to find the right way to take advantage of them.
Or perhaps an interesting alternative would be a BEAM backend for ghc.
I've talked to a couple of people that want to try this. I'm intrigued, but have other things to focus on. :) Cheers, Tim

To address the concerns about static linking and portability, there is also the alternative of using the RTS linker on those platforms that need it.

In many respects, neither linker makes a big difference to us. We are going with the system's dynamic linker mainly because the GHC team has expressed the desire to get rid of the RTS linker.

Using the RTS linker would require addressing some additional technical issues, none of which appear to be show-stoppers. It would just be more work.
Best,
Facundo

Hi Carter, Tim,
On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
anyways
1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Signals what?
On Sun, Jan 26, 2014 at 7:43 PM, Tim Watson
Is that something you'll consider looking at, Mathieu?
We would prefer to do it that way, to be honest. As explained in my previous email, we identified two problems with this approach:

1) User friendliness. It's important for us that Cloud Haskell be pretty much as user friendly and easy to use as Erlang is.

a) I don't know that it's possible from Template Haskell to detect and warn the user when dependent modules have not been compiled into dynamic object code, or into static code with the right flags.

b) It's very convenient in practice to be able to send not just `f` if `f` is a global identifier, but in general `e`, where `e` is any closed expression mentioning only global names. That can easily be done by having the compiler float the expression `e` to the top level and give it a global name. I don't see how to do that in TH in a user friendly way.

2) A technical issue: you ought to be able to send unexported functions across the wire, just as you can pass unexported functions as arguments to higher-order functions. Yet GHC does not create linker symbols for unexported identifiers, so our approach would break down. Worse, I don't think that it's even possible to detect in TH whether an identifier is exported or not, in order to warn the user. One could imagine a compiler flag to force the creation of linker symbols for all top-level bindings, exported or unexported. But that seems wasteful, and potentially not very user friendly.

If the above can be solved, all the better! If not: we don't always want to touch the compiler, but when we do, ideally it should be in an unintrusive way. I contend our proposal fits that criterion. And our cursory implementation efforts seem to confirm that so far.
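Point (1b) — floating the closed expression under `static` to the top level — can be illustrated by hand. The fresh name `static_1` below is invented for illustration; the real compiler-generated name would be fresh and hidden:

```haskell
module Main where

-- What the user would write (requires the extension, shown as a comment):
--   ref = static (\xs -> sum xs + 1)
--
-- What the compiler would conceptually generate: a fresh top-level
-- binding for the closed expression...
static_1 :: [Int] -> Int
static_1 xs = sum xs + 1
-- ...so that `ref` can be a StaticRef wrapping the GlobalName that
-- points at the linker symbol for `static_1`.

main :: IO ()
main = print (static_1 [1, 2, 3])  -- prints 7
```

Because the binding now has a top-level name, any process running the same binary (or linking the same library) can resolve it by symbol instead of receiving the code itself.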
But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help improve doing dynamic code (un)loading, stress test that!
We could use either the system linker or rts linker. Not sure that it makes any difference at the application level.
2) I've a work in progress on specing out a proper (and sound :) ) static values type extension for ghc, that will perhaps be usable in your case (though by dint of being sound, will preclude some of the things you think you want).
I look forward to hearing more about that. How is the existing proposal not (type?) sound?
BUT, any type system changes need to actually provide safety.
To be clear, this proposal doesn't touch the type checker in any way.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
Here's a really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound
These are nice encodings for ASTs. But they don't address how to minimize the amount of code to ship around the cluster. If you have no agreement about what functions are commonly available, then the AST needs to include the code for the function you are sending, plus any functions it depends on, plus any of their dependencies, and so on transitively.

Tim, perhaps the following also answers some of your questions. This is where the current proposal comes in: if you choose to ship around ASTs, you can minimize their size by having them mention shared linker symbol names. Mind, that's already possible today, by means of the global RemoteTable, but it's building that remote table safely, conveniently, in a modular way, and with static checking that no symbols from any of the modules that were linked at build time were missed, that is difficult. By avoiding a RemoteTable entirely, we avoid having to solve that difficult problem. :)

Best,

-- Mathieu Boespflug, Founder at http://tweag.io
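The "small AST whose leaves are symbol names" idea above can be sketched in miniature. Everything here is invented for illustration: the `Closure` GADT, `resolve`, and the toy association list standing in for the linker / remote table on the receiving node.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)

-- The AST stays small because its leaves are symbol names, not code;
-- only the structure travels over the wire.
data Closure a where
  Symbol :: Typeable a => String -> Closure a
  Apply  :: Closure (b -> a) -> Closure b -> Closure a

-- On the receiving side, leaves are looked up in a shared environment
-- (here a toy table; in the proposal, the linker's symbol table).
resolve :: [(String, Dynamic)] -> Closure a -> Maybe a
resolve env (Symbol s)  = lookup s env >>= fromDynamic
resolve env (Apply f x) = resolve env f <*> resolve env x

env :: [(String, Dynamic)]
env = [ ("succ", toDyn (succ :: Int -> Int))
      , ("ten",  toDyn (10 :: Int)) ]

main :: IO ()
main = print (resolve env (Apply (Symbol "succ" :: Closure (Int -> Int))
                                 (Symbol "ten")))  -- prints Just 11
```

The difficulty Mathieu describes is precisely keeping a table like `env` complete and well-typed by hand; resolving leaves through linker symbols removes that obligation.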

On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug
On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
wrote: 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user-land library. If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!
Signals what?
That there is a shortcoming in ghc and/or the rts that needs to be addressed.

-- brandon s allbery kf8nh | sine nomine associates | allbery.b@gmail.com ballbery@sinenomine.net | unix, openafs, kerberos, infrastructure, xmonad | http://sinenomine.net

There's actually a missing piece of information in this thread: what are the example computations that are being sent?

My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!

Point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro", you hit some gnarly problems doing this for arbitrary closures! It's for a very simple (and fun) reason: address randomization!

Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply RTS-twiddling bit of hackery, so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too, I think).

On the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html) should actually give enough basic tooling to make this possible as a userland library. Mind you, unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has at Facebook. Point being, the GHC 7.8 version of the ObjLink API should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land.

The meat of my point is 1) "yes I want this too" but also 2) one thing I really have come to appreciate about how GHC is engineered is that a lot of work is done to provide the "right" primitives so that really great tools can be built in user land. I think that the goal of this proposal can be accomplished quite nicely with the ObjLink module, unless I'm not understanding something. In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side: just try to load it at a given type, and if it fails, it wasn't there!

The point being, linkers are a thing, and ghc exposes an API for linking; have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html

Hello Carter,
Thanks for the links. IIUC the ObjLink module contains an interface to the RTS linker. The points raised by Mathieu in his last email as (1a), (1b) and (2) still hold.
Here's a use case for (2):
module Communicate(run)
import Control.Distributed.Process
f :: Int -> Int
f = id
runSend :: Process ()
runSend = send someone (static f)
runExpect :: Int -> Process Int
runExpect n = fmap (($ n) . unstatic) expect
If any program tries to use runExpect, it would fail at runtime because it would fail to find `f`: `f` is not exported, and therefore a symbol for it would not appear in the object files.

The solution that modifies the compiler is superior to all the library-level workarounds for this problem that we could think of. Any suggestions?
Best,
Facundo

Excuse me, the module export list was meant to be
module Communicate(runExpect, runSend) where
Facundo
On Tue, Jan 28, 2014 at 5:13 PM, Facundo Domínguez
Hello Carter, Thanks for the links. IIUC the ObjLink module contains an interface to the RTS linker. The points raised by Mathieu in his last email as (1a), (1b) and (2) still hold.
Here's a use case for (2):
module Communicate(run)
import Control.Distributed.Process
f :: Int -> Int f = id
runSend :: Process () runSend = send someone (static f)
runExpect :: Int -> Process Int runExpect n = fmap (($ n) . unstatic) expect
If any program tries to use runExpect, it would fail at runtime because it would fail to find `f`, because `f` is not exported and therefore a symbol for it would not appear in object files.
The solution that modifies the compiler is superior to all workarounds we could think of to workaround this problem with a library. Any suggestions?
Best, Facundo
On Tue, Jan 28, 2014 at 3:03 PM, Carter Schonwald
wrote: Theres actually a missing piece of information in this thread: what are the example computations that are being sent? My understanding is that erlang has not way to send file handles, shared variables, Tvars, Mvars, memory mapped binary files, GPU code / memory pointers , and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in haskell!
point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro " you hit some gnarly problems doing this for arbitrary closures! Its for a very simple (and fun) reason: address randomization!
Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply rts twiddling bit of hackery so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too i think)
on the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html ) should actually give enough basic tooling to make this possible as a userland library, mind you unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has in facebook. Point being the GHC 7.8 version of the ObjLink api should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land. I
The meat of my point is 1) "yes I want this too" but also 2) one thing I really have come to appreciate about how GHC is engineered is that a lot of work is done to provide the "right" primitives so that really, really great tools can be built in user land. I think that the goal of this proposal can be accomplished quite nicely with the ObjLink module, unless I'm not understanding something. In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side; just try to load it at a given type, and if it fails it wasn't there!
The point being, linkers are a thing, and GHC exposes an API for linking. Have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html
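Concretely, the kind of userland lookup being suggested could be sketched in a few lines. This is only a sketch, assuming the GHC-7.8-era `ObjLink` module from the `ghc` package (`initObjLinker`, `lookupSymbol`) and `addrToAny#` from `GHC.Exts`; the caller is assumed to supply an already z-encoded symbol name:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only: assumes the ObjLink module from the ghc package
-- (GHC-7.8-era API). The symbol name is assumed to be z-encoded
-- already, e.g. "Foo_bar_closure" for Foo.bar.
import ObjLink (initObjLinker, lookupSymbol)
import GHC.Exts (addrToAny#)
import GHC.Ptr (Ptr(..))

lookupClosure :: String -> IO (Maybe a)
lookupClosure sym = do
  initObjLinker
  mptr <- lookupSymbol sym
  case mptr of
    Nothing          -> return Nothing
    -- Convert the resolved address into a live Haskell value.
    Just (Ptr addr#) -> case addrToAny# addr# of
      (# x #) -> return (Just x)
```

As Carter notes, the caller can then attempt a cast at the expected type and treat failure as "not there".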
On Tue, Jan 28, 2014 at 10:21 AM, Brandon Allbery
wrote: On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug
wrote: On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
wrote: 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user land library. If you can't make it work as a library and can demonstrate why (or how even though it works its not quite satisfactory), thats signals something!
Signals what?
That there is a shortcoming in ghc and/or the rts that needs to be addressed.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On 01/28/2014 06:03 PM, Carter Schonwald wrote:
There's actually a missing piece of information in this thread: what are the example computations that are being sent? My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!
[...]"exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro " [...]
Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply rts twiddling bit of hackery so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too i think)
When mentioning Nathan's approach (based on foreign primops), let me point to a more complete, RTS-backed implementation; work done by myself, and itself based on long-standing runtime support for a parallel Haskell on distributed memory systems. The latest instance of this RTS-based serialisation was reported at the Haskell Implementors' Workshop 2013 ( www.haskell.org/wikiupload/2/28/HIW2013PackingAPI.pdf ); code is on github (https://github.com/jberthold/rts-serialisation). Some technical remarks:
- Nathan's primop approach is awesome, but it is not easy to get its interplay with garbage collection right. It is on my list to take a look at this code again and see how far we can push the envelope.
- About address randomisation: the RTS-based serialisation uses relative locations from a known offset to handle it. A more concerning detail is that CAFs must be reverted rather than discarded during GC (currently they are just retained, which is not satisfactory for long-running code).
- About sending arbitrary closures: indeed it does not make any sense to transfer MVars and IORefs (file handles, StablePtrs, etc). My approach is to solve this dynamically by exception handling. I can imagine that there is a sensible combination of RTS support with a suitable type class framework (Static, for one), but lazy evaluation, especially lazy I/O, complicates matters.
/ Jost Berthold

Hi Carter,
On Tue, Jan 28, 2014 at 6:03 PM, Carter Schonwald
There's actually a missing piece of information in this thread: what are the example computations that are being sent?
Quite simply, the same as those considered in the original Cloud Haskell paper, which already advocates the extension that Facundo's first email merely fleshed out a tiny bit. Here's the link once again: "Towards Haskell in the Cloud", Jeff Epstein, Andrew P. Black, and Simon Peyton Jones (2011). http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote... We are emphatically not considering "arbitrary closures" as you say below, any more than the original paper does. As such...
My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!
... the above is completely impossible. The original paper explains why this is so (see Sections 2.3 and 5.1). Here's the gist:
1. you can only send remotely serializable values, i.e. those that have an instance of class Serializable.
2. none of the above have a Serializable instance, and are hence not "send"-able.
When it comes to sending closures capturing any of the above types of values, the reasoning goes like this:
3. a closure in the sense of CH is a pair of a static value and an environment,
4. a closure can only be sent if it is serializable,
5. a closure is serializable only if its environment can be serialized,
6. its environment can be serialized only if all free variables of the closure can,
7. none of the above have a Serializable instance,
8. hence any closure capturing file handles, MVars, memory pointers, etc. cannot be sent.
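The shape of closure this reasoning relies on can be written down in a few lines. This is a simplified sketch of the types used by distributed-static (names here are illustrative, not the package's exact definitions):

```haskell
-- Simplified sketch of the Cloud Haskell closure encoding; the real
-- types live in the distributed-static package.
import Data.ByteString.Lazy (ByteString)

-- A static value is only a label: no code travels over the wire.
newtype Static a = Static String

-- A closure pairs a static decoder with a serialized environment.
-- Every free variable captured in the environment must itself be
-- serializable, which is exactly why MVars, Handles, GPU pointers,
-- etc. cannot be captured and sent.
data Closure a = Closure (Static (ByteString -> a)) ByteString
```

Sending a Closure therefore ships only a label plus a ByteString; the receiving node resolves the label locally and applies the decoder to the environment.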
Point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern linux distro" you hit some gnarly problems doing this for arbitrary closures! It's for a very simple (and fun) reason: address randomization!
Which is why neither we nor the original paper considered using addresses as labels for static values. We use linker labels, which are stable.
on the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoadi... along with (and more appropriately) http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html should actually give enough basic tooling to make this possible as a userland library. Mind you, unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has at Facebook. Point being, the GHC 7.8 version of the ObjLink API should actually give enough support tooling to prototype this idea in user land, and that plus better support for writing "direct haskell code" and getting out both a local computation and an AST we can serialize would probably be a good set of primitives for making this feasible in user land.
For the third time: we can of course use any linker API that the system or the compiler happens to provide, so long as it allows resolving linker symbols to Haskell values. The (small) extension under consideration does not replace or add to any existing linker API. It just transparently floats closed expressions to the top level, makes sure linker symbols will exist at runtime (they currently don't always), and does some basic sanity checks so the user doesn't lose his mind. I listed problems labeled 1a), 1b) and 2) in my previous email. You still haven't shown us how to address those in pure TH userland.
In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side; just try to load it at a given type, and if it fails it wasn't there!
The point being, linkers are a thing, and GHC exposes an API for linking. Have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html
Yes we have. But I don't see how using it or not using it makes any difference to the user interface of the proposed compiler extension. It's an implementation detail with tradeoffs that Facundo could explain in detail in GHC ticket #8711 if you hadn't rudely closed it as a "duplicate" of some future and unspecified work of yours. Best, Mathieu

Hi Mathieu, On 28 Jan 2014, at 12:53, Mathieu Boespflug wrote:
We would prefer to do it that way, to be honest. As explained in my previous email, we identified two problems with this approach:
1) User friendliness. It's important for us that Cloud Haskell be pretty much as user friendly and easy to use as Erlang is.
Exactly!
a) I don't know that it's possible from Template Haskell to detect and warn the user when dependent modules have not been compiled into dynamic object code or into static code with the right flags.
I don't think that it is, from what I've seen, though I'm by no means an expert.
b) It's very convenient in practice to be able to send not just `f` if `f` is a global identifier, but in general `e` where `e` is any closed expression mentioning only global names. That can easily be done by having the compiler float the expression `e` to the top-level and give it a global name. I don't see how to do that in TH in a user friendly way.
Agreed.
2) A technical issue: you ought to be able to send unexported functions across the wire, just as you can pass unexported functions as arguments to higher-order functions. Yet GHC does not create linker symbols for unexported identifiers, so our approach would break down. Worse, I don't think it's even possible to detect in TH whether an identifier is exported or not, in order to warn the user. One could imagine a compiler flag to force the creation of linker symbols for all top-level bindings, exported or unexported. But that seems wasteful, and potentially not very user friendly.
Interesting.
If the above can be solved, all the better!
If not: we don't always want to touch the compiler, but when we do, ideally it should be in an unintrusive way. I contend our proposal fits that criterion. And our cursory implementation efforts seem to confirm that so far.
Good!
But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help improve doing dynamic code (un)loading, stress test that!
We could use either the system linker or rts linker. Not sure that it makes any difference at the application level.
No indeed.
2) I've a work in progress on spec'ing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want).
I look forward to hearing more about that.
+1
How is the existing proposal not (type?) sound?
I'd like to hear more about the concerns too.
As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.
here's a really, really good School of Haskell exposition https://www.fpcomplete.com/user/edwardk/bound
These are nice encodings for ASTs. But they don't address how to minimize the amount of code to ship around the cluster. If you have no agreement about what functions are commonly available, then the AST needs to include the code for the function you are sending, plus any functions it depends on, plus any of their dependencies, and so on transitively.
That was precisely my concern with the idea of shipping *something* AST-like around. It's a lot of overhead for every application you want to develop, or a *massive* overhead to cover all bases.
Tim, perhaps the following also answers some of your questions. This is where the current proposal comes in: if you choose to ship around AST's, you can minimize their size by having them mention shared linker symbol names.
Indeed, that does seem to simplify things.
Mind, that's already possible today, by means of the global RemoteTable, but it's building that remote table safely, conveniently, in a modular way, and with static checking that no symbols from any of the modules that were linked at build time were missed, that is difficult.
Yep. It's awkward, and when you get it wrong, you're either fighting with TH-obscured compiler errors or, worse, the damn thing just doesn't work (because you can't decode properly on the remote node and things just crash, or worse still, just hang waiting for the *correct* input types, which never arrive because they're not "known" to the RTS).
By avoiding a RemoteTable entirely, we avoid having to solve that difficult problem. :)
Not having a RemoteTable sounds like a plus to me. Cheers, Tim

Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion. Quite honestly, I think that should have been the end of the discussion. The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea. You really should try doing this as a library outside of GHC, and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library. Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

Hi Eric,
On Wed, Jan 29, 2014 at 3:20 AM, Erik de Castro Lopo
Mathieu Boespflug wrote:
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The response you quote above comes in context, which includes the sentence you also quote below. In another email, the problems we face with a pure TH implementation are labeled as 1a), 1b), 2). We'd be very happy if you could show us how to solve those problems using TH alone in a way that does not impact user friendliness and static checking of invariants in any way.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
I for one very much agree with all the principles stated above. But the wider context of the discussion is that we already have such a TH userland solution today, implemented in the packages distributed-static and distributed-process. We already have several users, including in industry (to my knowledge Parallel Scientific for over a year, Tweag I/O for a couple of months, probably others...). The proposal to go ahead and implement an idea that was first presented in the original Cloud Haskell paper was borne out of frustration with the existing approach based on remote tables, which are very error prone in practice, and out of the operational experience that I, Facundo, Tim and others have had, showing that making the semantics of distributed computation depend on *all* modules across several packages being compiled with the right incantation of compiler flags, without any kind of static checking, is a problem, especially for beginners. Is there something in the proposed extension that leads you to believe that it is neither small nor specific, or that it would not be easily testable, or maintainable? If so, we could amend it accordingly. Best, Mathieu

indeed! Thanks erik!
On the parallel list, edsko shares with us a single commit that adds all
the requested features as a user land lib
https://github.com/haskell-distributed/distributed-static/commit/d2bd2ebca5a...
@tweag folks, please do not write personal attacks on the issue tracker, if
you find yourself frustrated, I probably am too! please keep a positive
constructive tone in all future communications.
On Tue, Jan 28, 2014 at 9:20 PM, Erik de Castro Lopo
Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

For interested fellows, discussion also continues in [1] and [2].
Best,
Facundo
[1] https://ghc.haskell.org/trac/ghc/ticket/7015
[2] https://groups.google.com/d/topic/parallel-haskell/b-x7VmjlEOw/discussion
On Thu, Jan 30, 2014 at 4:47 PM, Carter Schonwald
indeed! Thanks erik!
On the parallel list, edsko shares with us a single commit that adds all the requested features as a user land lib
https://github.com/haskell-distributed/distributed-static/commit/d2bd2ebca5a...
@tweag folks, please do not write personal attacks on the issue tracker, if you find yourself frustrated, I probably am too! please keep a positive constructive tone in all future communications.
On Tue, Jan 28, 2014 at 9:20 PM, Erik de Castro Lopo
wrote: Mathieu Boespflug wrote:
[Sorry for the multiple reposts - couldn't quite figure out which email address doesn't get refused by the list..]
Hi Carter,
thank you for the good points you raise. I'll try and address each of them as best I can below.
0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
Indeed, this could be done without touching the compiler at all.
We had this response really early on in this discussion.
Quite honestly I think that should have been the end of the discussion.
The GHC developers already have a huge workload getting releases out the door, and adding to that workload without adding manpower and resources would be a bad idea.
You really should try doing this as a library outside of GHC and if GHC needs a few small additional features, they can be added.
The `static e` form could as well be a piece of Template Haskell, but making it a proper extension means that the compiler can enforce more invariants and be a bit more helpful to the user.
Once it works outside GHC and has proven useful, then it might be worthwhile to add small, specific, easily testable/maintainable features to GHC to support what goes on in your library.
Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

I don't have time to weigh in on this proposal right now, but I have several comments... On 24 Jan 2014, at 17:19, Facundo Domínguez wrote:
Rationale =======
We want the language extension to meet the following requirements:
1. It must be a practical alternative to the remoteTable functions in the distributed-static package.
Agreed - this is vital!
2. It must not change the build scheme used for Haskell programs. It should still be possible to link a collection of .o files produced from Haskell source code with the system linking tools.
Also vital.
3. It must not require that all communicating processes using the extension be launched from the same binary.
I personally think this is very valuable.
About the need for using different binaries ==============================
While using distributed-process we found some use cases that call for communicating closures between multiple binaries.
One of these use cases involved a distributed application and a monitoring tool. The monitoring tool would need to link in some graphics libraries to display information on the screen, none of which were required by the monitored application. Conversely, the monitored application would link in some modules that the monitoring application didn't need. Crucially, both applications are fairly loosely coupled, even though they need to exchange static values referring to bindings in some modules they share.
Indeed - this is an almost canonical use-case, as are administrative (e.g., remote management) tools.
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
Great idea.
Another possible line of work is extending this approach so a process can pull shared objects from a remote peer, when this remote peer sends a static value that is defined in a shared object not available to the process.
This would go a long way towards answering our questions about 'hot code upgrade' and be useful in many other areas too.

On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez < facundo.dominguez@tweag.io> wrote:
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A
This strikes me as highly platform-specific to the Linux and possibly FreeBSD implementations of ELF; it likely will not work with Solaris ELF, which handles dynamic symbols differently (or at least used to), will not work with non-ELF platforms (OS X, Windows), and probably won't work with a non-GNU ld such as is used on Solaris and OS X. -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Hello,
I'd just like to say I haven't gone over every discussion in this
thread and had time to digest it all - I thought I would just
highlight a minor technicality.
On Fri, Jan 24, 2014 at 11:19 AM, Facundo Domínguez
Looking up static references ====================
`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, otherwise it raises an exception. It crucially relies on leveraging the system’s dynamic linker, so out-of-the-box only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:
* Determines the name of the shared library from the package name and the package version.
* Determines the symbol of the value by Z-Encoding the package name, the module name and the value name.
* Uses the system’s dynamic linker interface to obtain the address of the symbol.
* Converts the symbol to a haskell value with `GHC.Prim.addrToAny#`
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.
GHC only defines symbols for exported definitions in modules. So unstatic won’t be able to find the private bindings of a module. For this sake, the implementation of static should in addition ensure that the bindings it gets will appear in the symbol table when they are not exported by their defining modules.
Regarding -optl-Wl,--export-dynamic for static builds, and all that jazz - if I am understanding you right, note that Windows is a bit particular here, because there is a hard limit on the number of symbols allowed in a DLL. That means forcefully exporting *everything* could quickly get you to the symbol limit with the dynamic linker for large-ish applications (one exported function or data type may result in a handful of exported symbols created.) If you want to see the pain this has caused GHC itself, please see GHC bug #5987[1], which makes dynamic support on Windows difficult - it's currently disabled anyway.
Furthermore, dynamic DLLs on Windows are a bit tricky anyway, as the loader is fundamentally different from your typical ld.so (there are ways around this[2], but they're a bit nasty as you have to hack the COFF file.) Windows unfortunately isn't in an easy position here, but it's improving, and it would be unfortunate to neglect it.
This restriction does not exist with the static linker inside the RTS, so my suggestion, I guess, is that I'm inclined to want this to work for *both* static/dynamic configurations out of the box without hackery, if at all possible, which would be great for Windows users especially until the dynamic story is back up to scratch.
[1] https://ghc.haskell.org/trac/ghc/ticket/5987 [2] http://blog.omega-prime.co.uk/?p=138
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
And for any people interested in this - on Linux, a tool like patchelf[3] would help immensely for moving executables+their dependencies around in a 'bundle' style way. [3] http://nixos.org/patchelf.html -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
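For reference, the four-step unstatic procedure quoted above could be sketched roughly as follows. This is a POSIX-only sketch, assuming `System.Posix.DynamicLinker` from the unix package; `zEncode` below is a simplification of GHC's real z-encoding (it only handles `.` and `z`), and the library and symbol naming scheme is illustrative:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- POSIX-only sketch of unstatic. The shared-library and symbol
-- naming below is illustrative, not GHC's exact scheme.
import System.Posix.DynamicLinker (dlopen, dlsym, RTLDFlags(RTLD_LAZY))
import Foreign.Ptr (castFunPtrToPtr)
import GHC.Ptr (Ptr(..))
import GHC.Exts (addrToAny#)

-- Simplified stand-in for GHC's z-encoding.
zEncode :: String -> String
zEncode = concatMap enc
  where enc '.' = "zi"
        enc 'z' = "zz"
        enc c   = [c]

unstatic :: String -> String -> String -> String -> IO a
unstatic pkg version modname occname = do
  -- 1. Shared library name from the package name and version.
  dl <- dlopen ("libHS" ++ pkg ++ "-" ++ version ++ ".so") [RTLD_LAZY]
  -- 2. Z-encode package, module and value names into a linker symbol.
  let sym = zEncode pkg ++ "_" ++ zEncode modname ++ "_"
                        ++ zEncode occname ++ "_closure"
  -- 3. Resolve the symbol address (dlsym raises an IO exception
  --    if the symbol cannot be found, as the proposal requires).
  fptr <- dlsym dl sym
  -- 4. Convert the address to a Haskell value.
  case castFunPtrToPtr fptr of
    Ptr addr# -> case addrToAny# addr# of
      (# x #) -> return x
```

On platforms where the system linker is unavailable (e.g. the Windows situation Austin describes), step 3 would go through the RTS linker instead.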

Hi Austin,
this is very useful information, thanks. So it seems that the rts
linker is here to stay for a while longer still, at least because
there is no good alternative on Windows as of yet.
If I understand you correctly, on Windows dynamic linking is not an
option in part because of the number-of-exported-symbols limit, and
when linking statically one hits the same limit if stuff like
-optl-Wl,--export-dynamic is used. So at least on Windows, the only
way out is the rts linker. Supporting both linkers is certainly an
option. If I remember correctly, the issue Facundo found with the rts
linker is that to use it for looking up symbol addresses, you
apparently need to have an object file loaded twice, effectively (once
statically linked at build time, the second time through the rts
linker at runtime for doing the lookups). Maybe there's a way around
that, or that could be added? In any case for platforms with no
alternative, like Windows, I guess double loading is a tolerable price
to pay.
Best,
Mathieu
On Wed, Jan 29, 2014 at 12:11 PM, Austin Seipp
Hello,
I'd just like to say I haven't gone over every discussion in this thread and had time to digest it all - I thought I would just highlight a minor technicality.
On Fri, Jan 24, 2014 at 11:19 AM, Facundo Domínguez
wrote: Looking up static references ====================
`unstatic` is implemented as a function which finds a top-level value from the `GlobalName`, otherwise it raises an exception. It crucially relies on leveraging the system's dynamic linker, so out-of-the-box only works with dynamically linked binaries (but see below). `unstatic` proceeds as follows:
* Determines the name of the shared library from the package name and the package version.
* Determines the symbol of the value by Z-Encoding the package name, the module name and the value name.
* Uses the system's dynamic linker interface to obtain the address of the symbol.
* Converts the symbol to a haskell value with `GHC.Prim.addrToAny#`
In principle, only symbols in shared libraries can be found. However, the dynamic linker is able to find symbols in modules that are linked statically if GHC is fed with the option -optl-Wl,--export-dynamic. A future enhancement could be to have GHC warn the user when modules using the extension are linked statically and this option is not used during linking.
GHC only defines symbols for exported definitions in modules. So unstatic won't be able to find the private bindings of a module. For this sake, the implementation of static should in addition ensure that the bindings it gets will appear in the symbol table when they are not exported by their defining modules.
Regarding -optl-Wl,--export-dynamic for static builds, and all that jazz - if I am understanding you right, note that Windows is a bit particular here, because there is a hard limit on the number of symbols allowed in a DLL. That means forcefully exporting *everything* could quickly get you to the symbol limit with the dynamic linker with large-ish applications (one exported function or data type may result in a handful of exported symbols created.) If you want to see the pain this has caused GHC itself, please see GHC bug #5987[1], which makes dynamic support on windows difficult - it's currently disabled now anyway.
Furthermore, dynamic DLLs on Windows are a bit tricky anyway as the loader is fundamentally different from your typical ld.so (which there are ways around[2], but a bit nasty as you have to hack the COFF file.) Windows unfortunately isn't in an easy position here, but it's improving and it would be unfortunate to neglect it.
This restriction does not exist with the static linker inside the RTS, so my suggestion, I guess, is that I'm inclined to want this to work for *both* static/dynamic configurations out of the box without hackery, if at all possible, which would be great for Windows users especially until the dynamic story is back up to scratch.
[1] https://ghc.haskell.org/trac/ghc/ticket/5987 [2] http://blog.omega-prime.co.uk/?p=138
As the application depends on shared libraries, a tool would now be required to collect these libraries so they can be distributed together with the executable binary when deploying a Cloud Haskell application on a cluster. We won't delve further into this problem.
And for any people interested in this - on Linux, a tool like patchelf[3] would help immensely for moving executables+their dependencies around in a 'bundle' style way.
[3] http://nixos.org/patchelf.html
-- Regards,
Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
-- Mathieu Boespflug Founder at http://tweag.io.
participants (9)
- Austin Seipp
- Brandon Allbery
- Carter Schonwald
- Erik de Castro Lopo
- Facundo Domínguez
- Jost Berthold
- Mathieu Boespflug
- Mathieu Boespflug
- Tim Watson