RFC: A standardized interface between web servers and applications or frameworks (ala WSGI)

Good day hackers,

The Python community has been successful in standardizing an interface between web servers and applications or frameworks, resulting in users having more control over their web stack by being able to pick frameworks independently from web servers, and vice versa. I propose we try to do the same for Haskell. I've written half a draft for a Haskell version of Python's PEP 333 [1]. If you're interested in taking part in this effort, please read through the Python spec first (it is far more complete, and reading it will help you understand this proposal; I've skipped some important issues in my first draft) and then go read the Haskell spec [2].

I'm particularly interested in feedback regarding:

* Doing it this way won't work as it violates HTTP/CGI spec part X, Y and Z (the Python spec takes lots of things from the CGI spec, including naming and semantics).
* My server/framework could never provide/be run under this interface.
* This interface has bad performance by design.
* Using a different set of data types would work better.

The spec needs to be extended to cover all the corners of HTTP. Some parts need to be motivated better. It is easier for me to motivate things if people tell me which parts are badly motivated.

Note: I'm open to a complete rewrite if needed. I'm not wedded to the current design and/or wording. In fact, parts of the wording are borrowed from the Python spec. The parts with bad grammar are all mine.

1. http://www.python.org/dev/peps/pep-0333/
2. http://www.haskell.org/haskellwiki/WebApplicationInterface

-- Johan

Johan Tibell wrote:
Good day hackers,
The Python community has been successful in standardizing an interface between web servers and applications or frameworks, resulting in users having more control over their web stack by being able to pick frameworks independently from web servers, and vice versa. I propose we try to do the same for Haskell. I've written half a draft for a Haskell version of Python's PEP 333 [1]. If you're interested in taking part in this effort, please read through the Python spec first (it is far more complete, and reading it will help you understand this proposal; I've skipped some important issues in my first draft) and then go read the Haskell spec [2].
I'm very interested, thanks for the effort.
I'm particularly interested in feedback regarding:
* Doing it this way won't work as it violates HTTP/CGI spec part X, Y and Z (the Python spec takes lots of things from the CGI spec, including naming and semantics).
* My server/framework could never provide/be run under this interface.
* This interface has bad performance by design.
* Using a different set of data types would work better.
I'm not yet a Haskell expert; however, one of the great features of WSGI is that the environ is a Python dictionary. This means that the user can add new keys/values to it. I'm using this feature in my WSGI implementation for Nginx (and in a mini framework I'm writing) to store user configuration in the environment, and to cache the result of parsing the request headers. I'm not sure if this makes sense in Haskell.
Manlio Perillo

On Apr 13, 2008, at 10:21 , Manlio Perillo wrote:
I'm not yet a Haskell expert; however, one of the great features of WSGI is that the environ is a Python dictionary. This means that the user can add new keys/values to it.
I'm using this feature in my WSGI implementation for Nginx (and in a mini framework I'm writing) to store user configuration in the environment, and to cache the result of parsing the request headers.
I'm not sure if this makes sense in Haskell.
Er, that's just a StateT and possibly a ReaderT. I suspect a specific monad stack wrapping IO will end up being part of the interface anyway.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
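As a rough illustration of the StateT/ReaderT suggestion above, here is a minimal sketch using the transformers and containers packages that ship with GHC. The names Config, HeaderCache, Handler, siteName, parsedHeader and runHandler are hypothetical and not part of any proposed interface; the "parsing" is only a lower-casing stand-in.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Control.Monad.Trans.State (StateT, evalStateT, gets, modify)
import Data.Char (toLower)
import qualified Data.Map as Map

-- Read-only user configuration (the ReaderT part).
type Config = Map.Map String String

-- Mutable per-request cache of parsed headers (the StateT part).
type HeaderCache = Map.Map String String

-- Hypothetical handler monad: configuration + cache, wrapping IO.
type Handler = ReaderT Config (StateT HeaderCache IO)

-- Look up a configuration value, with a default.
siteName :: Handler String
siteName = asks (Map.findWithDefault "unnamed" "site.name")

-- Parse a header once and serve later lookups from the cache.
parsedHeader :: String -> String -> Handler String
parsedHeader name raw = do
  cached <- lift (gets (Map.lookup name))
  case cached of
    Just value -> return value
    Nothing -> do
      let value = map toLower raw -- stand-in for real header parsing
      lift (modify (Map.insert name value))
      return value

-- Run a handler with the given configuration and an empty cache.
runHandler :: Config -> Handler a -> IO a
runHandler config handler = evalStateT (runReaderT handler config) Map.empty
```

The point of the sketch is only that "adding keys to the environ" splits naturally into a read-only part and a mutable part; whether such a stack belongs in the interface itself is the open question.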

On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell wrote:
* Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)

HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1]. Likewise, URLs have lots of structure that should just be handled in one place [2].

[1] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...
[2] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...

AGL

--
Adam Langley agl@imperialviolet.org http://www.imperialviolet.org

On Mon, 14 Apr 2008 11:06:43 Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell wrote:
* Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
And it could go further. The use of a given header is often valid only in certain requests or responses. Perhaps sprinkling some phantom types or type classes around could represent that.

Daniel
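One way the phantom-type idea above might look, as a minimal sketch: the Request and Response tags, the Header newtype and the helper functions are all hypothetical names chosen for illustration, not taken from network-minihttp or the draft spec.

```haskell
-- Empty tag types; they exist only at the type level.
data Request
data Response

-- A header tagged with the kind of message it may appear in.
newtype Header dir = Header (String, String)

-- Host only makes sense in a request.
host :: String -> Header Request
host value = Header ("Host", value)

-- Server only makes sense in a response.
server :: String -> Header Response
server value = Header ("Server", value)

-- Content-Type is valid in both, so it stays polymorphic in the tag.
contentType :: String -> Header dir
contentType value = Header ("Content-Type", value)

-- Rendering works for any single direction, but a list cannot mix them.
renderHeaders :: [Header dir] -> String
renderHeaders hs = concat [name ++ ": " ++ value ++ "\r\n" | Header (name, value) <- hs]

responseHeaders :: [Header Response]
responseHeaders = [server "demo", contentType "text/plain"]
```

With this encoding, writing `[server "demo", host "example.org"]` is rejected by the type checker, which is the property being suggested.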

Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell wrote:
* Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
That is one of the things I don't like about Network.HTTP, which also enumerates header fields. It is inconvenient to have to look up the names in the data type when the standard field names are already known, and it makes using non-RFC 2616 headers less convenient. Automatic parsing of header fields also makes unusual usage inconvenient (for example, the Range header support in [1] is a profile of RFC 2616). I think those kinds of features belong in frameworks; they will be more of an annoyance than a help to anyone who is writing to the WSGI layer.
Likewise, URLs have lots of structure that should just be handled in one place [2]
Yes, I think URLs should be parsed to the level of granularity specified by RFC 2616 (i.e. scheme, host, port, path, query string) and anything more (like parsing query strings) should be handled by frameworks.
[1] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f... [2] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...
AGL
-- Michaeljohn Clement
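The granularity suggested in this reply (scheme, host, port, path, and an unparsed query string) could be captured by a record along these lines. This is only a sketch; the Url type, its field names and renderUrl are hypothetical and do not come from the draft spec.

```haskell
import qualified Data.ByteString.Char8 as B

-- URL split to roughly the RFC 2616 level; the query string stays raw
-- so that frameworks can parse it however they like.
data Url = Url
  { urlScheme :: B.ByteString
  , urlHost   :: B.ByteString
  , urlPort   :: Maybe Int
  , urlPath   :: B.ByteString
  , urlQuery  :: B.ByteString -- unparsed, without the leading '?'
  } deriving (Eq, Show)

-- Reassemble the URL from its parts.
renderUrl :: Url -> B.ByteString
renderUrl u = B.concat
  [ urlScheme u
  , B.pack "://"
  , urlHost u
  , maybe B.empty (\p -> B.pack (':' : show p)) (urlPort u)
  , urlPath u
  , if B.null (urlQuery u) then B.empty else B.cons '?' (urlQuery u)
  ]
```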

In a sense, the CGIT interface provided by Network.CGI already is a sort of halfway implementation of what we're discussing, no? I'd be interested in approaching this from the other way -- specifying exactly what CGIT doesn't provide and therefore what folks want to see. As far as I can tell, the main issue with CGIT is that it doesn't handle streaming/resource issues very well. The main innovation I see provided here is the enumerator interface, which is a very nice and flexible approach to I/O and provides a way to handle comet cleanly to boot.

Since the application type as proposed is Env -> IO (Code, Headers, ResponseEnumerator), what we're really getting is almost an equivalent (modulo enumerators) of unwrapping CGIT IO CGIResponse with a run function. So what we lose is the ability for all our nicely named record accessors and functions to be shared across frameworks -- i.e. the flexibility a monad transformer *does* provide. So my question is whether we can somehow preserve that with an appropriate typeclass.

I'd ideally like to see this engineered in two parts: a "cgit-like" typeclass interface that allows access to the environment but is agnostic as to response type, so that comet-style and other apps that take special advantage of enumerators can be built on top of it as well as apps that simply perform lazy writes; and the lower-level enumerator interface. This ideally would let the higher-level interface be built over any stack at all (i.e. STM-based as well, or even a pure stack), while the lower-level interface that calls it is some glue of the given constant type in the IO monad. This would be of great help to hvac.

There's also the fact that this could be designed ground-up with greater bytestring use, but that doesn't seem immense to me. Outside of this, I'm not quite sure what else CGIT lacks. I'm with Chris Smith's arguments as to the headers question, and it seems to me that dicts are best done using MVar-style primitives.
I'm a bit at sea as to why the queryString is here just represented as a bytestring -- is it seriously an issue that some apps may want to use it other than in the standard parsed way? Is the idea here that lib functions would fill in and be shared among frameworks? On the other hand, separating GET and POST vars is a good idea, and it's a shame that CGIT doesn't allow this. The openness here seems in part based on the desire to keep different forms of file upload handling available. However, the work that Oleg did with regard to CGI also seems promising -- i.e., rather than using an enumerator, simply taking advantage of laziness to unpack the input stream into a lazy dictionary.

Regards,
S.
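To make the "cgit-like typeclass" idea from this message a little more concrete, here is a minimal sketch of an environment-access class that says nothing about the response type and can run over IO or a pure stack. Everything here (Env as an association list, MonadEnv, lookupVar, greeting) is a hypothetical illustration, not the Network.CGI API and not part of the draft spec. It relies on the FlexibleInstances extension and the transformers package.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.Trans.Reader (ReaderT, asks, runReader, runReaderT)

-- Simplified stand-in for the request environment.
type Env = [(String, String)]

-- Access to the environment, agnostic as to how responses are produced.
class Monad m => MonadEnv m where
  lookupVar :: String -> m (Maybe String)

-- One instance covers ReaderT over IO, over Identity (pure), and so on.
instance Monad m => MonadEnv (ReaderT Env m) where
  lookupVar name = asks (lookup name)

-- Framework-level code written once against the class.
greeting :: MonadEnv m => m String
greeting = do
  who <- lookupVar "REMOTE_USER"
  return ("Hello, " ++ maybe "stranger" id who)
```

The same `greeting` can be run purely with `runReader greeting env` or inside IO with `runReaderT greeting env`, which is the kind of stack independence asked for above.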

Haskell works fine with both the FastCGI and SCGI protocols (there are libraries floating around for both). I have found them much nicer than any mod_* web server plugin in general.

John

--
John Meacham - ⑆repetae.net⑆john⑈

I am very interested in this work.

One thing missing is support for all HTTP methods, not just those in RFC 2616. As-is, something like WebDAV cannot be implemented. That probably means requestMethod should be a (Byte)String.

What about something like sendfile(2), available on some platforms? To allow the server to make use of such optimizations, how about an optional alternative to the enumerator? Perhaps the app can optionally pass a path or an open fd back to the server in place of an enumerator, which allows servers to do any kind of buffering optimizations, etc., that they may know about, as well as using any platform-specific optimizations like sendfile.

--
Michaeljohn Clement
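A minimal sketch of the "path in place of an enumerator" idea, assuming a hypothetical ResponseBody type. A plain list of chunks stands in for the enumerator, and the file case simply reads and writes the file; a real server could substitute sendfile(2) or its own buffering at that point.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, stdout)

-- Hypothetical response body: either generated chunks or a file on disk.
data ResponseBody
  = BodyChunks [B.ByteString] -- stand-in for the enumerator
  | BodyFile FilePath         -- server is free to optimize this case

-- Server side: decide how to send the body.
sendBody :: Handle -> ResponseBody -> IO ()
sendBody out (BodyChunks chunks) = mapM_ (B.hPut out) chunks
-- Portable fallback; a platform-specific server could use sendfile(2) here.
sendBody out (BodyFile path) = B.readFile path >>= B.hPut out

-- Small demonstration writing a chunked body to standard output.
demo :: IO ()
demo = sendBody stdout (BodyChunks [B.pack "hello, ", B.pack "world\n"])
```

The application never touches the socket in either case, so the choice of optimization stays entirely with the server.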

First, apologies for not responding earlier. I spent my week at a conference in Austria. Second, thanks for all the feedback! I thought I'd go through some of my thoughts on the issues raised.

Just to try to reiterate the goals of this effort:

* To provide a common, no-frills interface between web servers and applications or frameworks to increase choice for application developers.
* To make that interface easy enough to implement that current web servers and frameworks will implement it. This is crucial for it being adopted.
* To avoid design decisions that would limit the number of frameworks that can use the interface. One example of a limiting decision would be one that limits the maximal possible performance by using e.g. inefficient data types.

I'll try to start with what seem to be the easier issues.

sendfile(2) support
===================

I would like to see this supported in the interface. I didn't include it in the first draft as I didn't have a good idea of where to put it. One idea would be to add the following field to the Environment record:

    sendfile :: Maybe (FD -> IO ())

Possibly with additional parameters as needed. The reason that sendfile needs to be included in the environment, instead of just being a binding to the C function, is that the Socket used for the connection is hidden from the application side and its use is abstracted by the input and output enumerators.

The other suggested solution (to return either an Enumerator or a file descriptor) might work better. I just wanted to communicate that I think it should be included.

Extension HTTP methods
======================

I did have extension methods in mind when I wrote the draft but didn't include them. I see two possible options.

1. Change the HTTP method enumeration to:

    data Method = Get | ... | ExtensionMethod ByteString

2. Treat all methods as bytestrings:

    type Method = ByteString

This treatment touches on the discussion on typing further down in this email. I still haven't thought enough about the consequences (if indeed there are any of any importance) of the two approaches.

The Enumerator type
===================

To recap, I proposed the following type for the Enumerator abstraction:

    type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

The IO monad is a must both in the return type of the Enumerator function and in the iteratee function (i.e. the first parameter of the enumerator). IO in the return type of the enumerator is a must since the server must perform I/O (i.e. reading from the client socket) to provide the input and the application might need to perform I/O to create the response. The appearance of the IO monad in the iteratee function is an optimization. It makes it possible for the server or application to act immediately when a chunk of data is received. This saves memory when large files are being sent, as they can be written to disk/network immediately instead of being cached in memory.

There are some different design (and possibly performance) trade-offs that could be made. The current enumerator type can be viewed as an unrolled State monad, suggesting that it would be possible to change the type to:

    type Enumerator = forall t a. MonadTrans t => (ByteString -> t IO (Either a a)) -> t IO a

which is a more general type allowing for an arbitrary monad stack. Some arguments against doing this:

* The unrolled state version is analogous to a left fold (and can indeed be seen as one) and should thus be familiar to all Haskell programmers.
* A possibly unfounded worry I have is that it might be hard to optimize away the extra abstraction layer, putting a performance tax on all applications, whether they use the extra flexibility or not.

It would be great if any of the Takusen authors (or Oleg, since he wrote the enumerator paper) could comment on this.

Note: I haven't thought this one through. It was suggested to me on #haskell and I thought I should at least bring it up.
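For readers unfamiliar with the left-fold style, here is a minimal sketch of the first Enumerator type above in use. It assumes, as is common for this style but is not stated in this email, that the iteratee returns Left to stop early and Right to continue. The functions enumChunks, totalLength and firstChunk are hypothetical helpers written for this illustration; the RankNTypes extension is needed for the type synonym.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as B

type Enumerator = forall a. (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- An enumerator over a fixed list of chunks, standing in for a socket.
-- Assumption: Left means "stop now", Right means "keep going".
enumChunks :: [B.ByteString] -> Enumerator
enumChunks chunks iteratee = go chunks
  where
    go [] acc = return acc
    go (c : cs) acc = do
      step <- iteratee acc c
      case step of
        Left done  -> return done
        Right acc' -> go cs acc'

-- Consume everything, keeping only a running byte count in memory.
totalLength :: Enumerator -> IO Int
totalLength enum = enum (\n chunk -> return (Right (n + B.length chunk))) 0

-- Stop after the first chunk without touching the rest of the input.
firstChunk :: Enumerator -> IO (Maybe B.ByteString)
firstChunk enum = enum (\_ chunk -> return (Left (Just chunk))) Nothing
```

Because the iteratee runs in IO, `totalLength` could just as well write each chunk to disk as it arrives, which is the memory-saving behaviour described above.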
Extra environment variables
===========================

I've intended all along to include a field for remaining, optional extra pieces of information taken from e.g. the web server, the shell, etc. I haven't come up with a good name for this field, but the idea is to add another field to the Environment:

    data Environment = Environment
        { ...
        , extraEnvironment :: [(ByteString, ByteString)]
        }

Typing and data types
=====================

Most discussions seem to, perhaps unsurprisingly, have centered around the use of data types and typing in general. Let me start by giving an assumption I've used when writing this draft:

Existing frameworks already have internal representations of the request URL, headers, etc. Changing these would be costly. Even if this was done, I don't think it is possible to pick any one type that all frameworks could use to represent an HTTP request or even parts of a request. Different frameworks need different types.

Let me use the Last-Modified header field as an example. Assume we used named record fields for all headers:

    data Headers = Headers
        { ...
        , lastModified :: Maybe ???
        }

There is no type we could use for ??? that would be useful for all frameworks. There are several possible DateTime types with different design trade-offs. On the other hand, all frameworks likely already have a function to convert raw bytes to whatever internal representation is used in that particular framework.

Trying to provide more structured types than bytestrings appears to have two drawbacks:

1. It adds boilerplate type conversion code. No benefit is gained by the extra typing. Defining more types in the WAI interface adds complexity to the interface.
2. It adds an unnecessary performance penalty.

My suggestion is this: we use a minimal number of types in the interface and leave it up to higher levels to add these.

Summary
=======

I suggest that the overall design principle should be this: give a data type (e.g. Environment) with a minimal amount of structure corresponding to the one given in the HTTP RFC, plus some extra optional environment provided by the web server and the environment (e.g. shell) in which it is run. I suggest we leave the raw bytestrings in the interface, as interpreting them is best done by the framework. This also lends itself to an efficient implementation, as the bytestrings in the environment could just be substrings (an O(1) operation) of the raw input read from the socket.

Let me make a slight reservation here and say that I might want to split the URL into two parts (e.g. SCRIPT_NAME and PATH_INFO, like in CGI and WSGI). The reason for doing this is that it makes it much easier to nest applications by having each layer consume one part of the URL and leave the rest to the nested application. For example, consider the task of writing a URL dispatcher that picks different applications depending on the URL prefix:

    storeApp, adminApp, urlMap :: Application

    urlMap = mkUrlMap [("/admin", adminApp)
                      ,("/store", storeApp)
                      ]

    serve :: Application -> IO ()  -- Provided by the web server.

    main = serve urlMap

When a request for /store/items/1 reaches the URL mapper application, it consumes the initial prefix (and puts it in scriptName) and leaves the remaining URL part in pathInfo. adminApp or storeApp can then use what's left in pathInfo to do further dispatching (to a handler function, for example).

Phew, this turned into a longer email than I thought. If I forgot to respond to any points raised, please don't be afraid to raise them again.
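The scriptName/pathInfo dispatching described above could be sketched as follows. This is a simplification under stated assumptions: Application here returns a plain ByteString body rather than the (Code, Headers, ResponseEnumerator) triple mentioned earlier in the thread, the Environment record carries only the three fields needed for the example, and this mkUrlMap is one possible implementation, not the one from the draft.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (find)

-- Cut-down environment with only the fields this example needs.
data Environment = Environment
  { scriptName       :: B.ByteString
  , pathInfo         :: B.ByteString
  , extraEnvironment :: [(B.ByteString, B.ByteString)]
  }

-- Simplified application type (assumption: body only, no status or headers).
type Application = Environment -> IO B.ByteString

-- Pick the first application whose prefix matches a whole path segment,
-- move the prefix from pathInfo to scriptName, and delegate.
mkUrlMap :: [(String, Application)] -> Application
mkUrlMap table env =
  case find matches table of
    Just (prefix, app) ->
      let p = B.pack prefix
      in app env { scriptName = B.append (scriptName env) p
                 , pathInfo   = B.drop (B.length p) (pathInfo env) }
    Nothing -> return (B.pack "404 Not Found")
  where
    matches (prefix, _) =
      let p    = B.pack prefix
          rest = B.drop (B.length p) (pathInfo env)
      in p `B.isPrefixOf` pathInfo env && (B.null rest || B.head rest == '/')

storeApp, adminApp, urlMap :: Application
storeApp env = return (B.append (B.pack "store: ") (pathInfo env))
adminApp env = return (B.append (B.pack "admin: ") (pathInfo env))
urlMap = mkUrlMap [("/admin", adminApp), ("/store", storeApp)]
```

Because each layer only ever looks at pathInfo, a nested urlMap inside storeApp would work the same way without knowing where it was mounted.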
participants (8)
- Adam Langley
- Brandon S. Allbery KF8NH
- Daniel McAllansmith
- Johan Tibell
- John Meacham
- Manlio Perillo
- Michaeljohn Clement
- Sterling Clover