RFC: A standardized interface between web servers and applications or frameworks (à la WSGI)

Good day hackers,

The Python community has been successful in standardizing an interface between web servers and applications or frameworks. As a result, users have more control over their web stack: they can pick frameworks independently of web servers, and vice versa. I propose we try to do the same for Haskell. I've written half a draft of a Haskell version of Python's PEP 333 [1]. If you're interested in taking part in this effort, please read through the Python spec first (it is far more complete, and this proposal is easier to understand with it as background; I've skipped some important issues in my first draft) and then read the Haskell spec [2]. I'm particularly interested in feedback of the form:

* Doing it this way won't work because it violates HTTP/CGI spec parts X, Y, and Z (the Python spec takes a lot from the CGI spec, including naming and semantics).
* My server/framework could never provide/be run under this interface.
* This interface has bad performance by design.
* Using a different set of data types would work better.

The spec needs to be extended to cover all the corners of HTTP, and some parts need better motivation. It is easier for me to motivate things if people tell me which parts are badly motivated.

Note: I'm open to a complete rewrite if needed. I'm not wedded to the current design and/or wording. In fact, parts of the wording are borrowed from the Python spec. The parts with bad grammar are all mine.

1. http://www.python.org/dev/peps/pep-0333/
2. http://www.haskell.org/haskellwiki/WebApplicationInterface

-- Johan

On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell
* Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;) HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1]. Likewise, URLs have lots of structure that should be handled in just one place [2].

[1] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...
[2] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...

AGL

-- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org

On Mon, 14 Apr 2008 11:06:43 Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell
wrote: * Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
And it could go further. The use of a given header is often valid only in certain requests or responses. Perhaps sprinkling some phantom types or type classes around could represent that. Daniel
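A minimal sketch of what Daniel suggests might look like the following. All names here (Header, Request, Response, userAgent, server) are invented for illustration and are not part of the draft spec; the phantom type parameter records the context a header may occur in, so the type checker rejects misuse.

```haskell
{-# LANGUAGE EmptyDataDecls #-}

data Request    -- phantom tag: request context
data Response   -- phantom tag: response context

-- A header tagged with the context it may occur in.
newtype Header ctx = Header (String, String)

-- Smart constructors fix the context.
userAgent :: String -> Header Request   -- request-only header
userAgent v = Header ("User-Agent", v)

server :: String -> Header Response     -- response-only header
server v = Header ("Server", v)

headerName :: Header ctx -> String
headerName (Header (n, _)) = n

-- A header list parameterised by the same tag can only hold matching
-- headers: [server "x"] :: [Header Response] cannot contain a
-- userAgent value; mixing them is a compile-time error.
```

The same trick extends to methods, or to the general/request/response/entity header split, at the cost of a smart constructor per header.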

On Sun, 13 Apr 2008 16:06:43 -0700, Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell
wrote: * Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
Wait, I'm not sure I agree here. How are headers not just strings? By assuming that, are we guaranteeing that anything using this interface cannot respond gracefully to a client that writes malformed headers?

Another perspective: there is unnecessary variation in how headers are represented. If I'm looking for a header, and I know its name as a string, how do I look for it? Well, apparently it's either a named field (if it's known to the interface) or in the "other" section (if not). So I write a gigantic case analysis? But then suppose the interface is updated later to include some headers that popped up unofficially but were then standardized in a future RFC. (This is not too odd; lots of REST web services invent new headers every day, many of which do things that make sense outside of the particular application.) Does old code that handled these headers stop working, just because it was looking in the "other" section but now needs to check a field dedicated to that header?
Likewise, URLs have lots of structure that should just be handled in one place [2]
This I do agree with. -- Chris Smith

On Sun, Apr 13, 2008 at 6:32 PM, Chris Smith
Does old code that handled these headers stop working, just because it was looking in the "other" section, but now needs to check a field dedicated to that header?
Yes, but it would be very sad if we couldn't do common header parsing because of this. I'd suggest that all the headers given in RFC 2616 be parsed and nothing else.

That leaves the question of how we would handle the addition of any extra ones in the future. Firstly, packages could depend on a given version of this interface, and we could declare that the set of handled headers doesn't change within a major version. Better would be some static assertion that the interface doesn't handle some set of headers. Maybe there's a type trick to do this, but I can't think of one, so we might have to settle for a non-static check:

  checkUnparsedHeaders :: [String] -> IO ()

which can be put in 'main' (or equivalent) and can call error if there's a mismatch.

AGL

-- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org
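One possible reading of Adam's non-static check, as a runnable sketch. The `parsedHeaders` list here is a made-up stand-in; in a real interface it would be exported by the interface library itself and held fixed within a major version.

```haskell
import Control.Monad (unless)
import Data.Char (toLower)
import Data.List (intersect)

-- Hypothetical: the header names this interface version parses itself.
parsedHeaders :: [String]
parsedHeaders = ["content-type", "content-length", "host"]

-- Which of the headers the application expects to find unparsed are
-- actually claimed by the interface (case-insensitive, per RFC 2616)?
clashes :: [String] -> [String]
clashes wanted = map (map toLower) wanted `intersect` parsedHeaders

-- The check Adam describes: call it from 'main' and abort at startup
-- if the application's expectations mismatch the interface version.
checkUnparsedHeaders :: [String] -> IO ()
checkUnparsedHeaders wanted =
  unless (null (clashes wanted)) $
    error ("interface already parses these headers: " ++ show (clashes wanted))
```

This fails fast at process start rather than at first request, which is the best a non-static check can do.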

On Mon, Apr 14, 2008 at 3:27 AM, Adam Langley
On Sun, Apr 13, 2008 at 6:32 PM, Chris Smith
wrote: Does old code that handled these headers stop working, just because it was looking in the "other" section, but now needs to check a field dedicated to that header?
Yes, but it would be very sad if we couldn't do common header parsing because of this.
I'd suggest that all the headers given in RFC 2616 be parsed and nothing else.
Both requests and responses accept any entity headers, and section 7.1 of RFC 2616 says that a valid entity header can be an extension header, which can be any kind of header.
That leaves the question of how we would handle the addition of any extra ones in the future. Firstly, packages could depend on a given version of this interface and we declare that the set of handled headers doesn't change within a major version.
Better would be some static assertion that the interface doesn't handle some set of headers. Maybe there's a type trick to do this, but I can't think of one, so we might have to settle for a non-static check:
checkUnparsedHeaders :: [String] -> IO ()
Which can be put in 'main' (or equivalent) and can call error if there's a mismatch.
Most of the time a header makes sense in some scenarios and not in others, so package-level checking is too coarse-grained. IMHO it would be better to create a two-layered approach. The bottom layer handles the request as a bunch of strings and checks only for structural correctness (i.e. breaks the headers by line and such) without checking whether the headers are correct. The top layer provides a bunch of parser combinators to validate, parse and sanitize the request, so a library can create its own contract:

  newtype Contract e a = Contract (HttpRequest -> e a)

  contract :: Contract Maybe MyRequest
  contract = do
      pragma <- parseHeader "Pragma" (\header -> ...)
      ...
      return $ MyRequest pragma ...

  main = do
      request <- readHttpRequest
      sanitized <- enforce contract request
      ...

Such an approach would be more flexible and extensible; later, other packages could provide specialized combinators for other RFCs. HTTP is regularly extended, in RFCs and by private parties experimenting before writing an RFC, and it would be bad if the primary Haskell library for HTTP didn't support this. It's also important to notice that the HTTP spec defines things to be mostly orthogonal: most headers stand on their own and can be used in combination with many methods and other headers, and every once in a while someone finds a combination that makes sense and wasn't thought of before.
Best regards, Daniel Yokomizo.
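Daniel's sketch above can be filled in as a small runnable version. `HttpRequest`, `Contract`, `parseHeader` and `enforce` are his illustrative names; the bottom layer is reduced here to a header alist for brevity, and the effect parameter is specialized to Maybe so the contract can be written in do-notation.

```haskell
import Data.Char (toLower)

-- Bottom layer: a structurally valid request, headers still raw strings.
newtype HttpRequest = HttpRequest { rawHeaders :: [(String, String)] }

-- Top layer: a contract turns a raw request into a validated value, or fails.
newtype Contract a = Contract { enforce :: HttpRequest -> Maybe a }

instance Functor Contract where
  fmap f (Contract g) = Contract (fmap f . g)

instance Applicative Contract where
  pure x = Contract (const (Just x))
  Contract f <*> Contract g = Contract (\r -> f r <*> g r)

instance Monad Contract where
  Contract g >>= f = Contract (\r -> g r >>= \x -> enforce (f x) r)

-- Look up a header case-insensitively and run a field parser on its value.
parseHeader :: String -> (String -> Maybe a) -> Contract a
parseHeader name p = Contract $ \req ->
  lookup (lc name) [ (lc k, v) | (k, v) <- rawHeaders req ] >>= p
  where lc = map toLower

data MyRequest = MyRequest { pragma :: String } deriving (Eq, Show)

contract :: Contract MyRequest
contract = do
  prag <- parseHeader "Pragma" Just  -- a real parser would validate the value
  return (MyRequest prag)
```

A library built on this keeps the wire format as strings while each framework enforces exactly the contract it needs.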

On Mon, Apr 14, 2008 at 4:54 AM, Daniel Yokomizo
Both request and response accept any entity headers and 7.1 (of RFC 2616) says that a valid entity header is an extension header, which can be any kind of header.
I wasn't suggesting that other headers be dropped, just that they remain as strings.
IMHO it would be better to create a two layered approach. The bottom layer handles the request as a bunch of strings, just checks for structural correctness (i.e. break the headers by line and such) without checking if the headers are correct. The top layer provides a bunch of parser combinators to validate, parse and sanitize the request so a library can create its own contract:
Ok, I think I'm convinced by this argument. I'd hope that a standard set of header parsers would be defined, and that an application which only cares about RFC 2616 headers could call a single function to parse them all, but I no longer advocate that the base interface use parsed forms of headers. Also, parsing URLs seems to be pretty uncontroversial (maybe parsing key/value pairs from the path, maybe not).

AGL

-- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org

On Mon, 14 Apr 2008 13:32:07 Chris Smith wrote:
On Sun, 13 Apr 2008 16:06:43 -0700, Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell
wrote:
* Using a different set of data types would work better.
Given that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
Wait, I'm not sure I agree here. How are headers not just strings?
Headers, at least their values, aren't strings. The specification says so. I think headers should be represented by something more specific than a string.
By assuming that, are we guaranteeing that anything using this interface cannot respond gracefully to a client that writes malformed headers?
Having explicit types for headers doesn't preclude trying to handle messages with malformed headers. Soldiering on in the face of malformed messages as a general strategy is pretty dubious in my opinion. In the specific cases where you've determined it is necessary you want to be able to register a work-around parser for that section of the message, and be able to tell that it has been used. A decent framework can supply a catalogue of commonly required work-arounds.
Another perspective: there is unnecessary variation in how headers are represented. If I'm looking for a header, and I know its name as a string, how do I look for it? Well, apparently it's either a named field (if it's known to the interface) or in the "other" section (if not). So I write a gigantic case analysis? But then suppose the interface is updated later to include some headers that popped up unofficially but were then standardized in a future RFC. (This is not too odd; lots of REST web services invent new headers every day, many of which do things that make sense outside of the particular application.) Does old code that handled these headers stop working, just because it was looking in the "other" section but now needs to check a field dedicated to that header?
I don't like the idea of having a fixed enumeration of methods or headers. You need to be able to define new methods and headers at will, and ideally have the usage of headers constrained to valid contexts. This suggests to me type classes that establish a 'can occur in' relationship between request/response, method and a given general/request/response/entity header. By importing a new method or header data type, the appropriate type class instances, and registering an appropriate message parser extension, you can mix and match which headers and methods you support. GET and HEAD are the only ones that MUST be supported, after all. Daniel
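The 'can occur in' relation could be encoded as a multi-parameter type class, along these lines. The header and message types here are invented for the sketch; the point is that adding a new header means adding a data type and instances, not editing a closed enumeration.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

data Request  = Request  deriving (Eq, Show)
data Response = Response deriving (Eq, Show)

data Host        = Host String          -- request-only header
data ETag        = ETag String          -- response-only header
data ContentType = ContentType String   -- entity header: valid in both

-- The 'can occur in' relation as a (methodless) type class.
class OccursIn header message

instance OccursIn Host        Request
instance OccursIn ETag        Response
instance OccursIn ContentType Request
instance OccursIn ContentType Response

-- Only well-typed pairs compile; addHeader (ETag "x") Request is
-- rejected at compile time. The body is elided since the constraint
-- is the point of the sketch.
addHeader :: OccursIn h m => h -> m -> m
addHeader _ msg = msg
```

A downstream package defining a new header would export its data type plus the `OccursIn` instances describing where it is legal.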

Adam Langley wrote:
On Sun, Apr 13, 2008 at 4:59 AM, Johan Tibell
wrote: * Using a different set of data types would work better.
Give that this is Haskell, I'd suggest more types ;)
HTTP headers aren't just strings and, at the risk of tooting my own horn, I'll point to the Headers structure in [1].
That is one of the things I don't like about Network.HTTP, which also enumerates header fields. It is inconvenient to have to look up the names in the data type when the standard field names are already known, and it makes using non-RFC 2616 headers less convenient. Automatic parsing of header fields also makes unusual usage inconvenient (for example, the Range header support in [1] is a profile of RFC 2616). I think those kinds of features belong in frameworks; they will be more of an annoyance than a help to anyone writing to the WSGI layer.
Likewise, URLs have lots of structure that should just be handled in one place [2]
Yes, I think URLs should be parsed to the level of granularity specified by RFC 2616 (i.e. scheme, host, port, path, query string), and anything more (like parsing query strings) should be handled by frameworks.
[1] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f... [2] http://darcs.imperialviolet.org/darcsweb.cgi?r=network-minihttp;a=headblob;f...
AGL
-- Michaeljohn Clement
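The granularity Michaeljohn describes could be captured by a record like the following; the field names are made up for this sketch, and a real interface would likely use ByteString rather than String.

```haskell
-- URL parsed only to RFC 2616 granularity; everything finer-grained
-- (query parameters, path segments) is left to frameworks.
data Url = Url
  { urlScheme :: String   -- e.g. "http"
  , urlHost   :: String
  , urlPort   :: Int
  , urlPath   :: String   -- raw, undecoded path
  , urlQuery  :: String   -- raw query string, unparsed
  } deriving (Eq, Show)

-- The only splitting the base layer needs: path vs. query string.
splitPathQuery :: String -> (String, String)
splitPathQuery s = case break (== '?') s of
  (p, "")    -> (p, "")
  (p, _ : q) -> (p, q)
```

Keeping `urlQuery` raw means an O(1) substring of the request line suffices, and frameworks remain free to parse it however they like.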

In a sense, the CGIT interface provided by Network.CGI already is a sort of halfway implementation of what we're discussing, no? I'd be interested in approaching this from the other way: specifying exactly what CGIT doesn't provide and therefore what folks want to see. As far as I can tell, the main issue with CGIT is that it doesn't handle streaming/resource issues very well.

The main innovation I see provided here is the enumerator interface, which is a very nice and flexible approach to I/O and provides a way to handle comet cleanly to boot. Since the application type as proposed is Env -> IO (Code, Headers, ResponseEnumerator), what we're really getting is almost an equivalent (modulo enumerators) of unwrapping CGIT IO CGIResponse with a run function. So what we lose is the ability for all our nicely named record accessors and functions to be shared across frameworks -- i.e. the flexibility a monad transformer *does* provide. So my question is whether we can somehow preserve that with an appropriate typeclass.

I'd ideally like to see this engineered in two parts: a "cgit-like" typeclass interface that allows access to the environment but is agnostic as to response type, so that comet-style and other apps that take special advantage of enumerators can be built on top of it as well as apps that simply perform lazy writes; and the lower-level enumerator interface. This would ideally let the higher-level interface be built over any stack at all (i.e. STM-based as well, or even a pure stack), while the lower-level interface that calls it is some glue of the given constant type in the IO monad. This would be of great help to hvac.

There's also the fact that this could be designed ground-up with greater bytestring use, but that doesn't seem immense to me. Outside of this, I'm not quite sure what else CGIT lacks. I'm with Chris Smith's arguments as to the headers question, and it seems to me that dicts are best done using MVar-style primitives.
I'm a bit at sea as to why the queryString is here represented as just a bytestring -- is it seriously an issue that some apps may want to use it other than in the standard parsed way? Is the idea here that lib functions would fill in and be shared among frameworks? On the other hand, separating GET and POST vars is a good idea, and it's a shame that CGIT doesn't allow this. The openness here seems in part based on the desire to keep different forms of file upload handling available. However, the work that Oleg did with regard to CGI also seems promising -- i.e., rather than using an enumerator, simply taking advantage of laziness to unpack the input stream into a lazy dictionary.

Regards, S.

Haskell works fine with both the FastCGI and SCGI protocols (there are libraries floating around for both); I have found them much nicer than any mod_* web server plugin in general.

John

-- John Meacham - ⑆repetae.net⑆john⑈

I am very interested in this work. One thing missing is support for all HTTP methods, not just those in RFC 2616. As-is, something like WebDAV cannot be implemented. That probably means requestMethod should be a (Byte)String.

What about something like sendfile(2), available on some platforms? To allow the server to make use of such optimizations, how about an optional alternative to the enumerator? Perhaps the app can optionally pass a path or an open fd back to the server in place of an enumerator, which allows servers to do any kind of buffering optimizations, etc., that they may know about, as well as using any platform-specific optimizations like sendfile.

-- Michaeljohn Clement
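One hypothetical shape for this idea: the application hands back either an enumerator or a file path, letting the server choose sendfile(2) for the latter. The names (`ResponseBody`, `EnumBody`, `FileBody`) are invented here, not taken from the draft, and real server dispatch would do I/O rather than return a description.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.ByteString (ByteString)

-- The enumerator type from the draft proposal.
type Enumerator =
  forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

data ResponseBody
  = EnumBody Enumerator   -- general case: stream chunks to the server
  | FileBody FilePath     -- server may serve this with sendfile(2)

-- How a server might dispatch on the body (actual I/O elided).
describeBody :: ResponseBody -> String
describeBody (FileBody path) = "sendfile: " ++ path
describeBody (EnumBody _)    = "stream via enumerator"
```

An fd variant of `FileBody` would also work; a path keeps the interface portable to servers that cannot accept foreign descriptors.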

First, apologies for not responding earlier; I spent my week at a conference in Austria. Second, thanks for all the feedback! I thought I'd go through some of my thoughts on the issues raised. Just to reiterate the goals of this effort:

* To provide a common, no-frills interface between web servers and applications or frameworks, to increase choice for application developers.
* To make that interface easy enough to implement that current web servers and frameworks will implement it. This is crucial for adoption.
* To avoid design decisions that would limit the number of frameworks that can use the interface. One example of a limiting decision would be one that caps the maximal possible performance, e.g. by using inefficient data types.

I'll try to start with what seem to be the easier issues.

sendfile(2) support
===================

I would like to see this supported in the interface. I didn't include it in the first draft as I didn't have a good idea of where to put it. One idea would be to add the following field to the Environment record:

  sendfile :: Maybe (FD -> IO ())

possibly with additional parameters as needed. The reason that sendfile needs to be included in the environment, instead of just being a binding to the C function, is that the Socket used for the connection is hidden from the application side and its use is abstracted by the input and output enumerators. The other suggested solution (to return either an Enumerator or a file descriptor) might work better; I just wanted to communicate that I think it should be included.

Extension HTTP methods
======================

I did have extension methods in mind when I wrote the draft but didn't include them. I see two possible options:

1. Change the HTTP method enumeration to:

   data Method = Get | ... | ExtensionMethod ByteString

2. Treat all methods as bytestrings:

   type Method = ByteString

This treatment touches on the discussion of typing further down in this email.
I still haven't thought enough about the consequences (if indeed there are any of any importance) of the two approaches.

The Enumerator type
===================

To recap, I proposed the following type for the Enumerator abstraction:

  type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

The IO monad is a must both in the return type of the Enumerator function and in the iteratee function (i.e. the first parameter of the enumerator). IO in the return type of the enumerator is a must since the server must perform I/O (i.e. reading from the client socket) to provide the input, and the application might need to perform I/O to create the response. The appearance of the IO monad in the iteratee function is an optimization: it makes it possible for the server or application to act immediately when a chunk of data is received. This saves memory when large files are being sent, as they can be written to disk/network immediately instead of being cached in memory.

There are some different design (and possibly performance) trade-offs that could be made. The current enumerator type can be viewed as an unrolled State monad, suggesting that it would be possible to change the type to:

  type Enumerator = forall t a. MonadTrans t => (ByteString -> t IO (Either a a)) -> t IO a

which is a more general type allowing for an arbitrary monad stack. Some arguments against doing this:

* The unrolled state version is analogous to a left fold (and can indeed be seen as one) and should thus be familiar to all Haskell programmers.
* A, possibly unfounded, worry I have is that it might be hard to optimize away the extra abstraction layer, putting a performance tax on all applications whether they use the extra flexibility or not. It would be great if any of the Takusen authors (or Oleg, since he wrote the enumerator paper) could comment on this.

Note: I haven't thought this one through. It was suggested to me on #haskell and I thought I should at least bring it up.
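The left-fold reading of the proposed type can be made concrete with a toy enumerator. `fromChunks` here stands in for a server that would really be reading from a socket; `takeBytes` shows what the Either is for: the iteratee can stop early without consuming the rest of the input.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as B

-- The proposed type: an IO left fold over input chunks, where a
-- Left result from the iteratee means "stop feeding me".
type Enumerator =
  forall a. (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Toy enumerator over fixed in-memory chunks.
fromChunks :: [B.ByteString] -> Enumerator
fromChunks []       _    seed = return seed
fromChunks (c : cs) iter seed = do
  r <- iter seed c
  case r of
    Left done  -> return done              -- iteratee asked to stop early
    Right next -> fromChunks cs iter next  -- keep folding

-- Iteratee that accumulates at most n bytes, then stops.
takeBytes :: Int -> Enumerator -> IO B.ByteString
takeBytes n enum = enum step B.empty
  where
    step acc chunk
      | B.length acc' >= n = return (Left (B.take n acc'))
      | otherwise          = return (Right acc')
      where acc' = B.append acc chunk
```

A server-side enumerator would have the same shape with the chunk list replaced by socket reads, which is where the IO in the fold earns its keep.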
Extra environment variables
===========================

I've intended all along to include a field for remaining, optional, extra pieces of information taken from e.g. the web server, the shell, etc. I haven't come up with a good name for this field, but the idea is to add another field to the Environment:

  data Environment = Environment
      { ...
      , extraEnvironment :: [(ByteString, ByteString)]
      }

Typing and data types
=====================

Most discussions seem to, perhaps unsurprisingly, have centered around the use of data types and typing in general. Let me start by giving an assumption I've used when writing this draft: existing frameworks already have internal representations of the request URL, headers, etc., and changing these would be costly. Even if this were done, I don't think it is possible to pick any one type that all frameworks could use to represent an HTTP request, or even parts of a request. Different frameworks need different types.

Let me use the Last-Modified header field as an example. Assume we used named record fields for all headers:

  data Headers = Headers
      { ...
      , lastModified :: Maybe ???
      }

There is no type we could use for ??? that would be useful for all frameworks. Several possible DateTime types exist, with different design trade-offs. On the other hand, all frameworks likely already have a function to convert raw bytes to whatever internal representation that particular framework uses. Trying to provide more structured types than bytestrings appears to have two drawbacks:

1. It adds boilerplate type-conversion code. No benefit is gained by the extra typing, and defining more types in the WAI interface adds complexity to the interface.
2. It adds an unnecessary performance penalty.

My suggestion is this: we use a minimal number of types in the interface and leave it up to higher levels to add more.

Summary
=======

I suggest that the overall design principle should be this: give a data type (e.g. Environment) with a minimal amount of structure, corresponding to the one given in the HTTP RFC, plus some extra optional environment provided by the web server and the environment (e.g. shell) in which it is run. I suggest we leave the raw bytestrings in the interface, as interpreting them is best done by the framework. This also lends itself to an efficient implementation, as the bytestrings in the environment could just be substrings (an O(1) operation) of the raw input read from the socket.

Let me make a slight reservation here and say that I might want to split the URL into two parts (e.g. SCRIPT_NAME and PATH_INFO, like in CGI and WSGI). The reason for doing this is that it makes it much easier to nest applications, by having each layer consume one part of the URL and leave the rest to the nested application. For example, consider the task of writing a URL dispatcher that picks different applications depending on the URL prefix:

  storeApp, adminApp, urlMap :: Application
  urlMap = mkUrlMap [("/admin", adminApp)
                    ,("/store", storeApp)
                    ]

  serve :: Application -> IO ()  -- Provided by the web server.

  main = serve urlMap

When a request for /store/items/1 reaches the URL mapper application, it consumes the initial prefix (and puts it in scriptName) and leaves the remaining URL part in pathInfo. adminApp or storeApp can then use what's left in pathInfo to do further dispatching (to a handler function, for example).

Phew, this turned into a longer email than I thought. If I forgot to respond to any points raised, please don't be afraid to raise them again.
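A possible shape for the mkUrlMap in that example, with Application simplified to a pair of (scriptName, pathInfo) purely for illustration; the draft's real Application type carries a full Environment.

```haskell
import Data.List (isPrefixOf)

-- Simplified stand-in: (scriptName, pathInfo) -> response body.
type Application = (String, String) -> IO String

-- Dispatch on the first matching prefix, moving it from pathInfo
-- into scriptName so nested apps see only their own URL remainder.
mkUrlMap :: [(String, Application)] -> Application
mkUrlMap routes (script, path) =
  case [ (p, app) | (p, app) <- routes, p `isPrefixOf` path ] of
    (p, app) : _ -> app (script ++ p, drop (length p) path)
    []           -> return "404 Not Found"

storeApp :: Application
storeApp (_script, pathInfo) = return ("store got " ++ pathInfo)

urlMap :: Application
urlMap = mkUrlMap [("/store", storeApp)]
```

Because the mapper only shuffles the scriptName/pathInfo split, mappers nest freely: storeApp could itself be another mkUrlMap.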
participants (8)
- Adam Langley
- Chris Smith
- Daniel McAllansmith
- Daniel Yokomizo
- Johan Tibell
- John Meacham
- Michaeljohn Clement
- Sterling Clover