Mark, thanks for the response; it's very well thought out. Let me state two things first to explain some of my design decisions.

Firstly, I'm shooting for the lowest common denominator here. Right now, I see that as the intersection of the CGI backend and a standalone server backend; I think anything contained in both of those will be contained in all other backends. If anyone has a counterexample, I'd be happy to see it.

Secondly, the WAI is *not* designed to be "user friendly." It's designed to be efficient and portable. People looking for a user-friendly way to write applications should be using some kind of frontend, either a framework, or something like hack-frontend-monadcgi.

That said, let's address your specific comments.


On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner <markl@glyphic.com> wrote:
I like this project! Thanks for resurrecting it!

Some thoughts:

Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor:
       | Method B.ByteString

Seems logical to me.
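Something along these lines, as a minimal sketch: the set of named constructors and the helpers `parseMethod` and `renderMethod` are my own hypothetical choices, not anything in the current draft.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical shape of the type: the common methods get nullary
-- constructors, and anything else falls through to Method.
data RequestMethod
  = OPTIONS
  | GET
  | HEAD
  | POST
  | PUT
  | DELETE
  | TRACE
  | CONNECT
  | Method B.ByteString  -- catchall for extension methods (WebDAV etc.)
  deriving (Show, Eq)

-- Method names are case-sensitive in HTTP, so match the bytes exactly.
parseMethod :: B.ByteString -> RequestMethod
parseMethod bs = case B.unpack bs of
  "OPTIONS" -> OPTIONS
  "GET"     -> GET
  "HEAD"    -> HEAD
  "POST"    -> POST
  "PUT"     -> PUT
  "DELETE"  -> DELETE
  "TRACE"   -> TRACE
  "CONNECT" -> CONNECT
  _         -> Method bs

-- Inverse of parseMethod; show on a nullary constructor gives its name.
renderMethod :: RequestMethod -> B.ByteString
renderMethod (Method bs) = bs
renderMethod m           = B.pack (show m)
```

Since the catchall keeps the original bytes, an unknown method survives a round trip unchanged.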
 
Other systems (the WAI proposal on the Wiki, Hack, etc.) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit from being in here.

Going to the standalone vs CGI example: in a CGI script, scriptName is a well defined variable. However, it has absolutely no meaning to a standalone handler. I think we're just feeding rubbish into the system. I'm also not certain how one could *use* scriptName in any meaningful manner, outside of trying to reconstruct a URL (more on this topic below).
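For what it's worth, a frontend that knows its own mount point (say, from a configuration file) can compute the traversed/non-traversed split itself from the full path. A minimal sketch, assuming the mount point is given as a list of path segments; `pathSegments` and `splitAtMount` are hypothetical names:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (stripPrefix)

-- Split a raw path into segments, dropping the empty ones produced by
-- leading, trailing, or doubled slashes.
pathSegments :: B.ByteString -> [B.ByteString]
pathSegments = filter (not . B.null) . B.split '/'

-- Given a mount point supplied by the frontend's own configuration,
-- return the "traversed" part (playing the role of scriptName) and the
-- remainder (playing the role of pathInfo). Nothing means the path is
-- not under the mount point at all.
splitAtMount :: [B.ByteString] -> B.ByteString
             -> Maybe ([B.ByteString], [B.ByteString])
splitAtMount mount path = do
  rest <- stripPrefix mount (pathSegments path)
  return (mount, rest)
```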
 
The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and rewrites HTML and other content on the fly, which isn't realistic). Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host" like configurations. Though perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that the inclusion of these breaks some amount of encapsulation we'd like to achieve for the Applications.

I think it's impossible to ever reconstruct a URL for a CGI application. I've tried it; once you start dealing with mod_rewrite, anything could happen. Given that I think we should encourage users to make pretty URLs via mod_rewrite, I oppose inserting such a function. When I need this kind of information (many of my web apps do), I've put it in a configuration file.

However, I don't think it's a good idea to hide information that is universal to all webapps. urlScheme in particular seems very important to me; for example, when serving an app over HTTPS you may want to use a secure static-file server as well. Frankly, I don't have a use case for serverName and serverPort that doesn't involve reconstructing URLs, but my gut feeling is that it's better to leave them in the protocol in case one turns up.
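The HTTPS case might look like this, as a sketch. It assumes a two-constructor `UrlScheme` type standing in for whatever urlScheme ends up being, and placeholder host names that in practice would come from a configuration file; none of these names are from the current draft.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for the request's urlScheme field.
data UrlScheme = HTTP | HTTPS
  deriving (Show, Eq)

-- Choose the static-file host based on how the current request arrived,
-- so a page served over HTTPS never pulls in assets over plain HTTP.
-- Both hosts are placeholders that would really come from configuration.
staticRoot :: UrlScheme -> B.ByteString
staticRoot HTTPS = B.pack "https://secure-static.example.com"
staticRoot HTTP  = B.pack "http://static.example.com"

-- Build a link to a static file; no serverName or serverPort needed.
staticUrl :: UrlScheme -> B.ByteString -> B.ByteString
staticUrl scheme file = B.concat [staticRoot scheme, B.pack "/", file]
```

Note that only the scheme is read from the request; the hosts themselves are not reconstructed from it.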
 
The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
       data HttpVersion = Http09 | Http10 | Http11

I had not thought of that at all, and I like it. However, do we want to hard-code all possible HTTP versions? In theory, there could be more standards in the future. Plus, isn't Google currently working on a more efficient approach to HTTP that would affect this?
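One possible answer to my own question, as a sketch: store the version as a major/minor pair and provide the well-known versions as constants, so a future version needs no new constructor. The record `HttpVersion`, `parseHttpVersion`, and `canChunk` are hypothetical names; the chunked-encoding check is just one example of the header decisions Mark mentions.

```haskell
import qualified Data.ByteString.Char8 as B

-- A major/minor pair instead of one constructor per version. The derived
-- Ord compares the major number first, then the minor.
data HttpVersion = HttpVersion { httpMajor :: Int, httpMinor :: Int }
  deriving (Show, Eq, Ord)

http09, http10, http11 :: HttpVersion
http09 = HttpVersion 0 9
http10 = HttpVersion 1 0
http11 = HttpVersion 1 1

-- Parse the version token of a request line, e.g. "HTTP/1.1".
parseHttpVersion :: B.ByteString -> Maybe HttpVersion
parseHttpVersion bs = do
  rest <- stripPrefixBS (B.pack "HTTP/") bs
  (major, rest') <- B.readInt rest
  rest'' <- stripPrefixBS (B.pack ".") rest'
  (minor, end) <- B.readInt rest''
  if B.null end then Just (HttpVersion major minor) else Nothing
  where
    stripPrefixBS p s
      | p `B.isPrefixOf` s = Just (B.drop (B.length p) s)
      | otherwise          = Nothing

-- Example decision: chunked transfer coding only exists from HTTP/1.1 on.
canChunk :: HttpVersion -> Bool
canChunk v = v >= http11
```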
 
Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well-known ones. It seems a shame that applications will be littered with expressions like:

       [(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]

Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:

       [(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]

This approach would make WAI much more top-heavy and prone to becoming out of date. I don't oppose having this module in a separate package, but I want to keep WAI itself as light as possible.
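Such a separate package could be as small as a list of pre-packed constants, while WAI itself would still only see pairs of ByteStrings. A sketch, with names borrowed from Mark's example and otherwise hypothetical (`htmlHeaders` in particular is just for illustration):

```haskell
import qualified Data.ByteString.Char8 as B

-- Header names and common values as ready-packed ByteStrings, so
-- applications don't repeat B.pack everywhere. These would live in a
-- package layered on top of WAI, not in WAI itself.
contentType, contentLength :: B.ByteString
contentType   = B.pack "Content-Type"
contentLength = B.pack "Content-Length"

mimeTextHtmlUtf8 :: B.ByteString
mimeTextHtmlUtf8 = B.pack "text/html;charset=UTF-8"

-- What WAI would receive is still a plain list of byte pairs.
htmlHeaders :: Int -> [(B.ByteString, B.ByteString)]
htmlHeaders len =
  [ (contentType, mimeTextHtmlUtf8)
  , (contentLength, B.pack (show len))
  ]
```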
 
Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.

As I stated at the beginning of this response, there should be a framework or frontend sitting between WAI and the application. And given that the actual data on the wire will be represented as a stream of bytes, I'd rather stick with that.

For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.

Same comment about top-heaviness.
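If someone does want this, it can live in that separate package as a translation layer over WAI's raw byte pairs. A sketch listing only three of the defined headers, with `ExtensionHeader` holding a ByteString rather than the String Mark suggested; all names are hypothetical:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (toLower)

-- A few defined headers as nullary constructors, plus a catchall.
-- A full version would list every defined header.
data HeaderName
  = ContentType
  | ContentLength
  | Host
  | ExtensionHeader B.ByteString  -- anything not listed above
  deriving (Show, Eq)

-- Header field names are case-insensitive, so normalize before matching.
-- The catchall keeps the original bytes untouched.
parseHeaderName :: B.ByteString -> HeaderName
parseHeaderName bs = case map toLower (B.unpack bs) of
  "content-type"   -> ContentType
  "content-length" -> ContentLength
  "host"           -> Host
  _                -> ExtensionHeader bs

-- Convert back to the bytes WAI would actually carry.
renderHeaderName :: HeaderName -> B.ByteString
renderHeaderName ContentType         = B.pack "Content-Type"
renderHeaderName ContentLength       = B.pack "Content-Length"
renderHeaderName Host                = B.pack "Host"
renderHeaderName (ExtensionHeader n) = n
```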
 
Finally, note that HTTP/1.1 actually does define the character encoding of these parts of the protocol quite well. It is a bit hard to find in the spec, but the request line, status line, and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside the set encoded as per RFC 2047 (MIME Message Header Extensions). Mind you, I believe that most web servers *don't* do the 2047 decoding, and instead either a) pass the strings through as ISO-8859-1 strings, or b) decode them to native Unicode strings.

Thanks for that information, I was unaware. However, I think it still makes sense to keep WAI as low-level as possible, which would mean a sequence of bytes.
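Keeping bytes in WAI doesn't stop a frontend from offering Strings. A sketch of the ISO-8859-1 part, assuming the frontend skips RFC 2047 decoding just as Mark says most servers do; `latin1ToString` and `stringToLatin1` are hypothetical names:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)

-- Decoding ISO-8859-1 is trivial: every byte maps to the code point with
-- the same value, which is exactly what Char8.unpack produces.
latin1ToString :: B.ByteString -> String
latin1ToString = B.unpack

-- Encoding can fail: Char8.pack silently truncates characters above
-- U+00FF, so refuse such strings instead of corrupting the header.
stringToLatin1 :: String -> Maybe B.ByteString
stringToLatin1 s
  | all ((< 256) . ord) s = Just (B.pack s)
  | otherwise             = Nothing
```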

Michael