
Following up on the previous thread, I've started a GitHub project for some ideas for a web application interface. It borrows from both Hyena and Hack, with a few ideas of its own. The project is available at http://github.com/snoyberg/wai, and the Network.Wai module is available at http://github.com/snoyberg/wai/blob/master/Network/Wai.hs. The repository also includes a port of hack-handler-simpleserver, and an incredibly simple webapp to demonstrate usage. I intend to make the demonstration slightly more sophisticated. Finally, the repository is not yet cabalized.

I consider this currently to be a straw-man proposal, intended to highlight the issues of contention that may arise. It would be wonderful if we could get the major players in the Haskell web space to get behind a single WAI. The entire Network.Wai module right now weighs in at only 74 lines, so I do not consider this to be a heavy-weight proposal.

Here are some design notes:

- Most important point: RequestBody and ResponseBody. I will explain below.
- I've renamed "Env" in Hack and "Environment" in Hyena to "Request." This seems more consistent with other technologies out there. However, I have no feelings on this subject at all, and can easily bend to public demand.
- I've stuck with UrlScheme from Hack, while Hyena called it protocol. Similarly, the RequestMethod constructors are ALLCAPS like Hack, unlike Hyena's Uppercase. Once again, no strong feelings.
- I've sided with Hyena as far as making all representations in ByteString. The current exception is remoteHost, which is a Hack-only variable in any event.
- Instead of representing the response as a tuple a la Hyena, I created a data type like Hack.
- The only dependency for this module is bytestring. It might be tempting to represent RequestBody and ResponseBody with a ReaderT IO monad, but this would introduce a dependency on either mtl or transformers, which I would consider a Very Bad Idea.
The main complaint against Hack is its lack of an enumerator interface for the request and response body. However, this simply words the complaint incorrectly; I don't think anyone is married to the need for an enumerator. Rather, we want to be able to efficiently handle content of arbitrary length in constant space, without necessarily resorting to unsafeInterleaveIO (i.e., lazy I/O).

There are a number of issues with left-fold enumerators IMO. They basically promote an inversion of control. This is often valuable. However, making this the *only* interface precludes other use cases. The most basic one I ran into was wanting to interleave with read processes. I do not mean to say that it's impossible to interleave reads in such a manner, but I think it's more natural in the approach advocated by wai.

I consider RequestBody and ResponseBody to be mirroring the CGI protocol. Essentially, each handler (CGI, simpleserver, FastCGI, happstack server, etc) will define data types which instantiate RequestBodyClass and ResponseBodyClass. RequestBodyClass provides a single method, receiveByteString, to extract a chunk of data from the request body. ResponseBodyClass provides (currently) three methods, for sending strict bytestrings, lazy bytestrings, and files. While default implementations are provided for the last two based on the first, implementations can provide more efficient versions of them if desired. For example, sendFile might be replaced by a system call to sendfile.

Let me know your thoughts. I'm purposely leaving out many of my reasons for the decisions I've made for brevity, since this e-mail is long enough as is. I'm happy to answer any questions as to why I went in a certain direction. It's also possible that I simply overlooked a detail.

Michael
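The two classes described in the message above can be sketched roughly as follows. This is only an illustration, not the contents of Network.Wai: the method names receiveByteString, sendByteString, sendLazyByteString and sendFile come from the message, but every signature is an assumption, and ChunkList, Collector, echo and demo are hypothetical helpers invented for the example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as L
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Assumed signature: yield the next chunk together with the remaining
-- body, or Nothing once the body is exhausted.
class RequestBodyClass a where
  receiveByteString :: a -> IO (Maybe (B.ByteString, a))

-- Only sendByteString is mandatory; the other two methods have defaults
-- built on it, which a handler may override (e.g. sendFile via sendfile).
class ResponseBodyClass a where
  sendByteString :: a -> B.ByteString -> IO ()

  sendLazyByteString :: a -> L.ByteString -> IO ()
  sendLazyByteString body = mapM_ (sendByteString body) . L.toChunks

  sendFile :: a -> FilePath -> IO ()
  sendFile body path = withBinaryFile path ReadMode loop
    where
      -- Strict reads in fixed-size chunks, so no lazy I/O is involved.
      loop h = do
        chunk <- B.hGetSome h 4096
        if B.null chunk
          then return ()
          else sendByteString body chunk >> loop h

-- Hypothetical in-memory request body: a list of pending chunks.
newtype ChunkList = ChunkList [B.ByteString]

instance RequestBodyClass ChunkList where
  receiveByteString (ChunkList []) = return Nothing
  receiveByteString (ChunkList (c : cs)) = return (Just (c, ChunkList cs))

-- Hypothetical in-memory response body: records every chunk sent.
newtype Collector = Collector (IORef [B.ByteString])

instance ResponseBodyClass Collector where
  sendByteString (Collector ref) chunk = modifyIORef ref (chunk :)

collected :: Collector -> IO B.ByteString
collected (Collector ref) = fmap (B.concat . reverse) (readIORef ref)

-- Copy the request body to the response one chunk at a time.
echo :: (RequestBodyClass req, ResponseBodyClass res) => req -> res -> IO ()
echo req res = do
  next <- receiveByteString req
  case next of
    Nothing -> return ()
    Just (chunk, rest) -> sendByteString res chunk >> echo rest res

demo :: IO B.ByteString
demo = do
  ref <- newIORef []
  let out = Collector ref
  echo (ChunkList [BC.pack "ab", BC.pack "cd"]) out
  collected out
```

A real handler (CGI, a standalone server, and so on) would supply its own instances in place of ChunkList and Collector.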

I like this project! Thanks for resurrecting it!

Some thoughts:

Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor
| Method B.ByteString

Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit from being in here.

The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and re-writes HTML and other content on the fly - which isn't realistic.) Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host" like configurations. Though, perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that the inclusion of these breaks some amount of encapsulation we'd like to achieve for the Applications.

The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
data HttpVersion = Http09 | Http10 | Http11

Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well known ones. It seems a shame that applications will be littered with expressions like:
[(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:
[(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]

Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.

For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.

Finally, note that HTTP/1.1 actually does well define the character encoding of these parts of the protocol. It is a bit hard to find in the spec, but the request line, status line and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside the set encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I believe that most web servers *don't* do the 2047 decoding, and only either a) pass the strings as ISO-8859-1 strings, or b) decode that to native Unicode strings.

- Mark

Mark Lentczner
http://www.ozonehouse.com/mark/
IRC: mtnviewmark
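As a rough sketch of the type-level suggestions in the message above: a method type with a catch-all constructor, the proposed HttpVersion type, and named header constants. The constructor list, parseMethod and supportsChunked are assumptions made for illustration; contentType and mimeTextHtmlUtf8 mirror the names proposed for a Network.WAI.Header module but are defined locally here.

```haskell
import qualified Data.ByteString.Char8 as B

-- ALLCAPS constructors for common methods, plus a catch-all for
-- extension methods (HTTP methods are extensible).
data Method
  = OPTIONS | GET | HEAD | POST | PUT | DELETE | TRACE | CONNECT
  | Method B.ByteString
  deriving (Eq, Show)

-- Hypothetical parser. Method names are case-sensitive, so no case
-- folding is applied.
parseMethod :: B.ByteString -> Method
parseMethod bs = case B.unpack bs of
  "OPTIONS" -> OPTIONS
  "GET" -> GET
  "HEAD" -> HEAD
  "POST" -> POST
  "PUT" -> PUT
  "DELETE" -> DELETE
  "TRACE" -> TRACE
  "CONNECT" -> CONNECT
  _ -> Method bs

data HttpVersion = Http09 | Http10 | Http11
  deriving (Eq, Ord, Show)

-- Example of a version-dependent decision: chunked transfer encoding
-- was introduced in HTTP/1.1.
supportsChunked :: HttpVersion -> Bool
supportsChunked v = v >= Http11

-- Named constants instead of repeated B.pack calls at every use site.
contentType :: B.ByteString
contentType = B.pack "Content-Type"

mimeTextHtmlUtf8 :: B.ByteString
mimeTextHtmlUtf8 = B.pack "text/html;charset=UTF-8"

htmlHeaders :: [(B.ByteString, B.ByteString)]
htmlHeaders = [(contentType, mimeTextHtmlUtf8)]
```

Because the constants are ordinary ByteString values, they could live in a separate package without changing the interface itself.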

Mark, thanks for the response, it's very well thought out. Let me state two
things first to explain some of my design decisions.
Firstly, I'm shooting for lowest-common-denominator here. Right now, I see
that as the intersection between the CGI backend and a standalone server
backend; I think anything contained in both of those will be contained in
all other backends. If anyone has a contrary example, I'd be happy to see
it.
Secondly, the WAI is *not* designed to be "user friendly." It's designed to
be efficient and portable. People looking for a user-friendly way to write
applications should be using some kind of frontend, either a framework, or
something like hack-frontend-monadcgi.
That said, let's address your specific comments.
On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner wrote:
I like this project! Thanks for resurrecting it!
Some thoughts:
Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor | Method B.ByteString
Seems logical to me.
Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit being in here.
Going to the standalone vs CGI example: in a CGI script, scriptName is a well defined variable. However, it has absolutely no meaning to a standalone handler. I think we're just feeding rubbish into the system. I'm also not certain how one could *use* scriptName in any meaningful manner, outside of trying to reconstruct a URL (more on this topic below).
The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and re-writes HTML and other content on the fly - which isn't realistic.) Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host" like configurations. Though, perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that the inclusion of these breaks some amount of encapsulation we'd like to achieve for the Applications.
I think it's impossible to ever reconstruct a URL for a CGI application. I've tried it; once you start dealing with mod_rewrite, anything could happen. Given that I think we should encourage users to make pretty URLs via mod_rewrite, I oppose inserting such a function. When I need this kind of information (many of my web apps do), I've put it in a configuration file.
However, I don't think it's a good idea to hide information that is universal to all webapps. urlScheme in particular seems very important to me; for example, maybe when serving an app over HTTPS you want to use a secure static-file server as well. Frankly, I don't have a use case for serverName and serverPort that doesn't involve reconstructing URLs, but my gut feeling is that it's better to leave them in the protocol in case they do have a use case.
The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this: data HttpVersion = Http09 | Http10 | Http11
I had not thought of that at all, and I like it. However, do we want to hard-code in all possible HTTP versions? In theory, there could be more standards in the future. Plus, isn't Google currently working on a more efficient approach to HTTP that would affect this?
Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well known ones. It seems a shame that applications will be littered with expressions like:
[(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:
[(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]
This approach would make WAI much more top-heavy and prone to becoming out-of-date. I don't oppose having this module in a separate package, but I want to keep WAI itself as light as possible.
Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.
As I stated at the beginning of this response, there should be a framework or frontend sitting between WAI and the application. And given that the actual data on the wire will be represented as a stream of bytes, I'd rather stick with that.
For header names, in particular, it might be beneficial (and faster) to
treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.
Same comment about top-heaviness.
Finally, note that HTTP/1.1 actually does well define the character encoding of these parts of the protocol. It is a bit hard to find in the spec, but the request line, status line and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside the set encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I believe that most web servers *don't* do the 2047 decoding, and only either a) pass the strings as ISO-8859-1 strings, or b) decode that to native Unicode strings.
Thanks for that information, I was unaware. However, I think it still makes sense to keep WAI as low-level as possible, which would mean a sequence of bytes.
Michael

Just as an update, I've made the following changes to my WAI git repo
(http://github.com/snoyberg/wai):
* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe
ByteString)". This is a good example of tradeoffs versus the enumerator
approach (see below).
* This might just be bikeshedding, but I renamed RequestMethod to Method to
make names slightly shorter and more consistent.
* I implemented Mark's suggestions of adding support for arbitrary request
methods and information on HTTP version.
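
To make the "IO (Maybe ByteString)" request body from the first bullet concrete, here is a minimal sketch. It assumes the action yields one strict chunk per call and Nothing at the end of the body; RequestBody, fromChunks, bodyLength and demo are hypothetical names for this example and are not taken from the repository.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.IORef (atomicModifyIORef, newIORef)

-- Assumed shape: each call returns the next strict chunk, or Nothing
-- once the body has been fully read.
type RequestBody = IO (Maybe B.ByteString)

-- Hypothetical stand-in for a handler: serve chunks from a list.
fromChunks :: [B.ByteString] -> IO RequestBody
fromChunks chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef ref $ \pending -> case pending of
    [] -> ([], Nothing)
    (c : rest) -> (rest, Just c)

-- The application drives the reads itself. Only one chunk is live at a
-- time, so the body is processed in constant space without lazy I/O.
bodyLength :: RequestBody -> IO Int
bodyLength next = go 0
  where
    go n = do
      mchunk <- next
      case mchunk of
        Nothing -> return n
        Just chunk -> go $! n + B.length chunk

demo :: IO Int
demo = fromChunks (map BC.pack ["hello", " ", "world"]) >>= bodyLength
```

The same loop with the chunk discarded would drain an unread body, whichever side ends up being responsible for that.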
I've been having some off-list discussions about WAI, and have a few issues
to bring up. The first is relatively simple: what do we do about consuming
the entire request body? Do we leave that as a task to the application, or
should the server ensure that the entire request body is consumed?
Next, I have made the ResponseBodyClass typeclass specifically with the goal
of allowing optimizations for lazy bytestrings and sending files. The former
seems far-fetched; the latter provides the ability to use a sendfile system
call instead of copying the file data into memory. However, in the presence
of gzip encoding, how useful is this optimization?
Finally, there is a lot of discussion going on right now about enumerators.
The question is whether the WAI protocol should use them. There are two
places where they could replace the current offering: request body and
response body.
In my opinion, there is no major difference between the Hyena definition of
an enumerator and the current response body sendByteString method. The
former provides two extra features: there's an accumulating parameter passed
around, and a method for indicating early termination. However, the
accumulating parameter seems unnecessary to me in general, and when needed we
can accomplish the same result with MVars. Early termination seems like
something that would be unusual in the response context, and could be
handled with exceptions.
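
The comparison in the paragraph above can be sketched side by side. The Enumerator type below is only an approximation of the left-fold style described (an accumulator threaded through a step function, with Left signalling early termination) and is not claimed to be Hyena's exact definition; Sender stands for the callback style built on a sendByteString-like function. All names here are hypothetical, and an IORef is used for the mutable accumulator where an MVar would serve equally.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Left-fold style: the step function receives the accumulator and a
-- chunk; Left stops the iteration early, Right continues.
type Enumerator a = (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Callback style: the producer is simply handed a function to call for
-- each chunk.
type Sender = (B.ByteString -> IO ()) -> IO ()

enumChunks :: [B.ByteString] -> Enumerator a
enumChunks [] _ acc = return acc
enumChunks (c : cs) step acc = do
  r <- step acc c
  case r of
    Left done -> return done
    Right acc' -> enumChunks cs step acc'

sendChunks :: [B.ByteString] -> Sender
sendChunks chunks send = mapM_ send chunks

-- Total length using the threaded accumulator.
lengthViaEnum :: [B.ByteString] -> IO Int
lengthViaEnum chunks =
  enumChunks chunks (\n c -> return (Right (n + B.length c))) 0

-- The same result in callback style, with a mutable cell standing in
-- for the accumulator.
lengthViaSender :: [B.ByteString] -> IO Int
lengthViaSender chunks = do
  ref <- newIORef 0
  sendChunks chunks (\c -> modifyIORef' ref (+ B.length c))
  readIORef ref

-- Early termination: stop after the first chunk by returning Left.
firstChunkLength :: [B.ByteString] -> IO Int
firstChunkLength chunks =
  enumChunks chunks (\_ c -> return (Left (B.length c))) 0

sample :: [B.ByteString]
sample = map BC.pack ["ab", "cde"]
```

In the callback style, stopping early would mean throwing an exception from the callback instead of returning Left.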
For the request body, there is a significant difference. However, I think
that the current approach (called imperative elsewhere) is more in line with
how most people would expect to program. At the same time, I believe there
is no performance issue going either way, and am open to community input.
Michael

On Sat, 23 Jan 2010 21:31:47 +0200, Michael Snoyman wrote:
Just as an update, I've made the following changes to my WAI git repo (http://github.com/snoyberg/wai):
* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe ByteString)". This is a good example of tradeoffs versus the enumerator approach (see below).
* Are you sure that a strict bytestring is fine here? Can the request body be used to upload large data? If so why not use a lazy one, not (only) for its laziness but for being a list of chunks that fits well into memory caches...
* I would call ResponseBody a ResponseReceiver given its use and the sigs of the methods.
* Why not use sendLazyByteString in sendFile as a default method? This would fix your "TODO", since I believe the chunk size would be a good one.
* Maybe a ResponseReceiver Handle instance could be provided, since it requires no data type definition and would be an orphan instance if defined elsewhere. Maybe one for sockets would make sense as well.
* This might just be bikeshedding, but renamed RequestMethod to Method to make names slightly shorter and more consistent.
Good for me
* I implemented Mark's suggestions of adding support for arbitrary request methods and information on HTTP version.
Nice
I've been having some off-list discussions about WAI, and have a few issues to bring up. The first is relatively simple: what do we do about consuming the entire request body? Do we leave that as a task to the application, or should the server ensure that the entire request body is consumed?
Good question. Is there something in the HTTP spec about this? I don't think so, and I think it would make sense to give up early if you consider the input as garbage.
Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?
It is useful anyway.
Finally, there is a lot of discussion going on right now about enumerators. The question is whether the WAI protocol should use them. There are two places where they could replace the current offering: request body and response body.
In my opinion, there is no major difference between the Hyena definition of an enumerator and the current response body sendByteString method. The former provides two extra features: there's an accumulating parameter passed around, and a method for indicating early termination. However, the accumulating parameter seems unnecessary to me in general, and when needed we can accomplish the same result with MVars. Early termination seems like something that would be unusual in the response context, and could be handled with exceptions.
IORefs could be sufficient (instead of MVars) but this seems a bit ugly compared to the accumulator. On the other hand, sometimes you don't need the accumulator and so just pass a dummy unit. If we live in IO then yes, exceptions could do that. However the point of the Either type is to remind you that you have two cases to handle.
For the request body, there is a significant difference. However, I think that the current approach (called imperative elsewhere) is more in line with how most people would expect to program. At the same time, I believe there is no performance issue going either way, and am open to community input.
Why would an imperative approach be more in line when using a purely functional language?
Regards,
-- Nicolas Pouillard http://nicolaspouillard.fr

On Sun, Jan 24, 2010 at 2:38 AM, Nicolas Pouillard <nicolas.pouillard@gmail.com> wrote:
On Sat, 23 Jan 2010 21:31:47 +0200, Michael Snoyman wrote:
Just as an update, I've made the following changes to my WAI git repo (http://github.com/snoyberg/wai):
* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe ByteString)". This is a good example of tradeoffs versus the enumerator approach (see below).
* Are you sure that a strict bytestring is fine here? Can the request body be used to upload large data? If so why not use a lazy one, not (only) for its laziness but for being a list of chunks that fits well into memory caches...
Sorry, this is where I should have put in some documentation. The IO (Maybe ByteString) returns chunks of strict bytestring until it encounters the end of the body. If we were to use a lazy bytestring, we would either need lazy I/O or to read everything into memory (which is what we're trying to avoid).
The handler has the prerogative to determine chunk size.
* I would call ResponseBody a ResponseReceiver given its use and the sigs of the methods.
* Why not use sendLazyByteString in sendFile as a default method, this will fix your "TODO" since I believe the chunk size would be a good one.
* Maybe a ResponseReceiver Handle instance could be provided, since it requires no data type definition and would be an orphan instance if defined elsewhere. Maybe one for sockets would make sense as well.
Sorry, I added a few more patches since sending this e-mail. I did away with ResponseBody as well, and replaced it with Either FilePath ((ByteString -> IO ()) -> IO ()). This is *very* close to the Hyena version in my opinion, with three differences (I think I've written these elsewhere, so sorry if I'm repeating myself).
1) It provides the option of providing optimized file sending, as per the Happstack sendfile system call. I was concerned at first that we might wish to provide sending multiple files, but I think people will prefer the simplicity that comes without having a typeclass. I'm completely open to revisiting this issue, as I have no strong feelings.
2) There is no "accumulating parameter" as there is with Hyena. In the general case, I don't think it's necessary, and when it is, we can use MVars.
3) There is no built-in way to force early termination. I think this is a better approach, since early termination would be an exceptional situation. Forcing the application to check a return value each time would be overhead that would rarely be used, and we can achieve the same effect with an exception.
Sorry for not sending this update earlier, but I only finished at about 2:30 last night. I found it difficult to write coherently.
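The response body type mentioned above, Either FilePath ((ByteString -> IO ()) -> IO ()), might be consumed by a handler along these lines. This is a sketch under assumptions: runBody, ClientGone, helloBody and limitedWriter are hypothetical names, and the Left case simply streams the file in strict chunks where a real handler could call sendfile instead.

```haskell
import Control.Exception (Exception, throwIO, try)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.IORef (modifyIORef, newIORef, readIORef)
import System.IO (IOMode (ReadMode), withBinaryFile)

type ResponseBody = Either FilePath ((B.ByteString -> IO ()) -> IO ())

-- Hypothetical handler side: given a function that writes one chunk to
-- the client, run either kind of body.
runBody :: (B.ByteString -> IO ()) -> ResponseBody -> IO ()
runBody write (Right produce) = produce write
runBody write (Left path) = withBinaryFile path ReadMode loop
  where
    -- Fallback for handlers without sendfile: strict chunked reads.
    loop h = do
      chunk <- B.hGetSome h 32768
      if B.null chunk then return () else write chunk >> loop h

-- Early termination is modelled as an exception, not a return value.
data ClientGone = ClientGone deriving (Show)

instance Exception ClientGone

-- An application-side body that pushes two chunks.
helloBody :: ResponseBody
helloBody = Right $ \write -> mapM_ (write . BC.pack) ["Hello, ", "world"]

-- Test writer: accepts 'limit' chunks, then behaves as if the client
-- had disconnected. Also returns an action listing the accepted chunks.
limitedWriter :: Int -> IO (B.ByteString -> IO (), IO [B.ByteString])
limitedWriter limit = do
  ref <- newIORef []
  let write chunk = do
        seen <- readIORef ref
        if length seen >= limit
          then throwIO ClientGone
          else modifyIORef ref (chunk :)
  return (write, fmap reverse (readIORef ref))

demo :: IO (Either ClientGone (), [B.ByteString])
demo = do
  (write, sent) <- limitedWriter 1
  result <- try (runBody write helloBody)
  chunks <- sent
  return (result, chunks)
```

The application never checks a return value per chunk; when the writer throws, the producer is unwound by the exception.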
* This might just be bikeshedding, but renamed RequestMethod to Method to
make names slightly shorter and more consistent.
Good for me
* I implemented Mark's suggestions of adding support for arbitrary request methods and information on HTTP version.
Nice
I've been having some off-list discussions about WAI, and have a few issues to bring up. The first is relatively simple: what do we do about consuming the entire request body? Do we leave that as a task to the application, or should the server ensure that the entire request body is consumed?
Good question. Is there something in the HTTP spec about this? I don't think so, and I think it would make sense to give up early if you consider the input as garbage.
What do you mean by this? That we don't need to consume that input at all, or that the server should be held responsible for "/dev/null"ing the data?
Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?
It is useful anyway.
Finally, there is a lot of discussion going on right now about enumerators. The question is whether the WAI protocol should use them. There are two places where they could replace the current offering: request body and response body.
In my opinion, there is no major difference between the Hyena definition of an enumerator and the current response body sendByteString method. The former provides two extra features: there's an accumulating parameter passed around, and a method for indicating early termination. However, the accumulating parameter seems unnecessary to me in general, and when needed we can accomplish the same result with MVars. Early termination seems like something that would be unusual in the response context, and could be handled with exceptions.
IORefs could be sufficient (instead of MVars) but this seems a bit ugly compared to the accumulator. On the other hand, sometimes you don't need the accumulator and so just pass a dummy unit. If we live in IO then yes, exceptions could do that. However the point of the Either type is to remind you that you have two cases to handle.
For the request body, there is a significant difference. However, I think that the current approach (called imperative elsewhere) is more in line with how most people would expect to program. At the same time, I believe there is no performance issue going either way, and am open to community input.
Why would an imperative approach be more in line when using a purely functional language?
Regards,
-- Nicolas Pouillard http://nicolaspouillard.fr
Because I don't think it really *is* an imperative approach. For that matter, enumerators are also an "imperative approach." It's frankly a silly distinction IMO. The question is whether this is a *good* approach. I think passing in an output function fits very nicely with Haskell.

The question to me lies more on the request side than the response side. Basically, should the application provide a caller or a callee for reading the request body? Most of the time, the latter is simpler to write I believe.

Ha! I finally found the article I'd read a while ago demonstrating this point in C. You can obviously disagree with the sentiment there, but I've found the point to be true in Haskell as well:
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Michael

Michael Snoyman wrote:
[--snip--]
Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?
[--snip--]
I'm hoping that the "Web" bit in your project title doesn't literally mean that WAI is meant to be restricted to solely serving content to browsers. With that caveat in mind:

For non-WWW HTTP servers it can be extremely useful to have sendfile. An example is my Haskell UPnP Media Server (hums) application. It's sending huge files (AVIs, MP4s, etc.) over the network and since these files are already compressed as much as they're ever going to be, gzip would be useless. The CPU load of my hums server went from 2-5% to 0% when streaming files just from switching from a Haskell I/O based solution to proper sendfile.

Lack of proper support for sendfile() was indeed one of the reasons that I chose to roll my own HTTP server for hums. I should note that this was quite a while ago and I haven't really gone back to reevaluate that choice -- there's too many HTTP stacks to choose from right now and I don't have the time to properly evaluate them all.

For this type of server, response *streaming* is also extremely important for those cases where you cannot use sendfile, so I'd hate to see a standard WAI interface preclude that. (No, lazy I/O is NOT an option -- the HTTP clients in a typical UPnP media client behave so badly that you'll run out of file descriptors in no time. Trust me, I've tried.)

Cheers,

On Sun, 24 Jan 2010 12:23:46 +0100, Bardur Arantsson
Michael Snoyman wrote:
[--snip--]
Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization? [--snip--]
I'm hoping that the "Web" bit in your project title doesn't literally mean that WAI is meant to be restricted to solely serving content to browsers. With that caveat in mind:
For non-WWW HTTP servers it can be extremely useful to have sendfile. An example is my Haskell UPnP Media Server (hums) application. It's sending huge files (AVIs, MP4s, etc.) over the network and since these files are already compressed as much as they're ever going to be, gzip would be useless. The CPU load of my hums server went from 2-5% to 0% when streaming files just from switching from a Haskell I/O based solution to proper sendfile.
Lack of proper support for sendfile() was indeed one of the reasons that I chose to roll my own HTTP server for hums. I should note that this was quite a while ago and I haven't really gone back to reevaluate that choice -- there's too many HTTP stacks to choose from right now and I don't have the time to properly evaluate them all.
Good reason indeed.
For this type of server, response *streaming* is also extremely important for those cases where you cannot use sendfile, so I'd hate to see a standard WAI interface preclude that. (No, lazy I/O is NOT an option -- the HTTP clients in a typical UPnP media client behave so badly that you'll run out of file descriptors in no time. Trust me, I've tried.)
Is the experiment easily re-doable? I would like to try using safe-lazy-io instead.
--
Nicolas Pouillard
http://nicolaspouillard.fr

Michael Snoyman wrote:
[--snip--]
Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?
[--snip--]
I'm hoping that the "Web" bit in your project title doesn't literally mean that WAI is meant to be restricted to solely serving content to browsers. With that caveat in mind:
For non-WWW HTTP servers it can be extremely useful to have sendfile. An example is my Haskell UPnP Media Server (hums) application. It's sending huge files (AVIs, MP4s, etc.) over the network and since these files are already compressed as much as they're ever going to be, gzip would be useless. The CPU load of my hums server went from 2-5% to 0% when streaming files just from switching from a Haskell I/O based solution to proper sendfile.
Lack of proper support for sendfile() was indeed one of the reasons that I chose to roll my own HTTP server for hums. I should note that this was quite a while ago and I haven't really gone back to reevaluate that choice -- there's too many HTTP stacks to choose from right now and I don't have the time to properly evaluate them all.
For this type of server, response *streaming* is also extremely important for those cases where you cannot use sendfile, so I'd hate to see a standard WAI interface preclude that. (No, lazy I/O is NOT an option -- the HTTP clients in a typical UPnP media client behave so badly that you'll run out of file descriptors in no time. Trust me, I've tried.)
Both sendfile and response streaming are in the top priorities in the WAI
On Sun, Jan 24, 2010 at 1:23 PM, Bardur Arantsson

Minor spec question: what should be the defined behavior when an application requests that a file be sent and it does not exist?
participants (4)
- Bardur Arantsson
- Mark Lentczner
- Michael Snoyman
- Nicolas Pouillard