http-enumerator : Any way to get the headers first?

Hi all,

I'm continuing work on the HTTP proxy I'm writing. The absolute bare basics are working with Warp, Wai and http-enumerator as long as the upstream web server doesn't send gzipped or chunked data. For these latter two cases, http-enumerator helpfully gunzips/unchunks the data. That however causes a problem.

If my proxy simply passes the HTTP headers and data it gets from http-enumerator on to the client, the client barfs, because the headers claim the data is gzipped or chunked when it actually isn't.

There are a number of possible solutions to this:

a) Strip the Content-Encoding/Transfer-Encoding header and add a Content-Length header instead. I think this is probably possible with the API as it is, but I haven't figured out how yet.

b) Rechunk or re-gzip the data. This seems rather wasteful of CPU resources.

c) Modify the Network.Http.Enumerator.http function so that de-chunking/gunzipping is optional.

d) Expose the iterHeaders function that is internal to the http-enumerator package so that client code can grab the headers before deciding how to handle the body.

Are there any other options I haven't thought of yet?
From the options I have, I actually think d) makes the most sense. Would a patch exposing iterHeaders be accepted?
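[For reference, a minimal sketch of what option (a) amounts to, with a toy Header alias standing in for whatever header type the proxy really uses. The hidden cost is visible in the second function: computing an accurate Content-Length means buffering the entire body first.]

    import qualified Data.ByteString.Char8 as B
    import Data.Char (toLower)

    -- Stand-in for the proxy's real header type.
    type Header = (B.ByteString, B.ByteString)

    -- Drop the headers that no longer describe the decoded body.
    stripEncodingHeaders :: [Header] -> [Header]
    stripEncodingHeaders = filter (\(name, _) -> lower name `notElem` stale)
      where
        lower = B.map toLower
        stale = map B.pack ["content-encoding", "transfer-encoding", "content-length"]

    -- Re-add a Content-Length computed from the fully buffered body.
    withContentLength :: B.ByteString -> [Header] -> [Header]
    withContentLength body hs =
        (B.pack "Content-Length", B.pack (show (B.length body))) : stripEncodingHeaders hs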
Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On Mon, May 16, 2011 at 11:23 AM, Erik de Castro Lopo wrote:
Hi all,
I'm continuing work on the HTTP proxy I'm writing. The absolute bare basics are working with Warp, Wai and http-enumerator as long as the upstream web server doesn't send gzipped or chunked data. For these latter two cases, http-enumerator helpfully gunzips/unchunks the data. That however causes a problem.
If my proxy simply passes the HTTP headers and data it gets from http-enumerator on to the client, the client barfs, because the headers claim the data is gzipped or chunked when it actually isn't.
There are a number of possible solutions to this:
a) Strip the Content-Encoding/Transfer-Encoding header and add a Content-Length header instead. I think this is probably possible with the API as it is, but I haven't figured out how yet.
b) Rechunk or re-gzip the data. This seems rather wasteful of CPU resources.
c) Modify the Network.Http.Enumerator.http function so that de-chunking/gunzipping is optional.
d) Expose the iterHeaders function that is internal to the http-enumerator package so that client code can grab the headers before deciding how to handle the body.
Are there any other options I haven't thought of yet?
From the options I have, I actually think d) makes the most sense. Would a patch exposing iterHeaders be accepted?
Cheers, Erik
Short answer is that I'm fine exposing iterHeaders, as long as we put a big fat "advanced users only" comment on it.

I agree with you that (b) seems like a bad idea. (a) is definitely possible, if a bit tricky, but it defeats the whole purpose of chunking and gzipping, so it probably shouldn't be considered. I would guess that (c) is really the best option, though I'm guessing you shied away from it because it involved more substantial changes to http-enumerator.

Maybe we should consider adding an extra httpAdvanced function that takes additional settings, such as whether or not to automatically de-chunk/de-gzip. I wouldn't be surprised if we come up with more such cases in the future.

Michael
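[To make the shape of that suggestion concrete, a purely hypothetical sketch of such a settings record; none of these names exist in http-enumerator. The point is only that a record of flags can grow new fields later without breaking existing callers:]

    -- Hypothetical names throughout; not the actual http-enumerator API.
    data AdvancedSettings = AdvancedSettings
        { asDecompress :: Bool  -- gunzip a Content-Encoding: gzip body?
        , asDechunk    :: Bool  -- decode a Transfer-Encoding: chunked body?
        }

    -- Defaults that match what the existing http function does today.
    defaultAdvancedSettings :: AdvancedSettings
    defaultAdvancedSettings = AdvancedSettings
        { asDecompress = True
        , asDechunk    = True
        }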

Michael Snoyman wrote:
On Mon, May 16, 2011 at 11:23 AM, Erik de Castro Lopo wrote:
c) Modify the Network.Http.Enumerator.http function so that de-chunking/gunzipping is optional.
d) Expose the iterHeaders function that is internal to the http-enumerator package so that client code can grab the headers before deciding how to handle the body.
Are there any other options I haven't thought of yet?
From the options I have, I actually think d) makes the most sense. Would a patch exposing iterHeaders be accepted?
Short answer is that I'm fine exposing iterHeaders, as long as we put a big fat "advanced users only" comment on it. I agree with you that (b) seems like a bad idea. (a) is definitely possible, if a bit tricky, but it defeats the whole purpose of chunking and gzipping, so it probably shouldn't be considered.
I would guess that (c) is really the best option, though I'm guessing you shied away from it because it involved more substantial changes to http-enumerator.
I shied away from c) for two reasons:

- I thought exposing iterHeaders would have other uses as well.

- How do we know that we are at the end of the request body? With chunked/gzipped data (and no content-length header) the body decoder figures out where the end is.

I'll have a play with c) and see how it behaves. My idea is simply to add a

    rawData :: Bool

field to the Request data structure. If that field is True then no dechunking or gunzipping will occur. Once I've tried this relatively trivial change I will know for sure what happens at the end of the body.
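[Schematically, with a toy Request record rather than the real http-enumerator one, the change boils down to making the decode step conditional:]

    -- Toy stand-in for http-enumerator's Request; the real record has
    -- many more fields. Only the new flag matters here.
    data Request = Request
        { rawData :: Bool  -- True => hand the body to the caller undecoded
        }

    -- Hypothetical decode step: the identity when raw, otherwise whatever
    -- dechunk/gunzip pipeline would normally run.
    applyDecoding :: Request -> (a -> a) -> (a -> a)
    applyDecoding req decode
        | rawData req = id
        | otherwise   = decode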
I wouldn't be surprised if we come up with more such cases in the future.
Indeed.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik de Castro Lopo wrote:
I shied away from c) for two reasons:
- I thought exposing iterHeaders would have other uses as well.
- How do we know that we are at the end of the request body? With chunked/gzipped data (and no content-length header) the body decoder figures out where the end is.
I'll have a play with c) and see how it behaves. My idea is simply to add a
rawData :: Bool
field to the Request data structure. If that field is True then no dechunking or gunzipping will occur. Once I've tried this relatively trivial change I will know for sure what happens at the end of the body.
This seems to work quite nicely. My proxy now works for a bunch of cases where it was failing before. I'll submit a pull request on github for this.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On 05/16/11 18:35, Michael Snoyman wrote:
On Mon, May 16, 2011 at 11:23 AM, Erik de Castro Lopo wrote:
c) Modify the Network.Http.Enumerator.http function so that de-chunking/gunzipping is optional.
I would guess that (c) is really the best option, though I'm guessing you shied away from it because it involved more substantial changes to http-enumerator. Maybe we should consider adding an extra httpAdvanced function that takes additional settings, such as whether or not to automatically de-chunk/de-gzip. I wouldn't be surprised if we come up with more such cases in the future.
Well, one whacky-but-nifty related idea I've had: if you're outputting a gzip'ed page, but you've got some expensive chunk in the middle of the page that you've cached, it is technically possible to cache that chunk pre-gzip'ed and output it directly into the gzip'ed output, or, at the output layer, de-gzip it for a non-gzip'ed page if necessary; since that case is rarer, this might still be a win. You can do this because at any time you can flush a gzip stream and simply start sending another one, though you lose your dictionary when you do that, so you wouldn't want to do this below some cutoff size.

Whether or not this is ever a good idea, I don't know; I've never had anything like the infrastructure it would take to benchmark it. It could be anything from a surprisingly cool addition that few other frameworks could compete with, to a net loss. (Or both, depending on how it is used.)
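[The trick rests on the fact that the gzip format permits several complete members back to back, and decoders such as gunzip treat the concatenation as one stream. A small sketch using the zlib package's Codec.Compression.GZip; note that each member starts with a fresh dictionary and carries its own framing overhead, which is exactly why small fragments would be a loss:]

    import qualified Codec.Compression.GZip as GZip
    import qualified Data.ByteString.Lazy.Char8 as L

    main :: IO ()
    main = do
        -- Imagine 'cached' was compressed once and stored; splicing it into
        -- a freshly compressed page is plain concatenation of gzip members.
        let cached = GZip.compress (L.pack "expensive cached fragment")
            page   = L.concat [ GZip.compress (L.pack "header ")
                              , cached
                              , GZip.compress (L.pack " footer") ]
        -- Piped to `gunzip`, this prints "header expensive cached fragment footer".
        L.putStr page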
participants (3):
- Erik de Castro Lopo
- Jeremy Bowers
- Michael Snoyman