Advice on implementing a web proxy

Hi all,

I'm working on a simple web proxy. I have the proxying of HTTP working correctly (at least as far as I have tested it) and would like to work on proxying HTTPS. The way HTTPS proxying works is as follows:

a) Client sends "CONNECT host:port HTTP/1.1" to the proxy in clear text.
b) Proxy makes a connection to host:port and, if successful, sends "HTTP/1.0 200 Connection established" to the client.
c) The proxy then blindly transfers bytes from the client to the server and bytes from the server to the client.
d) The client does TLS negotiation over the bi-directional pipe established and maintained by the proxy.

The git repo containing the code for my proxy is here:
https://github.com/erikd/simple-web-proxy
and the core of the actual proxy is here:
https://github.com/erikd/simple-web-proxy/blob/master/src/simple-web-proxy.h...

The proxying function should have a type signature of:

sslConnectRequest :: ByteString -> Int -> Wai.Request -> Proxy Wai.Response

where the ByteString contains the host name and the Int the port number.

My plan for the sslConnectRequest function is for it to open a socket connection to the server and then wrap that socket inside an enumerator.

Is that a reasonable plan? Is there a better way? Any existing code that does something similar for me to hack? Clues?

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
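Steps (a) and (b) of the CONNECT handshake described above can be sketched in a few lines. This is only an illustration, not code from simple-web-proxy: the names parseConnect and connectEstablished are hypothetical, and the sketch assumes the request line has already been read off the client socket. It uses nothing beyond the bytestring package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as B

-- | Parse a request line such as "CONNECT example.com:443 HTTP/1.1"
-- into (host, port). Anything else yields Nothing.
-- Hypothetical helper, not part of simple-web-proxy or Warp.
parseConnect :: ByteString -> Maybe (ByteString, Int)
parseConnect line =
  case B.words line of
    ["CONNECT", target, _version] -> splitHostPort target
    _                             -> Nothing

-- | Split "host:port" on the last ':' and validate the port number.
splitHostPort :: ByteString -> Maybe (ByteString, Int)
splitHostPort target
  | B.null hostColon = Nothing            -- no ':' present at all
  | otherwise =
      case B.readInt portStr of
        Just (port, rest)
          | B.null rest, not (B.null host), port > 0, port < 65536
          -> Just (host, port)
        _ -> Nothing
  where
    -- breakEnd keeps the ':' at the end of the first component.
    (hostColon, portStr) = B.breakEnd (== ':') target
    host                 = B.init hostColon

-- | The reply from step (b), including the blank line that ends the headers.
connectEstablished :: ByteString
connectEstablished = "HTTP/1.0 200 Connection established\r\n\r\n"

main :: IO ()
main = do
  print (parseConnect "CONNECT example.com:443 HTTP/1.1")
  print (parseConnect "GET / HTTP/1.1")
  B.putStr connectEstablished
```

Once the reply has been sent, steps (c) and (d) need no further parsing: the proxy only copies bytes in both directions.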

Erik,

I'm not sure that I understand your question, but I have one concern about your code. You use parseURL. It returns a Request whose requestBody is RequestBodyLBS L.empty. This means that your code does not relay the HTTP body at all.

I'm now implementing a reverse proxy. I tested several ways to relay the HTTP body, but the only solution that works is to get the whole HTTP body as a ByteString and pass it to RequestBodyLBS. This is store-and-forward, not pipelining. My code can be found at: https://github.com/kazu-yamamoto/wai-app-file-cgi/blob/master/Network/Wai/Ap...

I'm wondering if Enumerator can implement pipelining...

--Kazu
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Kazu Yamamoto (山本和彦) wrote:
You use parseURL. It returns Request whose requestBody is RequestBodyLBS L.empty. This means that your code does not relay HTTP body at all.
You are correct. So far I have really only tested this with http GET requests. Fixing that should be trivial.
I'm now implementing a reverse proxy. I tested several ways to relay HTTP body but only one solution which works is to get whole HTTP body as ByteString and specify it to RequestBodyLBS. This is store-and-forward, not pipelining.
My code can be found: https://github.com/kazu-yamamoto/wai-app-file-cgi/blob/master/Network/Wai/Ap...
I'm wondering if Enumerator can implement pipelining...
It definitely can. Have a look at the function serveRequest in this file: https://github.com/erikd/simple-web-proxy/blob/master/src/simple-web-proxy.h...

On a machine with 4Gig of RAM, I have simultaneously downloaded 4 copies of a 4Gig DVD ISO through my proxy, with memory usage never going above about 10%.

I'm going right back to basics to try and get a really good understanding of Enumerators and Iteratees.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On a machine with 4Gig of RAM, I have simultaneously downloaded 4 copies of a 4Gig DVD ISO through my proxy, with memory usage never going above about 10%.
I guess you are talking about relaying the body of the HTTP response. For that case, you are right.

But I'm talking about relaying the body of the HTTP request. In other words, my concern is about uploading. In my company, uploading files of several gigabytes is quite common. I don't know how to relay a huge upload with a fixed-size buffer with Warp.

--Kazu

On Mon, Nov 28, 2011 at 9:36 PM, Kazu Yamamoto wrote:
On a machine with 4Gig of RAM, I have simultaneously downloaded 4 copies of a 4Gig DVD ISO through my proxy, with memory usage never going above about 10%.
I guess you are talking about relaying for body of HTTP response. For this case, you are right.
But I'm talking about relaying for body of HTTP request. In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
I've also tried to glue wai and http-enumerator together to make a streaming proxy server, but I think there is no easy way to make uploads stream with the current interface. With the HEAD version of warp, however, you can use `settingsIntercept` [1] to get control of the socket. With the raw socket you can build an enumerator on top of it, pass that to http-enumerator, and make upload streaming work.

[1] https://github.com/yesodweb/wai/blob/master/warp/Network/Wai/Handler/Warp.hs...

I've also tried to glue wai and http-enumerator together to make a streaming proxy server, but I think there is no easy way to make uploads stream with the current interface. With the HEAD version of warp, however, you can use `settingsIntercept` [1] to get control of the socket. With the raw socket you can build an enumerator on top of it, pass that to http-enumerator, and make upload streaming work.
In my understanding, settingsIntercept is for WebSockets, which start with HTTP but switch to a new transport (not application) layer. My target is HTTP relay. If I use settingsIntercept, I cannot make use of the benefits of Warp.

Regards,
--Kazu

Kazu Yamamoto (山本和彦) wrote:
I guess you are talking about relaying for body of HTTP response. For this case, you are right.
But I'm talking about relaying for body of HTTP request.
Yes, I misread your response.
In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
I too would very much like to see a solution to this issue.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik,
In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
I too would very much like to see a solution to this issue.
It seems to me that your http-proxy package solves this problem. Is my understanding correct?

Your package has some code duplicated from Warp. Is it possible to merge the functionality into Warp and to provide the proxy feature in a form that is reusable from WAI applications? I would like to implement a robust reverse proxy on Warp.

Thanks.
--Kazu

Kazu Yamamoto (山本和彦) wrote:
In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
I too would very much like to see a solution to this issue.
It seems to me that your http-proxy package solves this problem. Is my understanding correct?
Unfortunately, no. This is listed as one of the known problems in the readme file: https://github.com/erikd/http-proxy/blob/master/README.txt

In my opinion, the way to fix this is to modify and extend Wai. I too would like to see this problem solved, but it's a lower priority than some of the other things I want to address.
Your package has some duplicated code from Warp.
Indeed.
Is it possible to merge the functionality to Warp and to provide the proxy feature which is re-usable from WAI applications?
I'm not sure. Most of the duplicated code is utility functions. If they were made available in the Warp API, then I could use them, but I think Warp (being a web server) and http-proxy are sufficiently different to make it difficult to merge them.
I would like to implement robust reverse-proxy on Warp.
So you need the standard features of Warp as well as reverse-proxy features? Do you need to reverse proxy HTTPS as well as HTTP? I found proxying HTTP relatively easy, but doing it for HTTPS inside Warp was impossible.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik,
Unfortunately, no. This is listed as one of the known problems in the readme file:
Oh my.
So you need the standard features of Warp as well as reverse-proxy features?
Yes. I want to build my whole web site in Haskell only. At the moment mighttpd (ineffectively) acts as a reverse proxy for the Yesod apps behind it, as well as handling static files.
Do you need to reverse proxy HTTPS as well as HTTP?
Currently no. I will tackle this topic when my current job is finished.

Thank you for your reply.
--Kazu

On Thu, Dec 8, 2011 at 6:48 AM, Erik de Castro Lopo wrote:
Kazu Yamamoto (山本和彦) wrote:
In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
I too would very much like to see a solution to this issue.
It seems to me that your http-proxy package solves this problem. Is my understanding correct?
Unfortunately, no. This is listed as one of the known problems in the readme file:
https://github.com/erikd/http-proxy/blob/master/README.txt
In my opinion, the way to fix this is to modify and extend Wai. I too would like to see this problem solved, but it's a lower priority than some of the other things I want to address.
Can you point me to the code that demonstrates this?

Hello Michael,
In my opinion, the way to fix this is to modify and extend Wai. I too would like to see this problem solved, but it's a lower priority than some of the other things I want to address.
Can you point me to the code that demonstrates this?
Let me explain my understanding. Consider the following client-server setup:

            ---->
    client        server
            <----

In this case, the server handles one pair of input and output. This fits the Iteratee model. But consider a proxy:

            ---->       ---->
    client        proxy        server
            <----       <----

This proxy must handle two pairs of input and output. In Haskell network programming, we typically prepare two user threads for this. Warp uses one thread and one Iteratee, so only downloading can be streamed, and uploading results in store-and-forward.

--Kazu
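The two-thread arrangement described above can be sketched without any reference to Warp. The code below is only an illustration under stated assumptions: Endpoint, pump, relay and memoryEndpoint are hypothetical names, and the in-memory endpoints stand in for the client and server sockets so that the example runs on its own. Each direction is copied chunk by chunk, so neither the upload nor the download is ever held in memory as a whole.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Control.Concurrent      (forkIO)
import           Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import           Control.Exception       (finally)
import           Control.Monad           (unless)
import           Data.ByteString         (ByteString)
import qualified Data.ByteString         as BS
import           Data.IORef              (atomicModifyIORef', newIORef, readIORef)

-- | One side of a connection (hypothetical type): how to read the next
-- chunk (an empty chunk means EOF) and how to write a chunk.
data Endpoint = Endpoint
  { recvChunk :: IO ByteString
  , sendChunk :: ByteString -> IO ()
  }

-- | Copy a single direction, one chunk at a time, until EOF.
pump :: Endpoint -> Endpoint -> IO ()
pump from to = do
  chunk <- recvChunk from
  unless (BS.null chunk) $ sendChunk to chunk >> pump from to

-- | Relay both directions at once: a forked thread handles
-- client -> server (uploads) while the calling thread handles
-- server -> client (downloads). Returns when both have reached EOF.
relay :: Endpoint -> Endpoint -> IO ()
relay client server = do
  uploadDone <- newEmptyMVar
  _ <- forkIO (pump client server `finally` putMVar uploadDone ())
  pump server client
  takeMVar uploadDone

-- | In-memory stand-in for a socket: reads come from a fixed list of
-- non-empty chunks, writes are collected and can be inspected afterwards.
memoryEndpoint :: [ByteString] -> IO (Endpoint, IO ByteString)
memoryEndpoint input = do
  pending  <- newIORef input
  received <- newIORef []
  let next []       = ([], BS.empty)
      next (c : cs) = (cs, c)
      recv   = atomicModifyIORef' pending next
      send c = atomicModifyIORef' received (\acc -> (c : acc, ()))
  pure (Endpoint recv send, BS.concat . reverse <$> readIORef received)

main :: IO ()
main = do
  (client, clientGot) <- memoryEndpoint ["upload-part-1 ", "upload-part-2"]
  (server, serverGot) <- memoryEndpoint ["download"]
  relay client server
  serverGot >>= print   -- bytes that travelled client -> server
  clientGot >>= print   -- bytes that travelled server -> client
```

A real proxy would also need to close both sockets when either direction fails; the sketch leaves that out to keep the two-thread structure visible.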

Michael Snoyman wrote:
Unfortunately, no. This is listed as one of the known problems in the readme file:
https://github.com/erikd/http-proxy/blob/master/README.txt
In my opinion, the way to fix this is to modify and extend Wai. I too would like to see this problem solved, but it's a lower priority than some of the other things I want to address.
Can you point me to the code that demonstrates this?
Hmm, I came to this conclusion some time ago and the actual code seems to be different from how I remember it. However, there is still a problem. The problem is that the Wai.Request structure http://hackage.haskell.org/packages/archive/wai/0.4.2/doc/html/src/Network-W... has no requestBody field.

An HTTP POST operation contains HTTP request headers (with a Content-Length header) and a request body. The Content-Type field of the header can specify things like text/xml or application/x-www-form-urlencoded.

For Kazu Yamamoto's case, I think he would like to be able to handle both small POST request bodies (1k or so) and very large ones (hundreds of megabytes or more). In the case where the POST body is large, we would not want Warp to try and suck the whole thing into memory before passing it to the server.

I would also find the above useful. I think all that is required is the addition of a requestBody field that works much like the responseBody field in the Warp.Response structure.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik de Castro Lopo wrote:
Hmm, I came to this conclusion some time ago and the actual code seems to be different from how I remember it. However, there is still a problem.
The problem is that the Wai.Request structure
http://hackage.haskell.org/packages/archive/wai/0.4.2/doc/html/src/Network-W...
has no requestBody field.
Hmm, thinking about this some more, I actually don't think this needs any changes to Wai.Response. I also think I can handle this completely in my proxy, including the streaming of large response bodies.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On Mon, Nov 28, 2011 at 2:36 PM, Kazu Yamamoto wrote:
But I'm talking about relaying for body of HTTP request. In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
Hi Kazu,
The most straightforward way I know of to do it is through a bounded
channel (http://hackage.haskell.org/package/bounded-tchan or
http://hackage.haskell.org/package/BoundedChan). You'd fork a thread
to enumerate the output end of the socket and feed it request body
chunks through the channel, then do the reverse to proxy the response.
G
--
Gregory Collins
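The bounded-channel idea above can be sketched as follows. Note the substitution: instead of the bounded-tchan or BoundedChan packages named in the message, this sketch uses TBQueue from the stm package, which behaves similarly for this purpose. The names feedChunks and drainChunks are hypothetical, and the consumer here only counts bytes where a real proxy would write to the upstream socket. The point is the back-pressure: the writer blocks as soon as the queue is full, so at most a fixed number of chunks is in memory at any time.

```haskell
module Main (main) where

import           Control.Concurrent      (forkIO)
import           Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import           Control.Concurrent.STM  (TBQueue, atomically, newTBQueueIO,
                                          readTBQueue, writeTBQueue)
import           Data.ByteString         (ByteString)
import qualified Data.ByteString         as BS

-- | Producer side: push each request-body chunk into the queue, then
-- Nothing as an end-of-body marker. writeTBQueue blocks while the queue
-- is full, which is what bounds the memory used.
feedChunks :: TBQueue (Maybe ByteString) -> [ByteString] -> IO ()
feedChunks q chunks = do
  mapM_ (atomically . writeTBQueue q . Just) chunks
  atomically (writeTBQueue q Nothing)

-- | Consumer side: hand each chunk to the given action until the end
-- marker arrives. Returns the total number of bytes seen.
drainChunks :: TBQueue (Maybe ByteString) -> (ByteString -> IO ()) -> IO Int
drainChunks q send = go 0
  where
    go n = do
      next <- atomically (readTBQueue q)
      case next of
        Nothing    -> pure n
        Just chunk -> send chunk >> (go $! n + BS.length chunk)

main :: IO ()
main = do
  q     <- newTBQueueIO 4   -- at most 4 chunks buffered at any time
  total <- newEmptyMVar
  -- The forked thread plays the role of the thread that writes to the
  -- upstream server; here it just discards the chunks.
  _ <- forkIO (drainChunks q (\_ -> pure ()) >>= putMVar total)
  feedChunks q (replicate 1000 (BS.replicate 4096 0))
  takeMVar total >>= print  -- total bytes relayed
```

The same pair of functions, with the roles of the two sockets swapped, would cover the response direction.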

Greg,
But I'm talking about relaying for body of HTTP request. In other words, my concern is about uploading. In my company, uploading files whose size is some giga bytes is quite common. I don't know how to relay a uploading huge file with a fixed-size buffer with Warp.
Hi Kazu,
The most straightforward way I know of to do it is through a bounded channel (http://hackage.haskell.org/package/bounded-tchan or http://hackage.haskell.org/package/BoundedChan). You'd fork a thread to enumerate the output end of the socket and feed it request body chunks through the channel, then do the reverse to proxy the response.
Thank you for your suggestion. I guess these packages use MVar deep inside and are slow. Anyway, I will try.

--Kazu

On Tue, Nov 29, 2011 at 1:47 PM, Kazu Yamamoto wrote:
Thank you for your suggestion. I guess these packages use MVar in deep inside and they are slow. Anyway, I will try.
Hi Kazu,
In the alioth shootout, on a quad-core Intel Q6600, 64-bit GHC does
about 5.3M context switches per second for the thread-ring benchmark;
that's 186 nanoseconds per context switch. I think the amount of
overhead will be reasonable here.
http://shootout.alioth.debian.org/u64q/benchmark.php?test=threadring&lang=ghc&lang2=java
I'd be quite interested to see whether stm beats MVars here or not.
G
--
Gregory Collins
participants (5)
- Erik de Castro Lopo
- Gregory Collins
- Kazu Yamamoto
- Michael Snoyman
- yi huang