
On Jan 29, 2007, at 11:11, Yitzchak Gale wrote:
Neil Mitchell wrote:
I will be releasing this function as part of a library shortly.
Alistair Bayley wrote:
No! The code was merely meant to illustrate how a really basic HTTP GET might work. It certainly doesn't deal with a lot of the additional cases, like redirects and resource moves, and non-standards-compliant HTTP servers... there are a large number of webserver implementations which do not respect the HTTP standards (1.0 or 1.1), and HTTP clients (like web browsers) have to go to some lengths in order to get sensible responses out of most of them... if your needs really are very simple, then fine. But be aware that "doing the right thing" with real-world HTTP responses can be a can-o'-worms.
Let's not complicate things too much at the HTTP level. Low-level HTTP is a simple protocol, not hard to implement. You send a request with headers and data, and get a like response. Possibly reuse the connection. That's it. HTTP is useful for many things besides just loading web pages from large public servers. We need a simple, easy to use module that just implements HTTP. I think we have that, or we are close.
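To make the "request with headers and data, and a like response" point concrete, here is a minimal sketch in plain Haskell (base only). It only renders a request to the HTTP/1.1 wire format and parses a complete response held in a String; it does no networking, no chunked encoding, and no connection reuse. The Request and Response types and the function names are hypothetical, chosen for illustration, and are not the Network.HTTP API. The host example.org in the usage note is likewise just a placeholder.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical types for illustration only; not the Network.HTTP API.
data Request = Request
  { reqMethod  :: String
  , reqPath    :: String
  , reqHeaders :: [(String, String)]
  , reqBody    :: String
  } deriving (Show, Eq)

data Response = Response
  { rspCode    :: Int
  , rspReason  :: String
  , rspHeaders :: [(String, String)]
  , rspBody    :: String
  } deriving (Show, Eq)

crlf :: String
crlf = "\r\n"

-- | Serialise a request: request line, headers, blank line, body.
renderRequest :: Request -> String
renderRequest (Request m p hs b) =
  m ++ " " ++ p ++ " HTTP/1.1" ++ crlf
    ++ concatMap (\(k, v) -> k ++ ": " ++ v ++ crlf) hs
    ++ crlf
    ++ b

-- | Split at the first CRLF; the CRLF itself is dropped.
breakCrlf :: String -> (String, String)
breakCrlf ('\r' : '\n' : rest) = ("", rest)
breakCrlf (c : cs) = let (l, r) = breakCrlf cs in (c : l, r)
breakCrlf [] = ("", "")

-- | Parse a complete response. Returns Nothing if the status line does
-- not have a numeric code in second position. The HTTP version token
-- is not validated, and the body is returned as-is.
parseResponse :: String -> Maybe Response
parseResponse raw =
  case words statusLine of
    (_version : code : reason)
      | all isDigit code ->
          Just (Response (read code) (unwords reason) hdrs body)
    _ -> Nothing
  where
    (statusLine, afterStatus) = breakCrlf raw
    (hdrs, body) = headerLines afterStatus

    -- Collect "Name: value" lines until the blank line that ends the headers.
    headerLines s =
      case breakCrlf s of
        ("", rest) -> ([], rest)
        (line, rest) ->
          let (k, v) = break (== ':') line
              (more, b) = headerLines rest
          in ((k, dropWhile isSpace (drop 1 v)) : more, b)
```

Loaded into GHCi, renderRequest (Request "GET" "/" [("Host", "example.org")] "") gives the text one would write to a socket, and parseResponse turns the reply text back into a value of the same shape. Everything that makes real-world URL loading hard (redirects, broken servers, cookies) is deliberately absent, which is the separation of concerns argued for above.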
Loading URLs on the web is an entirely different matter. There is a whole layer of logic that is needed to deal with the mess out there. It builds not just on HTTP, but on various other standard and non-standard protocols.
URL loading is a hard problem, but usable solutions are well-known and available. I would suggest that we not re-invent the wheel here. If we want a pure Haskell solution - and that would be nice - we should start with an existing code base that is widely used, stable, and not too messy. Then re-write it in Haskell. Otherwise, just keep spawning wget or curl, or use MissingPy.
But please don't confuse concerns by mixing URL-loading logic into the HTTP library.
They made that mistake in Perl in the early days of the web, before it was clear what was about to happen. There is no reason for us to repeat the mistake.
Status report for the HTTP package (http://haskell.org/http/):

The Network.HTTP module is an implementation of HTTP itself. The Network.Browser module sits on top of that and does more high-level things, such as cookie handling.

I maintain the current HTTP package [1], but I haven't really done much maintenance, and I have only gotten a few patches submitted. Much of the code hasn't even been touched since Warrick Gray disappeared around 2002. The reason for this state of affairs is that I hardly use the library myself, and few others have contributed to it. In fact, I just now went to have a look at the code and noticed that until now, the most important functions in Network.Browser did not show up in the Haddock documentation because of missing type signatures.

This library needs a more dedicated maintainer and more contributors. Do we have any candidates in this thread?

Here's a list of TODO items off the top of my head to get you started:

- Add a layer (on top of Network.Browser?) for simple get and post requests, with an interface something like:

    get :: URI -> IO String
    post :: URI -> String -> IO String

- Switch to use lazy ByteStrings
- Better API for Network.Browser?
- Move HTTP authentication stuff to a separate module?
- Move cookie stuff to a separate module? Unify with the similar code in the cgi package (Network.CGI.HTTP.Cookie)?
- Use MD5 and Base64 from Dominic's new nimbler crypto package (see http://www.haskell.org/haskellwiki/Crypto_Library_Proposal)
- Use the non-deprecated Network.URI API.
- Implement HTTPS support.

/Björn