
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before downloading the entire document. I hesitate to do a HEAD request first and then a separate GET request, because there is a small, but nonzero, chance that the document changes between the two requests. I hoped that simpleHTTP was lazy enough that I could check the rspHeaders field of the HTTP response, and that only touching rspBody would download the document content. Unfortunately, simpleHTTP downloads the entire document before I can access the HTTP header. To be honest, I would also like to fetch the content of the document lazily. If simpleHTTP behaved lazily, I suspect this would require using unsafeInterleaveIO internally, which is considered a hack. Is this the reason why simpleHTTP is not lazy? Or do I just need a lazy Stream type for this purpose?
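For illustration, a minimal sketch of the behaviour being asked for (my own names, not the HTTP package's API): read the status line and headers strictly, but defer the body read with unsafeInterleaveIO so it only happens when the String is demanded.

```haskell
import System.IO (Handle, IOMode (ReadMode), hGetContents, hGetLine, openFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The header section ends at the first empty (or CRLF-only) line.
isEndOfHeaders :: String -> Bool
isEndOfHeaders l = l == "\r" || l == ""

-- Read the headers eagerly; wrap the body read in unsafeInterleaveIO
-- so no body bytes are consumed until the result String is forced.
lazyResponse :: Handle -> IO ([String], String)
lazyResponse h = do
    hdrs <- readHeaders
    body <- unsafeInterleaveIO (hGetContents h)  -- deferred until demanded
    return (hdrs, body)
  where
    readHeaders = do
        l <- hGetLine h
        if isEndOfHeaders l
          then return []
          else fmap (l :) readHeaders
```

Inspecting the returned header list then costs only the header bytes; the body is pulled from the handle on demand, with the usual caveats about unsafeInterleaveIO and the lifetime of the handle.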

On Thu, 27 Nov 2008, Henning Thielemann wrote:
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before downloading the entire document.
I have tried to solve this on my own. I have written a wrapper around any stream type. It first reads the entire document lazily into a string and then ships its contents when readBlock or readLine is called. However, the program now blocks on 'close' and I don't know why. (See LazyStream.example.) Looking at Network.HTTP.sendHTTP, I wonder whether it is a good idea that sendHTTP closes the stream automatically when it finishes (and also when an exception is raised). In general I think it is good when the one who opens a stream is also responsible for closing it. For a lazy stream, the current implementation of sendHTTP would fail: when sendHTTP quits, the data has still not been read, and when someone processes the HTTP response, data will be read lazily from a closed stream.
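The wrapper described here could look something like the following sketch (types and names are mine, not the actual LazyStream code): the whole input arrives as one lazy String, and readBlock/readLine just peel pieces off a mutable cursor.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- A stream backed by a single (lazily produced) String.
newtype LazyStream = LazyStream (IORef String)

-- The content would come from e.g. hGetContents in real use.
mkLazyStream :: String -> IO LazyStream
mkLazyStream s = fmap LazyStream (newIORef s)

-- Hand out the next n characters.
readBlock :: LazyStream -> Int -> IO String
readBlock (LazyStream r) n = do
    s <- readIORef r
    let (blk, rest) = splitAt n s
    writeIORef r rest
    return blk

-- Hand out the next line, newline included.
-- (Simplified: a real implementation would report EOF instead of
-- returning a bare "\n" on an exhausted stream.)
readLine :: LazyStream -> IO String
readLine (LazyStream r) = do
    s <- readIORef r
    let (ln, rest) = break (== '\n') s
    writeIORef r (drop 1 rest)
    return (ln ++ "\n")
```

Because the String is lazy, each readBlock/readLine only forces as much of the underlying source as it hands out.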

On Sat, Nov 29, 2008 at 12:26:18AM +0100, Henning Thielemann wrote:
For a lazy stream, the current implementation of sendHTTP would fail: when sendHTTP quits, the data has still not been read, and when someone processes the HTTP response, data will be read lazily from a closed stream.
All I know is that there have been some problems in the past with too many opened connections which weren't closed. Whether they were a result of being too lazy I can no longer remember. Marc

On Nov 28, 2008, at 3:51 PM, Marc Weber wrote:
All I know is that there have been some problems in the past with too many opened connections which weren't closed. Whether they were a result of being too lazy I can no longer remember.
That was resolved - the issue was with closing connections in the TCP code. Paul Brown paulrbrown@gmail.com

In general I think it is good when the one who opens a stream is also responsible for closing it.
That policy thwarts modularity, doesn't it? Just as does the policy that
whoever allocates memory is responsible for freeing it. Laziness, like
garbage collection, is mainly about modularity (as expressed and
demonstrated in "Why Functional Programming Matters").
So I'd recommend an automated, GC-based solution. GHC has weak references
and finalizers for this sort of thing. I think there's room for improvement
in the current implementation of finalizer scheduling for timely resource
deallocation. We've had to deal with this problem when managing graphics
memory, in order to make functional graphics efficient.
- Conal
2008/11/28 Henning Thielemann
On Thu, 27 Nov 2008, Henning Thielemann wrote:
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before
downloading the entire document.
I have tried to solve this on my own. I have written a wrapper around any stream type. It first reads the entire document lazily into a string and then ships its contents when readBlock or readLine is called. However, the program now blocks on 'close' and I don't know why. (See LazyStream.example.)
Looking at Network.HTTP.sendHTTP, I wonder whether it is a good idea that sendHTTP closes the stream automatically when it finishes (and also when an exception is raised). In general I think it is good when the one who opens a stream is also responsible for closing it. For a lazy stream, the current implementation of sendHTTP would fail: when sendHTTP quits, the data has still not been read, and when someone processes the HTTP response, data will be read lazily from a closed stream.
_______________________________________________
web-devel mailing list
web-devel@haskell.org
http://www.haskell.org/mailman/listinfo/web-devel

On Sat, 29 Nov 2008, Henning Thielemann wrote:
On Thu, 27 Nov 2008, Henning Thielemann wrote:
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before downloading the entire document.
I have tried to solve this on my own. I have written a wrapper around any stream type. It first reads the entire document lazily into a string and then ships its contents when readBlock or readLine is called. However, the program now blocks on 'close' and I don't know why. (See LazyStream.example.)
Is anyone else interested in getting HTTP response bodies lazily in general, and in trying my code in particular? Maybe one of the experts has an idea why 'close' blocks.

On Sat, 29 Nov 2008, Henning Thielemann wrote:
On Sat, 29 Nov 2008, Henning Thielemann wrote:
On Thu, 27 Nov 2008, Henning Thielemann wrote:
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before downloading the entire document.
I have tried to solve this on my own. I have written a wrapper around any stream type. It first reads the entire document lazily into a string and then ships its contents when readBlock or readLine is called. However, the program now blocks on 'close' and I don't know why. (See LazyStream.example.)
Is anyone else interested in getting HTTP response bodies lazily in general, and in trying my code in particular? Maybe one of the experts has an idea why 'close' blocks.
I have an idea myself. On the one hand, 'close' checks with a pattern match whether the buffer is empty; on the other hand, 'close' checks whether the connection is active or closed. Both of these checks will certainly force the lazy read to be completed. But maybe the first check is even wrong, since it aborts the program if you close a connection before reading it completely:

Network.Stream Network.TCP> s <- openTCPPort "ftp.tu-chemnitz.de" 80
Network.Stream Network.TCP> writeBlock s "GET /pub/linux/opensuse/distribution/11.0/iso/cd/openSUSE-11.0-KDE4-LiveCD-i386.iso HTTP/1.1\n"
Right ()
Network.Stream Network.TCP> writeBlock s "Host: ftp.tu-chemnitz.de\n"
Right ()
Network.Stream Network.TCP> writeBlock s "\n"
Right ()
Network.Stream Network.TCP> readLine s
Right "HTTP/1.1 200 OK\r\n"
Network.Stream Network.TCP> readLine s
Right "Date: Sun, 30 Nov 2008 21:58:43 GMT\r\n"
Network.Stream Network.TCP> close s
*** Exception: Network/TCP.hs:(166,10)-(172,17): Non-exhaustive patterns in function closeConn

This pattern is certainly the [] in closeConn (MkConn sk addr [] _). Is this a bug or a feature?
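The fix this reasoning suggests can be sketched as follows. The real Conn type in Network/TCP.hs has different fields, so this hypothetical two-field stand-in (with a file Handle in place of a socket) only illustrates the pattern-match issue: close should discard any unread buffer rather than insisting it be empty.

```haskell
import System.IO (Handle, IOMode (ReadMode), hClose, openFile)

-- Hypothetical stand-in for Network.TCP's connection: a handle plus a
-- buffer of data read from the transport but not yet consumed.
data Conn = MkConn Handle String

-- The failing shape: only defined when the buffer is empty, so closing
-- a half-read connection hits a non-exhaustive pattern match.
closeConnStrict :: Conn -> IO ()
closeConnStrict (MkConn h "") = hClose h

-- The tolerant shape: close regardless of unread data.
closeConn :: Conn -> IO ()
closeConn (MkConn h _) = hClose h
```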

On Sun, 30 Nov 2008, Henning Thielemann wrote:
On Sat, 29 Nov 2008, Henning Thielemann wrote:
On Sat, 29 Nov 2008, Henning Thielemann wrote:
On Thu, 27 Nov 2008, Henning Thielemann wrote:
I have HTTP-3001.0.4 and I'd like to check the HTTP response headers before downloading the entire document.
I have tried to solve this on my own. I have written a wrapper around any stream type. It first reads the entire document lazily into a string and then ships its contents when readBlock or readLine is called. However, the program now blocks on 'close' and I don't know why. (See LazyStream.example.)
Is anyone else interested in getting HTTP response bodies lazily in general, and in trying my code in particular? Maybe one of the experts has an idea why 'close' blocks.
I have an idea myself. On the one hand, 'close' checks with a pattern match whether the buffer is empty; on the other hand, 'close' checks whether the connection is active or closed. Both of these checks will certainly force the lazy read to be completed.
I'll continue my thread ... I think it is best to build the lazy stream on top of Network.Socket, not Network.TCP, since the former does not buffer and the latter does. Since my lazy stream reads the entire source content into a single String lazily, I already have a buffer. In order not to duplicate code, I'd like to introduce a hidden module Network.TCP.Private with open and close methods, which can then be used by Network.TCP and Network.Stream.Lazy. Alternatively, I could add a new public module for unbuffered TCP transfer. I think this would be even cleaner. However, I would have to name the new module, say, Network.TCP.Unbuffered, and the module names Network.TCP.Unbuffered + Network.TCP are somehow the wrong way round compared to Network.TCP + Network.TCP.Buffered. Maybe I should name them Network.TCP.Unbuffered + Network.TCP.Buffered and re-export the latter from Network.TCP. Do such changes have a chance of being included in the HTTP package?

Hi, Henning --
I'll continue my thread ... I think it is best to build the lazy stream on top of Network.Socket, not Network.TCP, since the former does not buffer and the latter does. Since my lazy stream reads the entire source content into a single String lazily, I already have a buffer. In order not to duplicate code, I'd like to introduce a hidden module Network.TCP.Private with open and close methods, which can then be used by Network.TCP and Network.Stream.Lazy. Alternatively, I could add a new public module for unbuffered TCP transfer. I think this would be even cleaner. However, I would have to name the new module, say, Network.TCP.Unbuffered, and the module names Network.TCP.Unbuffered + Network.TCP are somehow the wrong way round compared to Network.TCP + Network.TCP.Buffered. Maybe I should name them Network.TCP.Unbuffered + Network.TCP.Buffered and re-export the latter from Network.TCP. Do such changes have a chance of being included in the HTTP package?
I'm still a little lost as to why this is necessary and/or desirable. Some (if not all) of your use cases would be achievable via proper use of HTTP protocol features (100 Continue, Conditional GET, etc.), and it is a question of whether those are well supported by the current implementation. For the laziness in handling response bodies, I think you'd actually want to expose an iteration semantic implemented on top of HTTP chunked encoding and ranged requests, and that would live at a higher level in the stack than the TCP layer. Just $0.02 for you. Paul Brown paulrbrown@gmail.com

On Tue, 2 Dec 2008, Paul Brown wrote:
I'm still a little lost as to why this is necessary and/or desirable. Some (if not all) of your use cases would be achievable via proper use of HTTP protocol features (100 Continue, Conditional GET, etc.), and it is a question of whether those are well supported by the current implementation.
I do not know how '100 Continue' would help, since it is for conditionally sending the body of the HTTP request, not for conditionally receiving the body of the HTTP response, right? However, you are right that in principle it would help me to use HTTP features like the Accept header. Using it, I could state that I only want certain types of data to be sent. However, the original HWS, for example, ignores the Accept header, and when I checked other web servers, they do so as well. So it still seems best to get the header via a GET request and cancel the transfer once I have seen enough of the header.
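This "plain GET, then stop early" approach needs no socket-level tricks once the response arrives as a lazy String: splitting at the blank line forces only the header bytes, and the body stays an unevaluated tail. A sketch (my own helper, not part of the package):

```haskell
-- Split a raw HTTP response into header lines and the (unconsumed) body.
-- Applied to a lazy String from hGetContents, pattern-matching on the
-- result forces only the bytes up to the blank line.
splitResponse :: String -> ([String], String)
splitResponse = go []
  where
    go acc s =
        case break (== '\n') s of
            (l, '\n' : rest)
                | l == "\r" || l == "" -> (reverse acc, rest)  -- blank line ends headers
                | otherwise            -> go (l : acc) rest
            (l, _) -> (reverse (l : acc), "")  -- input ended inside the headers
```

After inspecting the header list, cancelling the transfer is then just closing the connection without ever touching the body String.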
For the laziness in handling response bodies, I think you'd actually want to expose an iteration semantic implemented on top of HTTP chunked encoding and ranged requests, and that would live at a higher level in the stack than the TCP layer.
Yes, I also want lazy download of the body. Certainly an iteratee implementation could be cleaner, but it would probably be considerably different from the current implementation. I'm afraid 'Range' is supported as badly as 'Accept' - and what about dynamically generated (maybe even infinite) content? So I think I should not bother the web server with the way I want to download its content, but just mimic a stupid browser.

My current state can be found here: http://code.haskell.org/~thielema/http/

I do not want to fork, and I hope the patches are accepted at some point and that nobody adds conflicting patches in the meantime, since that would mean I have to re-record all my changes in order to avoid the dreaded exponential-time darcs conflict resolution. However, currently there are only changes behind the scenes and no new visible features. I'm now using monad transformers from my explicit-exception package, which simplified a lot of the manual exception handling from before. I removed some 'reverse's, which are a laziness killer, but I'm still not at the end. Chunked transfer is particularly difficult with respect to laziness, although I do not know why. I have added a monad which further abstracts the Stream concept. It allows us to do HTTP communication in the absence of IO, which is perfect for unit testing, and it allows lazy processing without unsafeInterleaveIO - you only need the unsafeInterleaveIO hidden in hGetContents.
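The Stream-abstracting monad mentioned here could be shaped roughly like this (a sketch with my own names, not the actual patches): one class for the read operation, an IO instance for the real transport, and a pure instance that feeds canned lines for unit tests.

```haskell
-- Protocol code is written once against this class.
class Monad m => ReadStream m where
    readStreamLine :: m String

-- Real transport (stdin here; a socket handle in real use).
instance ReadStream IO where
    readStreamLine = getLine

-- Pure simulation: consume lines from a list, no IO involved.
newtype Sim a = Sim { runSim :: [String] -> (a, [String]) }

instance Functor Sim where
    fmap f (Sim g) = Sim (\ls -> let (a, ls') = g ls in (f a, ls'))

instance Applicative Sim where
    pure a = Sim (\ls -> (a, ls))
    Sim f <*> Sim g =
        Sim (\ls -> let (h, ls')  = f ls
                        (a, ls'') = g ls'
                    in (h a, ls''))

instance Monad Sim where
    Sim g >>= k = Sim (\ls -> let (a, ls') = g ls in runSim (k a) ls')

instance ReadStream Sim where
    readStreamLine = Sim (\(l : ls) -> (l, ls))

-- Example protocol step, usable both in IO and in tests:
readStatusCode :: ReadStream m => m Int
readStatusCode = do
    l <- readStreamLine
    return (read (words l !! 1))
```

A unit test then just runs the same code over a list of canned response lines, which is the "HTTP communication in the absence of IO" idea above.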
participants (4)
- Conal Elliott
- Henning Thielemann
- Marc Weber
- Paul Brown