
Hi,

I've got a question about lazy IO in Haskell. The most well-known function for lazy IO is `hGetContents', which lazily reads all the contents from a handle and returns them as a regular [Char].

The thing with hGetContents is that it puts the Handle in a semi-closed state; no one can use the handle anymore. This behaviour is understandable from the point of view of safety: it is not yet determined when the result of hGetContents will actually be computed, so using the handle in the meantime is undesirable.

The point is, I think I really have a situation in which I want to use the handle again `after' a call to hGetContents. I think I can best explain this using a code example.

    readHttpMessage :: IO (Headers, Data.ByteString.Lazy.ByteString)
    readHttpMessage = do
      myStream <- <accept http connection from client>
      request  <- hGetContents myStream
      header   <- parseHttpHeader request
      body     <- Data.ByteString.Lazy.hGetContents myStream
      return (header, body)

The Data.ByteString.Lazy.hGetContents in the example above obviously fails because the handle is semi-closed.

So, what I am trying to do here is apply a parser (one that consumes Chars) to the input stream until it has succeeded. After this I want to collect the remainder of the stream in a lazy ByteString, or maybe even something else.

I tried to open the handle again using some internal handle hackery, but this failed (luckily). In the module GHC.IO there is a function `lazyRead' that more or less seems to do what I want. But I guess there is a good reason for not exporting it.

Does anyone know a pattern in which I can do this easily?

Thanks,
-- Sebastiaan

(Sorry, Sebastiaan, I hit send accidentally)
On Sat, Jun 14, 2008 at 1:18 PM, Sebastiaan Visser wrote:

    readHttpMessage :: IO (Headers, Data.ByteString.Lazy.ByteString)
    readHttpMessage = do
      myStream <- <accept http connection from client>
      request  <- hGetContents myStream
      header   <- parseHttpHeader request
      body     <- Data.ByteString.Lazy.hGetContents myStream
      return (header, body)

Why not

    readHttpMessage = do
      myStream <- <accept http connection from client>
      contents <- Data.ByteString.Lazy.hGetContents myStream
      (header, rest) <- parseHttpHeader contents
      return (header, rest)

i.e. make parseHttpHeader return the rest of the string it didn't parse? In fact, may I ask why parseHttpHeader is not a pure function?

HTH,
-- Felipe.

On Jun 14, 2008, at 6:45 PM, Felipe Lessa wrote:
Why not

    readHttpMessage = do
      myStream <- <accept http connection from client>
      contents <- Data.ByteString.Lazy.hGetContents myStream
      (header, rest) <- parseHttpHeader contents
      return (header, rest)

i.e. make parseHttpHeader return the rest of the string it didn't parse?
Doesn't this imply that the parseHttpHeader must work on ByteStrings instead of regular Strings? Maybe this works for HTTP headers, but sometimes ByteStrings are not appropriate. Especially when you are not using the regular `System.IO.hGetContents' but the `System.IO.UTF8.hGetContents'.
In fact, may I ask why parseHttpHeader is not a pure function?
This is not a real-life example. It might as well be a pure function.
HTH,
-- Felipe.
-- Sebastiaan

On Sat, Jun 14, 2008 at 1:50 PM, Sebastiaan Visser wrote:
Doesn't this imply that the parseHttpHeader must work on ByteStrings instead of regular Strings?
I made the change because it's easier and faster to go from ByteString to String than the converse.
Maybe this works for HTTP headers, but sometimes ByteStrings are not appropriate. Especially when you are not using the regular `System.IO.hGetContents' but the `System.IO.UTF8.hGetContents'.
You may use the package encoding instead, take a look at [1]. It will decode a lazy ByteString into a String. In your HTTP parsing example, you could break the ByteString on "\r\n\r\n" and then decodeLazy only the first part, while returning the second without modifying it.

[1] http://hackage.haskell.org/packages/archive/encoding/0.4.1/doc/html/Data-Enc...

-- Felipe.
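The "break on \r\n\r\n" idea could look roughly like the sketch below. It is only an illustration: `splitHeader` is a made-up name, it scans byte by byte rather than efficiently, and it leaves the decoding of the header part (with decodeLazy or anything else) to the caller.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | Split a lazy ByteString at the first "\r\n\r\n".
-- Returns the header bytes (without the delimiter) and the untouched body,
-- or Nothing when the delimiter never occurs.
-- 'splitHeader' is a hypothetical helper, not part of any library.
splitHeader :: BL.ByteString -> Maybe (BL.ByteString, BL.ByteString)
splitHeader input = go 0 input
  where
    delim = BLC.pack "\r\n\r\n"
    -- n counts the bytes skipped so far; rest is the unscanned suffix.
    go n rest
      | delim `BL.isPrefixOf` rest = Just (BL.take n input, BL.drop 4 rest)
      | otherwise =
          case BL.uncons rest of
            Nothing         -> Nothing
            Just (_, rest') -> go (n + 1) rest'
```

Only the first component needs to be turned into a String; the second stays a lazy ByteString and can be handed to whichever handler wants it.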

On Jun 14, 2008, at 7:16 PM, Felipe Lessa wrote:
You may use the package encoding instead, take a look at [1]. It will decode a lazy ByteString into a String. In your HTTP parsing example, you could break the ByteString on "\r\n\r\n" and then decodeLazy only the first part, while returning the second without modifying it.
Sounds interesting. This could indeed solve (a part of) my problem. The thing is, when decoding the header - in my web server architecture - it is not yet clear what should be done with the body. Some `handlers' may want to have it as a String, UTF8 String, ByteString or may not even need it at all.
[1] http://hackage.haskell.org/packages/archive/encoding/0.4.1/doc/html/Data-Encoding.html#v%3AdecodeLazy
Because HTTP headers are line based and the parser does not need any look ahead - the first "\r\n\r\n" is the header delimiter - it might be possible to use the hGetLine for the headers. After this I am still `free' to decide what to do for the body: System.IO.hGetContents, System.IO.UTF8.hGetContents, Data.ByteString.Lazy.hGetContents, hClose, etc...
-- Felipe.
Thanks, Sebastiaan.
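A minimal sketch of the hGetLine approach, assuming the headers are plain lines terminated by "\r\n" and that the handle does no newline translation. `readHeaderLines` is a hypothetical name; hGetLine will raise an EOF error if the stream ends before the blank line, which the sketch does not handle.

```haskell
import Data.List (dropWhileEnd)
import System.IO

-- | Read header lines up to and including the blank line that ends the
-- header block. The handle is left open and positioned at the first byte
-- of the body, so the caller can still pick any way of reading the rest.
-- 'readHeaderLines' is a hypothetical helper for illustration only.
readHeaderLines :: Handle -> IO [String]
readHeaderLines h = do
  -- hGetLine strips the '\n'; the trailing '\r' is removed by hand.
  line <- dropWhileEnd (== '\r') <$> hGetLine h
  if null line
    then return []
    else (line :) <$> readHeaderLines h
```

Because no hGetContents has been called yet, the handle is not semi-closed afterwards, and the body can be read with System.IO.hGetContents, Data.ByteString.Lazy.hGetContents, or simply dropped with hClose.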

Sebastiaan Visser wrote:
    readHttpMessage :: IO (Headers, Data.ByteString.Lazy.ByteString)
    readHttpMessage = do
      myStream <- <accept http connection from client>
      request  <- hGetContents myStream
      header   <- parseHttpHeader request
      body     <- Data.ByteString.Lazy.hGetContents myStream
      return (header, body)

That's impure because parseHttpHeader doesn't return anything telling you how much of the stream it's looked at. Maybe it looked ahead more than it needed to, thus deleting part of the body. I was going to suggest, if you can't change parseHttpHeader to use ByteStrings,

    bs <- Data.ByteString.Lazy.hGetContents myStream
    header <- parseHttpHeader (Data.ByteString.Lazy.unpack bs)

but you still have to get parseHttpHeader (or perhaps one of its similar friends, if it has any) to tell you how much of the string it consumed! I don't know what parsing functions you have available to work with, so I can't tell you whether it's possible.

-Isaac

On Jun 14, 2008, at 6:49 PM, Isaac Dupree wrote:
That's impure because parseHttpHeader doesn't return anything telling you how much of the stream it's looked at. Maybe it looked ahead more than it needed to, thus deleting part of the body. I was going to suggest, if you can't change parseHttpHeader to use ByteStrings,

    bs <- Data.ByteString.Lazy.hGetContents myStream
    header <- parseHttpHeader (Data.ByteString.Lazy.unpack bs)

but you still have to get parseHttpHeader (or perhaps if it has similar friends) to tell you how much of the string it consumed! I don't know what parsing functions you have available to work with, so I can't tell you whether it's possible.
It is a regular Parsec parser and I am pretty sure it does not consume anything other than the header itself. Maybe I could rewrite my parser to work on Word8's instead of Char's, I don't think HTTP even allows Unicode characters within HTTP headers. Thanks. I think I'll try this. But I'm still curious about how to lazily parse messages with arbitrary size Unicode headers and plain (possibly) binary bodies.
-Isaac
-- Sebastiaan.

On 2008 Jun 14, at 12:59, Sebastiaan Visser wrote:
But I'm still curious about how to lazily parse messages with arbitrary size Unicode headers and plain (possibly) binary bodies.
Sounds like Data.Binary (see hackage) to me.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH

On Sat, 2008-06-14 at 18:18 +0200, Sebastiaan Visser wrote:
    readHttpMessage :: IO (Headers, Data.ByteString.Lazy.ByteString)
    readHttpMessage = do
      myStream <- <accept http connection from client>
      request  <- hGetContents myStream
      header   <- parseHttpHeader request
      body     <- Data.ByteString.Lazy.hGetContents myStream
      return (header, body)

Can you get the contents as a bytestring, then unpack that to a String for the purpose of parsing a prefix of the input as the headers? Since unpacking to a String is lazy, you should only end up using String for the headers and not for the payload.

As others have pointed out, it's not clear from the above how you separate the headers and the body. Returning the length of the headers might work; dropping that amount from the bytestring would then give the body as a bytestring.

Duncan
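The "return the length of the headers, then drop that much" suggestion might be sketched as follows. Everything here is an assumption for illustration: `parseHeader` is a tiny hand-written stand-in for the real Parsec parser, and the byte count is only correct because Char8 unpacking maps exactly one byte to one Char.

```haskell
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | Split off one line ending in "\r\n"; Nothing if there is no such line.
breakCRLF :: String -> Maybe (String, String)
breakCRLF ('\r' : '\n' : rest) = Just ("", rest)
breakCRLF (c : cs)             = fmap (\(l, r) -> (c : l, r)) (breakCRLF cs)
breakCRLF []                   = Nothing

-- | Hypothetical stand-in for the real header parser: returns the header
-- lines together with the number of Chars consumed, blank line included.
parseHeader :: String -> Maybe ([String], Int64)
parseHeader = go [] 0
  where
    go acc n s =
      case breakCRLF s of
        Nothing         -> Nothing
        Just ("", _)    -> Just (reverse acc, n + 2)
        Just (line, r)  -> go (line : acc) (n + fromIntegral (length line) + 2) r

-- | Parse the headers from a lazily unpacked String, then drop the
-- consumed prefix from the original ByteString to obtain the body.
splitMessage :: BL.ByteString -> Maybe ([String], BL.ByteString)
splitMessage bs = do
  (headers, consumed) <- parseHeader (BLC.unpack bs)
  return (headers, BL.drop consumed bs)
```

With a multi-byte decoding such as UTF-8 the number of Chars and the number of bytes differ, so the parser would have to report bytes rather than characters for the drop to stay correct.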

hGetContents reads the entire contents of the stream till the end (although
lazily). The return value of hGetContents is logically the entire contents
of the stream. That it has not read it completely is only a part of its
laziness, so the result does not depend upon when the caller stops consuming
the result. That is why the handle is semi-closed; logically the handle is
already at the end of the stream.
A complete parser to parse the header and the body has to be used on the
entire contents, or some function which knows how to find the header end and
stop there has to be used.
Regards,
Abhay
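The semi-closed state can be observed directly with a small experiment. This is only a sketch: `reuseFails` is a made-up name, and it uses an ordinary file instead of a socket to keep the example self-contained.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- | Open a file, call hGetContents on it, then try to read from the same
-- handle again. Returns True when that second read is rejected.
-- 'reuseFails' is a hypothetical helper for illustration only.
reuseFails :: FilePath -> IO Bool
reuseFails path = do
  h <- openFile path ReadMode
  _ <- hGetContents h   -- nothing needs to be consumed; h is now semi-closed
  r <- try (hGetLine h) :: IO (Either IOException String)
  hClose h              -- hClose is still permitted on a semi-closed handle
  return (either (const True) (const False) r)
```

The second read fails even though the lazy String has not been consumed at all, which matches the view that the handle is logically already at the end of the stream.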

On Jun 16, 2008, at 2:58 PM, Jules Bean wrote:
Sebastiaan Visser wrote:
Does anyone know a pattern in which I can do this easily?
Don't use hGetContents on a socket. That's asking for trouble.
Can you please explain why? What is an easier method to spool your HTTP POST data to a file than:

    Bs.hGetContents sock >>= Bs.hPut fd

?
Use hGetContents either NEVER (easy option) or only on throwaway handles/files which won't be used again.
My sockets will be thrown away eventually. Isn't everything? I can imagine that this can be a problem when using Keep-Alive connections - or the like - but that is a whole different story.
Jules
Thanks, Sebas.

Sebastiaan Visser wrote:
On Jun 16, 2008, at 2:58 PM, Jules Bean wrote:
Sebastiaan Visser wrote:
Does anyone know a pattern in which I can do this easily?
Don't use hGetContents on a socket. That's asking for trouble.
Can you please explain why?
Because it's a broken abstraction. It's only correct if all you will ever do is read all the data into one String and don't care about it after that. In my experience this is almost never true of sockets: there is always protocol overhead, handshaking, and the "next request". It might be fine for unusually simple socket setups.
What is an easier method to spool your HTTP POST data to a file than:

    Bs.hGetContents sock >>= Bs.hPut fd

?
Yes, that's fine.
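For reference, the spooling one-liner can be written out as a complete function. This is a sketch that assumes `Bs` in the quoted code stands for Data.ByteString.Lazy; `spoolTo` is a hypothetical name.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.IO

-- | Copy everything that can still be read from the source handle into
-- the given file. The source is treated as a throwaway handle: after
-- hGetContents it is semi-closed and cannot be used for anything else.
-- 'spoolTo' is a hypothetical helper for illustration only.
spoolTo :: Handle -> FilePath -> IO ()
spoolTo source path =
  withBinaryFile path WriteMode $ \out ->
    BL.hGetContents source >>= BL.hPut out
```

The lazy ByteString is written chunk by chunk, so the whole body does not have to be held in memory at once.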

Jules Bean wrote:
Because it's a broken abstraction.
It's only correct if all you will ever do is read all the data into one String and don't care about it after that.
In my experience this is almost never true of sockets: there is always protocol overhead, handshaking, and the "next request".
It works for HTTP, as all responses MUST be in the same order as the requests.

-- 
(c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.
participants (8): Abhay Parvate, Achim Schneider, Brandon S. Allbery KF8NH, Duncan Coutts, Felipe Lessa, Isaac Dupree, Jules Bean, Sebastiaan Visser