Coming up with a better API for Network.Socket.recv

Hi all,

I find it quite inconvenient to use the `recv` function in Network.Socket, as it throws an exception when reaching EOF and there's no way to check whether EOF has been reached before calling `recv`. This means that all calls to `recv` need to be wrapped in an exception handler.

I've been thinking about changing the version of `recv` that's included in the network-bytestring library [1] so it behaves differently from the one in the network library. Before I do so, I thought I should see if we can reach a consensus on what a nicer definition of `recv` would look like. My current thinking is that it would mimic what C/Python/Java do and return a zero-length ByteString when EOF is reached.

I'm also interested in understanding the reasons behind the design of the `recv` function in the network library. More generally, I'm interested in discussing the pros and cons of the current Haskell I/O library design, where the different read functions throw EOF exceptions and you have to call e.g. hIsEOF before reading from a Handle.

1. http://github.com/tibbe/network-bytestring

Cheers,
Johan
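Under the proposed semantics, detecting EOF becomes an ordinary test on the result rather than an exception handler. The following is a minimal sketch that assumes a receive action returning an empty ByteString at EOF. `recvAll` and `fakeRecv` are hypothetical names, and `fakeRecv` only simulates a socket so the example runs without the network package.

```haskell
import qualified Data.ByteString as B
import Data.IORef (newIORef, readIORef, writeIORef)

-- | Read chunks until the receive action returns an empty ByteString,
-- which under the proposed semantics means EOF. No exception handler is
-- needed to detect the end of the stream.
recvAll :: IO B.ByteString -> IO B.ByteString
recvAll recvChunk = go []
  where
    go acc = do
      chunk <- recvChunk
      if B.null chunk
        then return (B.concat (reverse acc))
        else go (chunk : acc)

-- | Hypothetical stand-in for a connected socket: yields the given chunks,
-- then empty ByteStrings forever, i.e. EOF in the proposed style.
fakeRecv :: [B.ByteString] -> IO (IO B.ByteString)
fakeRecv chunks = do
  ref <- newIORef chunks
  return $ do
    pending <- readIORef ref
    case pending of
      []         -> return B.empty
      (c : rest) -> writeIORef ref rest >> return c
```

On a stream socket this convention is unambiguous, since a blocking read into a non-empty buffer returns zero bytes only once the peer has shut down its side. The datagram case is different, as later replies point out.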

Johan Tibell
I'm also interested in understanding the reasons behind the design of the `recv` function in the network library.
POSIX semantics. And, frankly, I'm opposed to messing with them: If you want to have different behaviour, please do a (different|wrapper) library. -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.

Achim Schneider
Johan Tibell wrote:
I'm also interested in understanding the reasons behind the design of the `recv` function in the network library.
POSIX semantics. And, frankly, I'm opposed to messing with them: If you want to have different behaviour, please do a (different|wrapper) library.
Ouch. I revoke.

On Thu, 2009-02-26 at 22:45 +0100, Johan Tibell wrote:
Hi all,
I find it quite inconvenient to use the `recv` function in Network.Socket as it throws an exception when reaching EOF and there's no way to check whether EOF has been reached before calling `recv`. This means that all calls to `recv` need to be wrapped in an exception handler.
NB: `tryJust (guard . isEOFError) $ recv ...` with base-4, or `tryJust (ioErrors >=> guard . isEOFError) $ recv ...` with base-3, right?
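Jonathan's one-liner can be packaged as a reusable wrapper. A minimal sketch for base-4, where `tryJust` comes from Control.Exception; `eofToMaybe` and `simulatedEOF` are hypothetical names, the latter standing in for a `recv` that has hit EOF. Only EOF errors are caught; anything else still propagates.

```haskell
import Control.Exception (tryJust)
import Control.Monad (guard)
import System.IO.Error (eofErrorType, isEOFError, mkIOError)

-- | Run an action; if it throws an EOF IOError, return Nothing.
-- 'tryJust' only catches exceptions for which the selector returns Just,
-- so every other exception propagates unchanged.
eofToMaybe :: IO a -> IO (Maybe a)
eofToMaybe act =
  fmap (either (const Nothing) Just) (tryJust (guard . isEOFError) act)

-- | Hypothetical stand-in for a receive call that has reached EOF and
-- throws the same kind of IOError a Handle read would.
simulatedEOF :: IO a
simulatedEOF = ioError (mkIOError eofErrorType "recv" Nothing Nothing)
```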
I've been thinking about changing the version of `recv` that's included in the network-bytestring library [1] so it behaves differently from the one in the network library. Before I do so I thought I should see if we can reach a consensus on what a nicer definition of `recv` would look like. My current thinking is that it would mimic what C/Python/Java does and return a zero length ByteString when EOF is reached.
+1 In the interest of totality. Also, Prelude.getChar/System.IO.hGetChar should have return type IO (Maybe Char) in the interest of totality.
I'm also interested in understanding the reasons behind the design of the `recv` function in the network library. More generally, I'm interested in discussing the pros and cons of the current Haskell I/O library design where the different read functions throw EOF exceptions and you have to call e.g. hIsEOF before reading from a Handle.
jcc
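The total `getChar` Jonathan asks for can be approximated on top of the current Handle API with the check-then-read pattern Johan describes. A sketch; `hGetCharMaybe` is a hypothetical name, not part of System.IO. Note that `hIsEOF` may block until input is available, and the check and the read are not atomic if several threads share the Handle.

```haskell
import System.IO

-- | Total variant of 'hGetChar': Nothing at end of file instead of an
-- exception. 'hIsEOF' inspects the Handle's buffer (filling it if needed)
-- without consuming a character.
hGetCharMaybe :: Handle -> IO (Maybe Char)
hGetCharMaybe h = do
  eof <- hIsEOF h
  if eof
    then return Nothing
    else fmap Just (hGetChar h)
```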

On Thu, Feb 26, 2009 at 1:45 PM, Johan Tibell
I find it quite inconvenient to use the `recv` function in Network.Socket as it throws an exception when reaching EOF and there's no way to check whether EOF has been reached before calling `recv`.
I agree, the current behaviour is quite unfortunate. In fact, I'm pretty sure I added an entry point named recv_ to network-bytestring to work around precisely this problem.
I'm also interested in understanding the reasons behind the design of the `recv` function in the network library.
I think that it was modeled after the Handle API, which provides an isEOF function that you can call to see whether a Handle is done before you try reading it. This works well enough on a Handle because it's buffered, but sockets aren't buffered, and the symmetric isEOF function isn't available. I don't like the way the Handle API works, but I grudgingly accept it as not amenable to change.

There's another problem with the network APIs: they mirror the BSD socket API too faithfully, and provide insufficient type safety. You can try to send on an unconnected socket, and to bind a socket that's already connected, both of which should be statically forbidden. The APIs for datagram-oriented networking ought to be distinct from the stream-oriented variety, I think, even if the underlying C-level calls end up being substantially the same.

Really, the big thing that's missing here is enough application of elbow grease from someone who's got a good eye for design and doesn't mind all the slog involved. I think that if someone published a network-alt package (much like the one that was published a few years ago) and tooted their horn vigorously enough, we could put the existing network package out to pasture in fairly short order.
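One common way to get the static guarantees Bryan describes is to record the socket's state in a phantom type parameter, so that misuse fails to type-check. The sketch below models only the types: `Sock`, `Fresh`, `Bound`, `Connected` and the operations are hypothetical stand-ins that don't touch a real file descriptor.

```haskell
import qualified Data.ByteString as B

-- Phantom state tags: empty types that exist only at the type level.
data Fresh
data Bound
data Connected

-- Hypothetical stand-in for a stream socket. The 'state' parameter is a
-- phantom: it appears in the type but not in the representation.
newtype Sock state = Sock String

newSock :: IO (Sock Fresh)
newSock = return (Sock "unconnected")

-- Only a fresh socket can be bound or connected.
bind :: Sock Fresh -> String -> IO (Sock Bound)
bind _ addr = return (Sock addr)

connect :: Sock Fresh -> String -> IO (Sock Connected)
connect _ peer = return (Sock peer)

-- Only a connected socket can send. This stub pretends every byte was
-- written and returns the count, like a successful send().
send :: Sock Connected -> B.ByteString -> IO Int
send _ bytes = return (B.length bytes)
```

With these signatures, `send` on a `Sock Fresh` or `bind` on a `Sock Connected` is a compile-time error. The sketch does not stop code from reusing the old `Sock Fresh` value after connecting; ruling that out needs something stronger, such as an indexed monad or linear types.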

"Bryan O'Sullivan"
There's another problem with the network APIs: they mirror the BSD socket API too faithfully, and provide insufficient type safety. You can try to send on an unconnected socket, and to bind a socket that's already connected, both of which should be statically forbidden. The APIs for datagram-oriented networking ought to be distinct from the stream-oriented variety, I think, even if the underlying C-level calls end up being substantially the same.
Iteratees to the rescue? Ideally, we'd have a composable IO system that's uniform across different types of IO.
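To make the iteratee idea concrete: a consumer is fed chunks and is told explicitly when input ends, so EOF becomes an ordinary value instead of an exception, and the same consumer works with any source. A much-simplified sketch in the style of Oleg Kiselyov's iteratees; `Stream`, `Iteratee`, `run`, `countBytes` and `listSource` are illustrative names, not any package's API.

```haskell
import qualified Data.ByteString as B
import Data.IORef (newIORef, readIORef, writeIORef)

-- What a consumer is fed: a chunk of data, or an explicit end of input.
data Stream = Chunk B.ByteString | EOF

-- A consumer either has a result or is waiting for more input.
data Iteratee a = Done a | Continue (Stream -> Iteratee a)

-- Drive a consumer from any source that returns Nothing at end of input
-- (a socket, a file, ...). Returns Nothing if the consumer still wanted
-- input after being told EOF.
run :: IO (Maybe B.ByteString) -> Iteratee a -> IO (Maybe a)
run _ (Done a) = return (Just a)
run source (Continue k) = do
  next <- source
  case next of
    Just c  -> run source (k (Chunk c))
    Nothing -> return (finish (k EOF))
  where
    finish (Done a)     = Just a
    finish (Continue _) = Nothing

-- Example consumer: count bytes until EOF.
countBytes :: Iteratee Int
countBytes = go 0
  where
    go n = Continue (step n)
    step n (Chunk c) = go (n + B.length c)
    step n EOF       = Done n

-- Hypothetical source for illustration: yields the chunks, then Nothing.
listSource :: [B.ByteString] -> IO (IO (Maybe B.ByteString))
listSource chunks = do
  ref <- newIORef chunks
  return $ do
    pending <- readIORef ref
    case pending of
      []         -> return Nothing
      (c : rest) -> writeIORef ref rest >> return (Just c)
```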

On Fri, Feb 27, 2009 at 12:07 AM, Achim Schneider
"Bryan O'Sullivan"
wrote: There's another problem with the network APIs: they mirror the BSD socket API too faithfully, and provide insufficient type safety. You can try to send on an unconnected socket, and to bind a socket that's already connected, both of which should be statically forbidden. The APIs for datagram-oriented networking ought to be distinct from the stream-oriented variety, I think, even if the underlying C-level calls end up being substantially the same.
Iteratees to the rescue? Ideally, we'd have a composable IO system that's uniform across different types of IO.
I would very much like that. However, even after thinking about the problem for some time, I don't know of a generic definition that feels right. Hopefully someone smarter will come up with one. Cheers, Johan

On Fri, Feb 27, 2009 at 12:03 AM, Bryan O'Sullivan
On Thu, Feb 26, 2009 at 1:45 PM, Johan Tibell wrote:
I find it quite inconvenient to use the `recv` function in Network.Socket as it throws an exception when reaching EOF and there's no way to check whether EOF has been reached before calling `recv`.
I agree, the current behaviour is quite unfortunate. In fact, I'm pretty sure I added an entry point named recv_ to network-bytestring to work around precisely this problem.
Yes, that's indeed the reason we added it. My current thinking is that we'd drop `recv` from network-bytestring in favor of `recv_`. I've checked how many libraries on Hackage depend on network-bytestring and there are few enough that we could make such a change. It's a bit unfortunate that these libraries have an open dependency on network-bytestring (e.g. network-bytestring >= 0.1.1.2). I will contact the maintainers of those libraries before making a new release.
There's another problem with the network APIs: they mirror the BSD socket API too faithfully, and provide insufficient type safety. You can try to send on an unconnected socket, and to bind a socket that's already connected, both of which should be statically forbidden. The APIs for datagram-oriented networking ought to be distinct from the stream-oriented variety, I think, even if the underlying C-level calls end up being substantially the same.
Really, the big thing that's missing here is enough application of elbow grease from someone who's got a good eye for design and doesn't mind all the slog involved. I think that if someone published a network-alt package (much like the one that was published a few years ago) and tooted their horn vigorously enough, we could put the existing network package out to pasture in fairly short order.
I would be interested in trying to create a better API. However, I'm not sure what it would look like. The design space is pretty big.

* How can we provide the static guarantees we want? Perhaps with some kind of lightweight monadic regions, but if so, which definition should we use, i.e. can a region return a Socket to the parent region or not? This has implications for how easy the API is to understand.
* Should we use enumerators or not? Can they be added as a convenience layer on top of a type-safe low-level layer?

Cheers,
Johan
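The region question can be illustrated with the same rank-2 trick `ST` uses: every resource opened in a region is tagged with the region's type variable, so it cannot be returned from `runRegion`, and its cleanup runs when the region exits. This is a minimal single-level sketch in the spirit of lightweight monadic regions that makes the strict choice (sockets cannot be handed to a parent region). It needs `RankNTypes`; `Region`, `SocketR`, `openR`, `withSocketR` and `runRegion` are hypothetical, and the socket is simulated by a name plus a close action.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A region is IO plus a list of cleanup actions. The 's' parameter is
-- never fixed to a concrete type, which is what keeps resources inside.
newtype Region s a = Region { unRegion :: IORef [IO ()] -> IO a }

instance Functor (Region s) where
  fmap f (Region m) = Region (fmap f . m)

instance Applicative (Region s) where
  pure x = Region (\_ -> return x)
  Region mf <*> Region mx = Region (\ref -> mf ref <*> mx ref)

instance Monad (Region s) where
  Region m >>= k = Region (\ref -> m ref >>= \a -> unRegion (k a) ref)

-- A simulated socket, usable only inside region 's'.
newtype SocketR s = SocketR String

liftR :: IO a -> Region s a
liftR io = Region (\_ -> io)

-- Open a resource and register its close action with the region.
openR :: String -> IO () -> Region s (SocketR s)
openR name close = Region $ \ref -> do
  modifyIORef ref (close :)
  return (SocketR name)

-- Use a socket; the shared 's' ties it to the region that opened it.
withSocketR :: SocketR s -> (String -> IO b) -> Region s b
withSocketR (SocketR name) f = liftR (f name)

-- Run a region; registered cleanups run on exit (most recent first),
-- even if the body throws.
runRegion :: (forall s. Region s a) -> IO a
runRegion r = do
  ref <- newIORef []
  unRegion r ref `finally` (readIORef ref >>= sequence_)
```

Something like `runRegion (openR "s" (return ()))` is rejected, because the socket's type would mention the region variable outside its scope. Supporting nested regions and moving a socket to a parent region needs more type-level machinery than this sketch has.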

On 2009 Feb 26, at 16:45, Johan Tibell wrote:
definition of `recv` would look like. My current thinking is that it would mimic what C/Python/Java does and return a zero length ByteString when EOF is reached.
Ew. Isn't this what Maybe is for? Anyway, the reason recv doesn't return 0 is that if you have a datagram socket, a zero-length recv is valid and doesn't mean EOF. (Not that many UDP-using programs know what to do with a 0-length packet.) So you need to indicate "EOF" (sender closed its end) in some different way. It *should* have been Maybe.... -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
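Brandon's point can be made concrete: with `Maybe`, an end-of-stream and an empty datagram are different values. The sketch below shows how a hypothetical wrapper might interpret a non-negative byte count from the C-level `recv()`, assuming the common BSD/Linux behaviour where 0 means an orderly shutdown on a stream socket but a zero-length datagram on a datagram socket. `SockKind` and `interpretRecv` are illustrative names, not the network library's API; error returns (-1 with errno) are assumed to be handled separately.

```haskell
import qualified Data.ByteString as B

data SockKind = Stream | Datagram deriving (Eq, Show)

-- | Interpret the byte count of a successful recv() together with the
-- buffer it filled. Nothing means the peer closed the stream; Just B.empty
-- is a valid zero-length datagram.
interpretRecv :: SockKind -> Int -> B.ByteString -> Maybe B.ByteString
interpretRecv Stream 0 _   = Nothing
interpretRecv _      n buf = Just (B.take n buf)
```

A consumer can then match `Nothing` for end of stream without mistaking an empty `Just` packet for it.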

"Brandon S. Allbery KF8NH"
On 2009 Feb 26, at 16:45, Johan Tibell wrote:
definition of `recv` would look like. My current thinking is that it would mimic what C/Python/Java does and return a zero length ByteString when EOF is reached.
Ew. Isn't this what Maybe is for?
Anyway, the reason recv doesn't return 0 is that if you have a datagram socket, a zero-length recv is valid and doesn't mean EOF.
My man page says a retval of 0 means that "the peer has performed an orderly shutdown", which, in the UDP case, means that it has sent a complete datagram... no mention of EOF. A true EOF in the sense of "no more data will be received" would mean unbinding the socket.

On 2009 Feb 26, at 23:41, Achim Schneider wrote:
"Brandon S. Allbery KF8NH"
wrote: On 2009 Feb 26, at 16:45, Johan Tibell wrote:
definition of `recv` would look like. My current thinking is that it would mimic what C/Python/Java does and return a zero length ByteString when EOF is reached.
Ew. Isn't this what Maybe is for?
Anyway, the reason recv doesn't return 0 is that if you have a datagram socket, a zero-length recv is valid and doesn't mean EOF.
My man page says a retval of 0 means that "the peer has performed an orderly shutdown", which, in the UDP case, means that it has sent a complete datagram... no mention of EOF. A true EOF in the sense of "no more data will be received" would mean unbinding the socket.
Right. Just have to realize that a zero-length datagram packet is possible and even meaningful, so 0 isn't available as an EOF flag. Anyway, the POSIX spec indicates the EOF condition as return -1 with errno == ECONNRESET; this should not be taken as anything but the limited expressiveness of a C-based API. We should map this return to Nothing.

2009/2/27 Brandon S. Allbery KF8NH
On 2009 Feb 26, at 23:41, Achim Schneider wrote:
"Brandon S. Allbery KF8NH"
wrote:
On 2009 Feb 26, at 16:45, Johan Tibell wrote:
Anyway, the reason recv doesn't return 0 is that if you have a datagram socket, a zero-length recv is valid and doesn't mean EOF.
My man page says a retval of 0 means that "the peer has performed an orderly shutdown", which, in the UDP case, means that it has sent a complete datagram... no mention of EOF. A true EOF in the sense of "no more data will be received" would mean unbinding the socket.
Right. Just have to realize that a zero-length datagram packet is possible and even meaningful, so 0 isn't available as an EOF flag.
If this is the case, then the Network.Socket module is broken when used for UDP, as it throws an exception on receiving a valid message.
Anyway, the POSIX spec indicates the EOF condition as return -1 with errno == ECONNRESET; this should not be taken as anything but the limited expressiveness of a C-based API. We should map this return to Nothing.
I'm not sure I agree. I think using exceptions in this case is fine, as losing the connection is indeed an exceptional condition, and the best thing a program can do in this case is probably to abort processing of the disconnected client. Cheers, Johan

Anyway, the POSIX spec indicates the EOF condition as return -1 with errno == ECONNRESET; this should not be taken as anything but the limited expressiveness of a C-based API. We should map this return to Nothing.
Johan> I'm not sure I agree. I think using exceptions in this case is fine as
Johan> losing the connection is indeed an exceptional condition and the best
Johan> thing a program can do in this case is probably to abort processing of
Johan> the disconnected client.

I guess this depends upon how exceptional you want an exception to be. To my mind, a lost connection is a fairly normal condition (FAR too normal for my teleworking situation :-( ). That is, I would expect the exception to occur rather than not, during one run of a program. On that basis, I would suggest Nothing is better than an exception. But I guess it depends upon the program. For long-running servers, my expectation above is surely true. But other "servers" might not share the same expectation. Perhaps it should be configurable? -- Colin Adams Preston Lancashire
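On the question of making this configurable: if the primitive returns `Maybe`, each style is a small wrapper away from the other, so a library can expose one and let programs choose. A sketch using System.IO.Error; `orEOF` and `catchEOF` are hypothetical helpers, and the EOF `IOError` that `orEOF` raises is built with `mkIOError` to resemble the one Handle reads throw.

```haskell
import System.IO.Error (eofErrorType, isEOFError, mkIOError, tryIOError)

-- | Maybe-style to exception-style: throw an EOF IOError on Nothing, for
-- programs that treat a closed connection as exceptional.
orEOF :: String -> IO (Maybe a) -> IO a
orEOF location act = do
  r <- act
  case r of
    Just x  -> return x
    Nothing -> ioError (mkIOError eofErrorType location Nothing Nothing)

-- | Exception-style to Maybe-style: catch only EOF errors, rethrow the rest.
catchEOF :: IO a -> IO (Maybe a)
catchEOF act = do
  r <- tryIOError act
  case r of
    Right x -> return (Just x)
    Left e
      | isEOFError e -> return Nothing
      | otherwise    -> ioError e
```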

On 2009 Feb 27, at 4:25, Colin Paul Adams wrote:
Anyway, the POSIX spec indicates the EOF condition as return -1 with errno == ECONNRESET; this should not be taken as anything but the limited expressiveness of a C-based API. We should map this return to Nothing.
Johan> I'm not sure I agree. I think using exceptions in this case is fine as
Johan> losing the connection is indeed an exceptional condition and the best
Johan> thing a program can do in this case is probably to abort processing of
Johan> the disconnected client.
I guess this depends upon how exceptional you want an exception to be.
To my mind, a lost connection is a fairly normal condition (FAR too normal for my teleworking situation :-( ). That is, I would expect the exception to occur rather than not, during one run of a program. On that basis, I would suggest Nothing is better than an exception.
Actually, thinking about this, ECONNRESET can be a normal end-of-connection for TCP, but for UDP should never happen (on recv(); on send() it is surely an exception). But at the same time, if you're using recv() with TCP, you are probably not working with a higher-level protocol that simply shuts down the connection when it's done: it's more of a stream-oriented behavior, and the stream-oriented read() handles it as such. That said, I have heard of cases where recv() is used for stream protocols for efficiency reasons. I don't know if the efficiency argument relates to anything newer than a PDP11 or VAX 750, though....
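Brandon's TCP/UDP distinction could be encoded where errno values are mapped to results. A sketch using `Foreign.C.Error` from base; `SockKind`, `RecvFailure` and `classifyRecvErrno` are hypothetical names, and treating ECONNRESET as a closed connection on stream sockets follows his suggestion rather than any existing library.

```haskell
import Foreign.C.Error (Errno, eCONNRESET, errnoToIOError)

data SockKind = Stream | Datagram deriving (Eq, Show)

-- | What a failed recv() should mean to the caller.
data RecvFailure
  = PeerClosed          -- ^ treat as a normal end of the connection
  | RecvError IOError   -- ^ a genuine error, to be thrown

-- | ECONNRESET ends a stream connection normally; on a datagram socket
-- (or for any other errno) it is reported as an ordinary IOError.
classifyRecvErrno :: SockKind -> Errno -> RecvFailure
classifyRecvErrno Stream e
  | e == eCONNRESET = PeerClosed
classifyRecvErrno _ e = RecvError (errnoToIOError "recv" e Nothing Nothing)
```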
participants (6)
- Achim Schneider
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Colin Paul Adams
- Johan Tibell
- Jonathan Cast