Quoth Jeremy Shaw <jeremy@n-heptane.com>,
...
> What happens is the PS3 has closed the connection, and if you attempt
> to send any more packets the PS3 will tell you it has closed the
> connection and the write() / sendfile() call will raise SIGPIPE.
...
> So far there is:
>
> - no way for anyone besides Bardur to reproduce the problem
> - no sound explanation for why the PS3 client causes the error,
> but nothing else does
I think in fact this invalidates your premise. If the PS3 really
closed its connection in the standard fashion, then it would be trivial
to reproduce this problem with any other peer. Evidently it doesn't,
at least in this particular case, and that's why people are talking
about TCP keep-alives, which address the defunct peer problem (within
two hours, normally.)
The PS3 does do something, though. If we were doing a write *and* read select on the socket, the read select would wake up. So it is trying to notify us that something has happened, but we are not seeing it because we are only looking at the write select().
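To make the read-versus-write distinction concrete, here is a rough sketch in Python (not the Haskell code under discussion, just the same select() call underneath). It assumes one particular peer behaviour, a half-close with shutdown(SHUT_WR) while our send buffer is full. That is only a guess at a scenario that produces the symptom, not a claim about what the PS3 actually does. The helper names are made up for the example.

```python
import select
import socket


def connected_pair():
    """Return (server_side, client_side) TCP sockets connected over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return server, client


def fill_send_buffer(sock):
    """Write until the kernel accepts no more data (the peer is not reading)."""
    sock.setblocking(False)
    chunk = b"x" * 65536
    while True:
        try:
            sock.send(chunk)
        except BlockingIOError:
            # The buffer looked full; confirm that it stays full for a moment.
            _, writable, _ = select.select([], [sock], [], 0.2)
            if not writable:
                return


def observe_half_close():
    """Return (readable, writable, at_eof) for a blocked writer whose peer sent FIN."""
    server, client = connected_pair()
    try:
        fill_send_buffer(server)
        # Assumption: the peer stops reading and half-closes its sending side.
        client.shutdown(socket.SHUT_WR)
        # Wait (up to 2 s) for the FIN to show up as "readable".
        select.select([server], [], [], 2.0)
        r, w, _ = select.select([server], [server], [], 0)
        # A readable socket whose peek returns b"" is at end-of-stream.
        at_eof = bool(r) and server.recv(1, socket.MSG_PEEK) == b""
        return bool(r), bool(w), at_eof
    finally:
        server.close()
        client.close()
```

Under those assumptions, a thread waiting only for writability would stay asleep, while one that also watched for readability would see the end-of-stream immediately.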
But I cannot explain what the PS3 client is doing differently from the other clients such that it does not cause the threadWaitWrite to wake up.
Additionally, it is not clear that setting SO_KEEPALIVE will actually fix anything. The documentation that I have read indicates that it may only cause the read select() to wake up, not the write select(). Well, that is no good, because that is supposedly what is happening with the PS3 client already.
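For reference, this is roughly what turning keep-alives on looks like at the socket level, sketched in Python. The function name and the idle/interval/count values are arbitrary placeholders, and the TCP_KEEP* options are platform-specific, so they are guarded. The sketch only shows how the option is set; it says nothing about which select() ends up waking when a probe fails.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keep-alive probes on sock; return True if the option is set."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are not available everywhere (they exist on
    # Linux), so each one is applied only if this platform defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Note that keep-alive probes only detect a peer that no longer answers at all. A peer whose TCP stack is still responding, as seems to be the case here, would pass every probe.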
Anyway, part of the annoyance here is that in this particular case we shouldn't need any timeouts to 'guess' that the client is 'probably dead'. The client seems to be telling us that it has disconnected, but we are not looking in the right place. And if we did try to write, we would get a SIGPIPE.
It is not the case that the client is unresponsive; it is quite responsive. The problem is that we are not looking in the right place for that response.
But 'looking in the right place' is tricky. How do you tell hPut that it should wake up from threadWaitWrite if the Handle happens to be backed by a socket and threadWaitRead has data available? Readability does not even always indicate an error condition; it can be a perfectly valid situation.
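One way to picture the trickiness is a sketch (again Python, with a made-up function name) of a wait that watches both directions and uses a MSG_PEEK read to tell end-of-stream apart from ordinary pending input. This is only an illustration of the idea at the select() level, not a proposal for how hPut or the GHC I/O manager should do it.

```python
import select
import socket


def wait_writable_or_eof(sock, timeout=None):
    """Wait until sock is writable or its peer has closed.

    Returns "writable", "peer-closed", or "timeout". The timeout applies to
    each select() call, not to the whole wait.
    """
    watch_read = True
    while True:
        readers = [sock] if watch_read else []
        r, w, _ = select.select(readers, [sock], [], timeout)
        if w:
            return "writable"
        if not r:
            return "timeout"
        try:
            # Peek so that any real data stays queued for the actual reader.
            data = sock.recv(1, socket.MSG_PEEK)
        except (BlockingIOError, InterruptedError):
            continue
        except ConnectionError:
            return "peer-closed"
        if data == b"":
            # Readable but empty: the peer has closed its sending side.
            return "peer-closed"
        # Ordinary unread input is pending, which is perfectly valid. Stop
        # watching for readability, otherwise select() would return at once
        # on every iteration and this loop would spin.
        watch_read = False
```

The last branch is where it gets awkward: once unread data is pending, a later close by the peer is hidden behind it until someone actually consumes that data.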
Well, before I think about that, I want to know what the PS3 client is doing differently such that it is the only client that seems to exhibit this behavior at the moment. If we do not understand the real difference between what the PS3 and the C client are doing, then I don't think we can expect to arrive at an appropriate fix.
- jeremy