buildFdSets: file descriptor out of range

Hello,
If this is not the right place to ask this question, please tell me another place to ask.
I'm developing a mail server with GHC 6.10.3 on Linux. The server is running well at the beginning. But after several hours, it receives an error, "buildFdSets: file descriptor out of range". Please tell me what happened, and please suggest how to fix this problem.
Here is a brief description of the server.
- linked with ADNS.
- compiled with the -threaded option since ADNS requires it.
- uses forkIO to produce threads.
- does not use "daemonize" of System.Posix.Daemonize since daemonize uses forkProcess. I execute my server as a foreground process.
- Because there are so many nasty SMTP clients, most SMTP connections time out. Handles of the SMTP connections disappear, so I cannot use "hClose" to close the handles.
- pushes the limit of file descriptors to 65536 with setResourceLimit.
--Kazu

On Jul 14, 2009, at 21:48 , Kazu Yamamoto (山本和彦) wrote:
running well at the beginning. But after several hours, it receives an error, "buildFdSets: file descriptor out of range".
I believe the runtime uses select(), which has a hard limit (enforced by the kernel) that the maximum file descriptor id be 1023. (select() uses bitmasks and there is a limit on the size of a bitmask; see FD_SETSIZE.)
- pushes the limit of file descriptors to 65536 with setResourceLimit.
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
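One way to act on the advice above is to cap the soft descriptor limit instead of raising it. The following is only a sketch: it assumes the present-day unix package (System.Posix.Resource), which may differ from what shipped with GHC 6.10.3, and capOpenFiles and clampLimit are hypothetical helper names, not part of any library. With the soft limit at 1024 the kernel never hands out a descriptor above 1023, so the process should see "Too many open files" from accept rather than the buildFdSets failure.

```haskell
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (..)
  , getResourceLimit
  , setResourceLimit
  )

-- Hypothetical helper: never raise a limit, only lower it to at most n.
clampLimit :: Integer -> ResourceLimit -> ResourceLimit
clampLimit n (ResourceLimit cur) = ResourceLimit (min n cur)
clampLimit n _ = ResourceLimit n -- infinite or unknown: pin it to n

-- Hypothetical helper: cap the soft limit on open files at n, leave the
-- hard limit untouched, and return the limits now in effect.
capOpenFiles :: Integer -> IO ResourceLimits
capOpenFiles n = do
  limits <- getResourceLimit ResourceOpenFiles
  setResourceLimit ResourceOpenFiles
    limits { softLimit = clampLimit n (softLimit limits) }
  getResourceLimit ResourceOpenFiles
```

Calling capOpenFiles 1024 early in main would apply the cap. This does not fix a descriptor leak; it only changes which error appears once the descriptors run out.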

I believe the runtime uses select(), which has a hard limit (enforced by the kernel) that the maximum file descriptor id be 1023. (select() uses bitmasks and there is a limit on the size of a bitmask; see FD_SETSIZE.)
I understand. Thank you.
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors.
I used to execute my server with the limit of 1024 since this is the default limit of my machine. At that time, I suffered from the following errors:
rpf: user error (Cannot create OS thread.)
rpf: accept: resource exhausted (Too many open files)
So, I pushed the limit to 65536. I don't believe my server receives 1024 connections at once. So, I guess file descriptors leak. Does anyone know what happens when a TCP connection is reset by the peer and its handle disappears? Does the file descriptor bound to the handle leak? --Kazu

On Jul 15, 2009, at 00:42 , Kazu Yamamoto (山本和彦) wrote:
I don't believe my server receives 1024 connections at once. So, I guess file descriptors leak.
IIRC finalization of file handles is delayed, so there is effectively a leak in the runtime where unused file descriptors don't get freed until the Handle itself is garbage collected due to memory (i.e. the runtime makes no attempt to collect unreferenced file handles per se). -- brandon s. allbery
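If descriptors are only released when a Handle is garbage collected, one way around the delay is to close each handle explicitly on every exit path, including timeouts and exceptions. The sketch below assumes the present-day base library (Control.Exception.bracket and System.Timeout.timeout). withSession and demo are hypothetical names, and /dev/null stands in for an accepted connection so the example runs without a network. Whether the server in question can be restructured this way is also an assumption.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (bracket)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO (Handle, IOMode (ReadMode), hClose, hIsClosed, openFile)
import System.Timeout (timeout)

-- Hypothetical helper: run a session for at most limitMicros microseconds.
-- bracket guarantees hClose runs whether the session finishes, times out,
-- or throws, so the descriptor is released without waiting for the GC.
withSession :: Int -> IO Handle -> (Handle -> IO a) -> IO (Maybe a)
withSession limitMicros acquire session =
  bracket acquire hClose (timeout limitMicros . session)

-- Hypothetical demo: a "client" that never answers. The session is cut
-- off after 0.1 s and the handle is checked afterwards.
demo :: IO (Maybe (), Bool)
demo = do
  seen <- newIORef Nothing
  result <- withSession 100000 (openFile "/dev/null" ReadMode) $ \h -> do
    writeIORef seen (Just h)
    threadDelay 5000000 -- stands in for a stalled SMTP client
  closed <- readIORef seen >>= maybe (return False) hIsClosed
  return (result, closed)
```

In a server, each forkIO'd connection thread would wrap its work in something like withSession, passing the action that yields the connection's Handle. Running demo in GHCi should report that the session timed out and that the handle was already closed.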

On 15/07/2009 05:16, Brandon S. Allbery KF8NH wrote:
On Jul 14, 2009, at 21:48 , Kazu Yamamoto (山本和彦) wrote:
running well at the beginning. But after several hours, it receives an error, "buildFdSets: file descriptor out of range".
I believe the runtime uses select(), which has a hard limit (enforced by the kernel) that the maximum file descriptor id be 1023. (select() uses bitmasks and there is a limit on the size of a bitmask; see FD_SETSIZE.)
Strictly speaking it's the IO library, not the runtime, that calls select() when you're using -threaded. But otherwise that's all correct.
- pushes the limit of file descriptors to 65536 with setResourceLimit.
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors.
There's no easy workaround. We could have the IO library switch to using blocking read() calls for the out-of-range FDs to avoid the error from the IO manager, but that is likely to lead to a different problem: too many OS threads. The right fix is to move to using epoll() instead. I understand it is being worked on, but I don't know the current status (Johan?). Cheers, Simon

2009/7/15 Simon Marlow
On 15/07/2009 05:16, Brandon S. Allbery KF8NH wrote:
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors.
There's no easy workaround. We could have the IO library switch to using blocking read() calls for the out-of-range FDs to avoid the error from the IO manager, but that is likely to lead to a different problem: too many OS threads.
The right fix is to move to using epoll() instead. I understand it is being worked on, but I don't know the current status (Johan?).
I have a standalone (i.e. not integrated into the RTS yet) proof of concept working using kqueue. However, to be portable we still need to fall back to select on systems that don't support anything better. This implies that if you want to write portable code you still suffer from this limit. I've been unable to hack lately due to an injury but I'll be able to get back to it next week. - Johan

Hello,
I have a standalone (i.e. not integrated into the RTS yet) proof of concept working using kqueue. However, to be portable we still need to fall back to select on systems that don't support anything better. This implies that if you want to write portable code you still suffer from this limit.
If portability is important, how about falling back to poll(), not epoll()?
I've been unable to hack lately due to an injury but I'll be able to get back to it next week.
Take care. --Kazu

2009/7/16 Kazu Yamamoto
Hello,
I have a standalone (i.e. not integrated into the RTS yet) proof of concept working using kqueue. However, to be portable we still need to fall back to select on systems that don't support anything better. This implies that if you want to write portable code you still suffer from this limit.
If portability is important, how about falling back to poll(), not epoll()?
We could provide poll as a possible backend but I don't think it's available on Windows. -- Johan

On Jul 16, 2009, at 03:32 , Johan Tibell wrote:
2009/7/16 Kazu Yamamoto
I have a standalone (i.e. not integrated into the RTS yet) proof of concept working using kqueue. However, to be portable we still need to fall back to select on systems that don't support anything better. This implies that if you want to write portable code you still suffer from this limit.
If portability is important, how about falling back to poll(), not epoll()?
We could provide poll as a possible backend but I don't think it's available on Windows.
http://plibc.sourceforge.net/ ? -- brandon s. allbery

Hello,
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors.
There's no easy workaround. We could have the IO library switch to using blocking read() calls for the out-of-range FDs to avoid the error from the IO manager, but that is likely to lead to a different problem: too many OS threads.
Thank you for your reply. I changed forkIO to forkOS. But I received the following error.
ERROR Cannot create OS thread.
I want to decrease the stack size of threads to increase the number of threads to be created. How can I do this? I tried decreasing ResourceStackSize and tried the RTS -k option, but neither increases the number of threads... --Kazu

On 16/07/2009 06:53, Kazu Yamamoto (山本和彦) wrote:
Hello,
Reduce this to 1024, otherwise the runtime will eventually find itself dealing with file descriptors beyond the select() limit mentioned above. Someone with more knowledge of the Haskell runtime will have to advise as to possible ways around it if you really need more than 1024 file descriptors. There's no easy workaround. We could have the IO library switch to using blocking read() calls for the out-of-range FDs to avoid the error from the IO manager, but that is likely to lead to a different problem: too many OS threads.
Thank you for your reply.
I changed forkIO to forkOS. But I received the following error.
ERROR Cannot create OS thread.
I don't think forkOS is going to help you here. The IO library is still using select(), all you're doing is making more OS threads. We could do blocking read() for a bound thread. That would make some sense, but we don't do it at the moment, and it does add an extra test to the code.
I want to decrease the stack size of threads to increase the number of threads to be created. How can I do this?
That's an OS issue, I'm not sure. Cheers, Simon
participants (4)
- Brandon S. Allbery KF8NH
- Johan Tibell
- Kazu Yamamoto
- Simon Marlow