
On 6/21/06, Duncan Coutts wrote:
> On Linux, epoll scales very well with minimal overhead. Using multiple OS threads to do blocking IO would not scale when there are lots of idle socket connections; you'd need one OS thread per socket.
On Linux, OS threads can also scale very well. I ran an experiment using pipes and NPTL in which most connections were idle: performance scaled linearly with up to 32K file descriptors and 16K threads in use.
> The IO is actually no longer done inside the RTS; it's done by a Haskell worker thread. So it should now be easier to use platform-specific select() replacements. It's already different between Unix and Win32.
> So I'd suggest the best approach is to keep the existing multiplexing non-blocking IO system and start to take advantage of more scalable IO APIs on the platforms we really care about (either select/poll replacements or AIO).
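epoll, the main select/poll replacement on Linux, can report readiness in two modes. Level-triggered, like select and poll, reports every descriptor that is currently readable on each wait. Edge-triggered reports a descriptor only when new readiness appears. The following is a simplified pure model under stated assumptions: file descriptors are plain Ints, readiness is sampled as one snapshot per wait, and an "edge" is approximated as a not-ready-to-ready transition. Real epoll in edge-triggered mode signals new activity on the descriptor, so this is only an approximation. The names `Snapshot`, `levelTriggered` and `edgeTriggered` are illustrative, not part of any library.

```haskell
import qualified Data.Set as Set

-- Assumption for this sketch: a file descriptor is just an Int.
type Fd = Int

-- The set of descriptors that are readable at the moment of one wait call.
type Snapshot = Set.Set Fd

-- Level-triggered: each wait reports every descriptor that is ready now.
levelTriggered :: [Snapshot] -> [[Fd]]
levelTriggered = map Set.toList

-- Edge-triggered (simplified): each wait reports only descriptors that
-- were not ready at the previous wait but are ready now.
edgeTriggered :: [Snapshot] -> [[Fd]]
edgeTriggered snaps = zipWith newlyReady (Set.empty : snaps) snaps
  where
    newlyReady before now = Set.toList (now `Set.difference` before)

main :: IO ()
main = do
  let snaps = map Set.fromList [[3], [3, 4], [3, 4], [4], [3, 4]]
  print (levelTriggered snaps)
  print (edgeTriggered snaps)
```

With these snapshots the level-triggered model reports `[[3],[3,4],[3,4],[4],[3,4]]` while the edge-triggered model reports `[[3],[4],[],[],[3]]`. The third wait shows the practical trade-off: descriptors 3 and 4 are still readable but edge-triggered mode says nothing, so a program using it must drain each descriptor (read until `EAGAIN`) when notified, whereas a level-triggered loop can read a little at a time and will be reminded.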
It is easy to take advantage of epoll, so it shouldn't be that hard to build it in. The question is about flexibility: do we want it to be edge-triggered or level-triggered? Even with epoll built in, disk I/O performance cannot keep up with NPTL unless AIO is built in as well. AIO, however, is more complicated. It bypasses the OS cache, and Linux AIO even requires the use of certain kinds of file systems.

My view is that not everybody needs high-performance asynchronous or non-blocking I/O. Those who really need it will find it worthwhile (or even necessary) to write their own event loops, and event-driven programming in Haskell is not that difficult using CPS monads.
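The remark about CPS monads can be illustrated with a minimal sketch in the style of Koen Claessen's "poor man's concurrency monad". This is a generic illustration, not the author's code. Each thread is written in ordinary monadic style, but the monad is continuation-passing, so running a thread produces a tree of "system calls" that a user-level scheduler (the event loop) interprets. All names here (`Trace`, `SysPrint`, `SysYield`, `SysFork`, `SysDone`, `Thread`, `say`, `yield`, `fork`, `schedule`) are hypothetical, chosen for the example. A real event loop would add a constructor for non-blocking I/O and park threads on epoll readiness; that part is omitted to keep the sketch pure.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical "system calls" a thread can make; each carries its continuation.
data Trace
  = SysPrint String Trace -- emit a line of output, then continue
  | SysYield Trace        -- let other threads run first
  | SysFork Trace Trace   -- child thread, then the parent's continuation
  | SysDone               -- thread finished

-- CPS monad: a computation takes its continuation and builds a Trace.
newtype Thread a = Thread { runThread :: (a -> Trace) -> Trace }

instance Functor Thread where
  fmap = liftM

instance Applicative Thread where
  pure x = Thread (\k -> k x)
  (<*>) = ap

instance Monad Thread where
  Thread m >>= f = Thread (\k -> m (\x -> runThread (f x) k))

say :: String -> Thread ()
say s = Thread (\k -> SysPrint s (k ()))

yield :: Thread ()
yield = Thread (\k -> SysYield (k ()))

-- Close the thread by handing it the "finished" continuation.
toTrace :: Thread () -> Trace
toTrace t = runThread t (const SysDone)

fork :: Thread () -> Thread ()
fork child = Thread (\k -> SysFork (toTrace child) (k ()))

-- Round-robin scheduler over a queue of runnable traces; returns the output.
schedule :: [Trace] -> [String]
schedule [] = []
schedule (t : ts) = case t of
  SysPrint s next -> s : schedule (next : ts)
  SysYield next   -> schedule (ts ++ [next])
  SysFork c next  -> schedule (next : ts ++ [c])
  SysDone         -> schedule ts

worker :: String -> Thread ()
worker name = do
  say (name ++ " step 1")
  yield
  say (name ++ " step 2")

demo :: Thread ()
demo = do
  fork (worker "A")
  fork (worker "B")
  say "main done"

main :: IO ()
main = mapM_ putStrLn (schedule [toTrace demo])
```

Running `main` interleaves the two workers at their `yield` points, printing "main done", "A step 1", "B step 1", "A step 2", "B step 2". The design choice worth noting is that threads never block an OS thread: a blocking operation becomes just another constructor that the scheduler can park until, for example, epoll reports the descriptor ready.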