are forkIO threads event-driven?

Hi Cafe,
In GHC, if a thread spawned by forkIO blocks on some network or disk IO, is the threading system smart enough not to wake the thread until an IO event occurs on its input/output? The Control.Concurrent documentation doesn't specify, and the previous discussions I could find on this topic are out-of-date. There is a years-old GHC ticket, too, recently revived[2].
Put another way, is it possible yet to use forkIO for making a server to handle tens of thousands of concurrent network connections? If not, what is the best current Haskell/GHC way?
Thanks, Aran
[1] http://www.monkey.org/~provos/libevent/
[2] http://hackage.haskell.org/trac/ghc/ticket/635

On Thu, 2010-04-29 at 18:26 -0400, Aran Donohue wrote:
Hi Cafe,
In GHC, if a thread spawned by forkIO blocks on some network or disk IO, is the threading system smart enough not to wake the thread until an IO event occurs on its input/output? The Control.Concurrent documentation doesn't specify, and the previous discussions I could find on this topic are out-of-date. There is a years-old GHC ticket, too, recently revived[2].
Put another way, is it possible yet to use forkIO for making a server to handle tens of thousands of concurrent network connections? If not, what is the best current Haskell/GHC way?
Thanks, Aran
IIRC yes - they explicitly wait for read (waitForRead or something like that) on GHC.
Regards
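For reference, the function in Control.Concurrent that parks a Haskell thread until a file descriptor becomes readable is threadWaitRead. Below is a minimal sketch of that behaviour, assuming a POSIX system and the unix package (createPipe, fdWrite, closeFd); the name wokeBeforeWrite is made up for this example. It only shows that the forked thread stays parked until data arrives; it says nothing about how well the I/O manager scales.

```haskell
import Control.Concurrent (forkIO, threadDelay, threadWaitRead)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar, tryTakeMVar)
import Data.Maybe (isJust)
import System.Posix.IO (closeFd, createPipe, fdWrite)

-- | Park a forkIO thread on the read end of a pipe and report whether it
-- woke up before anything was written. (Hypothetical helper name.)
-- The action only returns once the thread has woken after the write.
wokeBeforeWrite :: IO Bool
wokeBeforeWrite = do
  (readEnd, writeEnd) <- createPipe
  woke <- newEmptyMVar
  _ <- forkIO $ do
    threadWaitRead readEnd    -- blocks only this Haskell thread
    putMVar woke ()
  threadDelay 100000          -- 0.1 s: give the thread time to block
  early <- tryTakeMVar woke   -- Nothing while the thread is still parked
  _ <- fdWrite writeEnd "x"   -- make the descriptor readable
  takeMVar woke               -- returns once the thread has been woken
  closeFd readEnd
  closeFd writeEnd
  return (isJust early)
```

Running wokeBeforeWrite in GHCi should give False: the thread does not run again until the write makes the pipe readable.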

Hello Aran, Friday, April 30, 2010, 2:26:20 AM, you wrote:
In GHC, if a thread spawned by forkIO blocks on some network or disk IO, is the threading system smart enough not to wake the thread
afaik, yes. it's controlled by special i/o thread that multiplexes all i/o done via stdlibs. but ghc i/o manager can't use epoll/kqueue so it's appropriate only for small (or medium?) servers
read "Writing High-Performance Server Applications in Haskell, Case Study: A Haskell Web Server"
http://www.haskell.org/~simonmar/papers/web-server.ps.gz
--
Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Aran,
Friday, April 30, 2010, 2:26:20 AM, you wrote:
In GHC, if a thread spawned by forkIO blocks on some network or disk IO, is the threading system smart enough not to wake the thread
afaik, yes. it's controlled by special i/o thread that multiplexes all i/o done via stdlibs. but ghc i/o manager can't use epoll/kqueue so it's appropriate only for small (or medium?) servers
Look at the recent work on the event library and replacing the IO manager.
http://www.serpentine.com/blog/2010/01/22/new-ghc-io-manager-first-benchmark...
There's much more background on the new code here,
http://www.serpentine.com/blog/2009/12/17/making-ghcs-io-manager-more-scalab...
and some nice benchmarks
http://blog.johantibell.com/2010/01/scalable-timeout-support-for-ghcs-io.htm...

Thanks for the excellent links, that's exactly what I wanted. It's
interesting that they've chosen not to base the new work on libevent.
As an aside, I really don't think that the case study should be given any
more linkjuice as a response to GHC/Haskell IO concurrency questions. While
it's a wonderful tutorial on the programming technique side, it's a decade
old and was written at a time when serving 4000 requests was a reasonable
benchmark. These days modern web servers are moving more and more toward
handling tens of thousands of concurrent held-open *connections*---a
different metric and a different scale.
Cheers,
Aran

Hi Aran,
On Fri, Apr 30, 2010 at 9:28 PM, Aran Donohue
Thanks for the excellent links, that's exactly what I wanted. It's interesting that they've chosen not to base the new work on libevent.
The reason was mostly performance concerns due to libev(ent) using callbacks to signal events. Callbacks from C into Haskell can be inefficient. From the FFI addendum:
"Optionally, an import declaration can specify, after the calling convention, the safety level that should be used when invoking an external entity. A safe call is less efficient, but guarantees to leave the Haskell system in a state that allows callbacks from the external code."
Another reason was that if the code is in Haskell we can more easily get people to hack on it and adapt it to our needs.
As an aside, I really don't think that the case study should be given any more linkjuice as a response to GHC/Haskell IO concurrency questions. While it's a wonderful tutorial on the programming technique side, it's a decade old and was written at a time when serving 4000 requests was a reasonable benchmark. These days modern web servers are moving more and more toward handling tens of thousands of concurrent held-open *connections*---a different metric and a different scale.
The event library, linked by Don, handles tens of thousands of idle connections without problems (see the idle connection generator [1] I created). Bryan wrote a simple HTTP server [2] that handles 20,000 requests per second on one core.
1. http://github.com/tibbe/event/blob/master/benchmarks/DeadConn.hs
2. http://github.com/tibbe/event/blob/master/benchmarks/StaticHttp.hs
Cheers, Johan
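To make the safety level from the FFI addendum quote concrete, here is a small sketch of the two kinds of import declaration. It binds the C library's cos twice, once as unsafe and once as safe; the Haskell-side names cosUnsafe and cosSafe are made up for this example. Both imports compute the same value; the difference is only in what the runtime does around the call, which is the cost a callback-based library such as libev would have to pay.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- 'unsafe': a cheap call. The C code must not call back into Haskell.
foreign import ccall unsafe "math.h cos"
  cosUnsafe :: CDouble -> CDouble

-- 'safe' (the default when no level is given): the runtime leaves the
-- Haskell system in a state that permits callbacks from the C code,
-- at a higher per-call cost.
foreign import ccall safe "math.h cos"
  cosSafe :: CDouble -> CDouble
```

In GHCi, cosUnsafe 0 and cosSafe 0 should both evaluate to 1.0.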

Johan Tibell
Hi Aran,
On Fri, Apr 30, 2010 at 9:28 PM, Aran Donohue
wrote: Thanks for the excellent links, that's exactly what I wanted. It's interesting that they've chosen not to base the new work on libevent.
The reason was mostly performance concerns due to libev(ent) using callbacks to signal events. Callbacks from C into Haskell can be inefficient. From the FFI addendum:
Anecdotally, I can confirm this; we're using the FFI binding to libev in
a project and for typical workloads it's actually a little slower than
the plain-jane select()-based Haskell version. It scales better as you
add connections of course.
G
--
Gregory Collins

In GHC, if a thread spawned by forkIO blocks on some network or disk IO, is the threading system smart enough not to wake the thread
... "disk IO", you say?
Most platforms support asynchronous I/O for what UNIX calls `slow' devices - pipe, tty, Berkeley socket. Select, poll, kqueue, O_NDELAY, your pick - all stuff that has been semi-standard for decades. I've corresponded with people who are convinced it works on disk files, but I don't know where that idea got started; it doesn't, as far as I know.
Old timer OSes (like VMS, sorry VMS fans!) supported asynch on disks, but I guess by the time UNIX got to the point where there might have been money in standardized asynchronous disk I/O support, disks were fast enough that most out-of-the-box users didn't care. If GHC has been plugging into asynch disk I/O features, it would be really interesting to know how far it goes. Enabled by default? Works on which platforms, and devices - e.g., NFS filesystems?
I've never worked on anything where a single process needed to have access to all available computer resources all the time, but it seems like a pretty tough job to tackle - eventually you end up pretty near the `real time' lifestyle, where you can't afford to have page faults and so forth. In real applications, I'm sure it's usually good enough to dispatch around slow devices, and let things back up for the brief moments that it takes to open(), read() etc. a disk file. But if you really need everything, I think multiple processes (or threads) might be the only sane way to go.
Donn Cave, donn@avvanta.com

That's very interesting. I only brought it up because I'm thinking about the
upcoming problems of real-time web application servers.
I'm sure many people have seen this blog post and Dons's replies:
http://www.codexon.com/posts/debunking-the-erlang-and-haskell-hype-for-serve...
The Haskell code codexon used isn't the best Haskell can do. But I think it's
the clearest, most obvious code---the most like what someone learning from
the ground up would try first. Ideally, it should run fast by default, and
it's too bad that you need to learn about bytestrings (and choose between
lazy vs. strict), the various utf8 encoding options, and a new event library
to make it perform. Since I'm basically a beginner to Haskell, if I were to
set out to test out a WebSocket server in Haskell, my first pass code would
probably look a lot like the codexon template. I certainly wouldn't want to
go multi-process nor explicitly manage cores within a single process. I
would want forkIO to just work.
So it's very nice to hear that it looks like GHC will be getting efficient
event-driven IO-blocked thread awakening. The naive code will work better.
In my head, disk IO fits in because if a naively-written thread decides to
read from the disk, we don't want 20,000 concurrent network-bound threads to
be affected by it, but upon reflection I think the concern isn't too
justified.
Aran
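The "one forkIO thread per connection" style described above can be sketched without any network code. In this sketch each simulated connection is a thread parked on an MVar, which stands in for a socket; serveIdle is a made-up name. It only illustrates that many idle forkIO threads are cheap to hold open. It does not exercise the I/O manager, so it is not evidence about socket scalability.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- | Spawn n threads, one per simulated connection. Each thread stays
-- parked until its "request" arrives, then sends back a reply.
-- Returns the sum of all replies. (Hypothetical helper name.)
serveIdle :: Int -> IO Int
serveIdle n = do
  conns <- forM [1 .. n] $ \i -> do
    request <- newEmptyMVar
    reply <- newEmptyMVar
    _ <- forkIO $ do
      x <- takeMVar request    -- parked here; uses no CPU while idle
      putMVar reply (x + i)    -- "handle" the request
    return (request, reply)
  -- All n threads are now idle. Send every one a request of 0.
  forM_ conns $ \(request, _) -> putMVar request 0
  replies <- forM conns $ \(_, reply) -> takeMVar reply
  return (sum replies)
```

Since every request is 0, thread i replies with i, so serveIdle 1000 should return 500500. Trying it with 20000 or more threads gives a feel for how little an idle thread costs.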
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

aran.donohue:
That's very interesting. I only brought it up because I'm thinking about the upcoming problems of real-time web application servers.
I'm sure many people have seen this blog post and Dons's replies: http://www.codexon.com/posts/debunking-the-erlang-and-haskell-hype-for-serve...
The Haskell code codexon used isn't the best Haskell can do. But I think it's the clearest, most obvious code---the most like what someone learning from the ground up would try first. Ideally, it should run fast by default, and it's too bad that you need to learn about bytestrings (and choose between lazy vs. strict), the various utf8 encoding options, and a new event library to make it perform. Since I'm basically a beginner to Haskell, if I were to set out to test out a WebSocket server in Haskell, my first pass code would probably look a lot like the codexon template. I certainly wouldn't want to go multi-process nor explicitly manage cores within a single process. I would want forkIO to just work.
Would you write the Python solution though, as a naive Python user? It's scripting epoll -- which is a pretty specialized use.
Anyway, I encourage people to use the event lib, even before the forkIO support is merged in. It's a lot of fun,
http://donsbot.wordpress.com/2010/01/17/playing-with-the-new-haskell-epoll-e...
Maybe Johan and Bryan can give us an update on the state of play? What's the ETA to committing into HEAD?
-- Don

On Sun, May 2, 2010 at 8:45 PM, Aran Donohue
That's very interesting. I only brought it up because I'm thinking about the upcoming problems of real-time web application servers.
I'm sure many people have seen this blog post and Dons's replies:
http://www.codexon.com/posts/debunking-the-erlang-and-haskell-hype-for-serve...
The Haskell code codexon used isn't the best Haskell can do. But I think it's the clearest, most obvious code---the most like what someone learning from the ground up would try first. Ideally, it should run fast by default, and it's too bad that you need to learn about bytestrings (and choose between lazy vs. strict), the various utf8 encoding options, and a new event library to make it perform.
The Haskell Network.Socket module uses Strings to represent binary data. This is wrong as String is an abstract data type representing a sequence of Unicode code points, not bytes. Arguably the Network.Socket module should have used [Word8] instead of String. However, String and [Word8] are both represented as linked lists, which is not a very efficient representation for large blocks of binary data. bytestring is simply a more efficient encoding of [Word8] and should be used anywhere you want to represent binary data.
It's too late to change Network.Socket to use ByteStrings instead of Strings as it would break too much code. I wrote network-bytestring so that you can use ByteStrings instead of Strings when doing socket I/O. The network-bytestring package will most likely be merged into the network package at some point.
While you can use the event library explicitly, this is not how we intended the majority of users to use it. The goal is to integrate it into GHC 6.14 and replace the current I/O manager. That means that you will be able to write standard forkIO based code (like in the linked article) and expect around 20,000 requests/second on one core (depending on your hardware).
Since I'm basically a beginner to Haskell, if I were to set out to test out a WebSocket server in Haskell, my first pass code would probably look a lot like the codexon template. I certainly wouldn't want to go multi-process nor explicitly manage cores within a single process. I would want forkIO to just work.
If we reach our GHC 6.14 goal you will. Cheers, Johan
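The String versus bytes point above can be seen in a few lines. This is a small sketch using the bytestring package; the names greeting, truncatedBytes and payload are made up for the example. It shows that a Char does not fit in a byte in general (Data.ByteString.Char8.pack keeps only the low 8 bits of each character), whereas a ByteString built from Word8 values holds exactly the bytes given.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- A String is a list of Unicode code points. This one starts with
-- Greek small lambda (U+03BB), which does not fit in a single byte.
greeting :: String
greeting = "\955x"

-- Char8.pack truncates every Char to 8 bits, so the lambda is mangled:
-- 0x03BB becomes 0xBB. A String is therefore not a faithful stand-in
-- for binary data.
truncatedBytes :: [Word8]
truncatedBytes = B.unpack (BC.pack greeting)

-- Binary data expressed directly as bytes, stored in a packed array
-- rather than a linked list.
payload :: B.ByteString
payload = B.pack [0x00, 0xFF, 0x10, 0x80]
```

Here truncatedBytes evaluates to [187,120] and B.length payload is 4.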

Re event library and merge into haskell base: has any thought gone into the "windows" version of the library. Last I looked it was very unix centric - the windows api is very different. I believe it will require major rework to abstract the commonalities and deal efficiently with the differences. I suspect any talk of a merge is premature.

The event library has a pluggable interface, with multiple backends, and is entirely portable as a result. You just swap in your 'select' mechanism:
http://github.com/tibbe/event/blob/master/src/System/Event/EPoll.hsc
http://github.com/tibbe/event/blob/master/src/System/Event/Poll.hsc
http://github.com/tibbe/event/blob/master/src/System/Event/KQueue.hsc
Now, if you can implement the Backend methods,
http://github.com/tibbe/event/blob/master/src/System/Event/Internal.hs
you'll be good to go -- and we already know GHC can do threads on Windows, so the same mechanism should work fairly easily.
jvlask:
Re event library and merge into haskell base: has any thought gone into the "windows" version of the library. Last I looked it was very unix centric - the windows api is very different. I believe it will require major rework to abstract the commonalities and deal efficiently with the differences.
I suspect any talk of a merge is premature.

As I said, it is very unix centric. The backend methods rely upon file descriptors which in the windows world are specific to the C rts. It is the backend that requires the abstraction from os specific structures/handling.
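To illustrate what abstracting the backend away from file descriptors could look like, here is a purely hypothetical sketch. None of these names (Backend, register, unregister, poll, newMockBackend, dispatchOnce) are taken from the event library; its real interface is in the System/Event/Internal.hs file linked in this thread. The idea is only that the backend is a record of functions parameterised over the key type, so a Unix backend could use file descriptors and a Windows backend could use whatever handle type suits that API.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical event type, for illustration only.
data Event = Readable | Writable
  deriving (Eq, Show)

-- | A backend as a record of functions. 'key' is left abstract so it is
-- not tied to Unix file descriptors. (Not the event library's API.)
data Backend key = Backend
  { register   :: key -> Event -> IO ()  -- start watching a key
  , unregister :: key -> IO ()           -- stop watching a key
  , poll       :: IO [(key, Event)]      -- events that are ready now
  }

-- | A toy in-memory backend: every registered key is reported as ready.
-- A real backend would call epoll, kqueue, poll, or a Windows API here.
newMockBackend :: Eq key => IO (Backend key)
newMockBackend = do
  ref <- newIORef []
  return Backend
    { register = \k e ->
        modifyIORef' ref (\xs -> (k, e) : filter ((/= k) . fst) xs)
    , unregister = \k -> modifyIORef' ref (filter ((/= k) . fst))
    , poll = readIORef ref
    }

-- | One step of an event loop, written once against the record so it
-- works with any backend. Returns how many events were dispatched.
dispatchOnce :: Backend key -> (key -> Event -> IO ()) -> IO Int
dispatchOnce backend handler = do
  ready <- poll backend
  mapM_ (uncurry handler) ready
  return (length ready)
```

With keys of type Int, registering keys 1 and 2, unregistering key 1, and then calling dispatchOnce dispatches exactly one event. Whether the differences in the Windows API fit behind such a record efficiently is exactly the open question raised above.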

On Mon, May 3, 2010 at 1:42 AM, John Lask
Re event library and merge into haskell base: has any thought gone into the "windows" version of the library. Last I looked it was very unix centric - the windows api is very different. I believe it will require major rework to abstract the commonalities and deal efficiently with the differences.
I suspect any talk of a merge is premature.
Windows is already treated specially in the RTS so we can improve the I/O manager for *nix users without affecting Windows users. We're not against adding Windows support to the event library but it's unlikely to happen unless someone volunteers to do it. Cheers, Johan
participants (8)
- Aran Donohue
- Bulat Ziganshin
- Don Stewart
- Donn Cave
- Gregory Collins
- Johan Tibell
- John Lask
- Maciej Piechotka