Hello,

Just writing to let people know the resolution of this problem...

After much frustration and toil, we realized there was a bug in GHC's handle abstraction over sockets.

We resolved our immediate problem by having our code deal directly with the sockets, and we filed a bug report, #2703, which has just been (partially) fixed by Simon Marlow.
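For anyone wondering what working on the socket directly (rather than through a Handle) might look like, here is a minimal sketch. It is not the code discussed in this thread: it assumes the network package (3.x) and its Network.Socket.ByteString module, and the port, buffer size, echo behaviour, and the names echoLoop and serve are purely illustrative.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (bracket, finally)
import Control.Monad (forever, unless)
import qualified Data.ByteString as BS
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)

-- Echo bytes back on a connected socket until the peer closes it.
-- Reads and writes go straight to the socket; no Handle is created.
echoLoop :: Socket -> IO ()
echoLoop conn = do
  chunk <- recv conn 4096            -- an empty chunk means the peer closed
  unless (BS.null chunk) $ do
    sendAll conn chunk
    echoLoop conn

-- Accept loop: one lightweight thread per incoming connection.
serve :: Socket -> IO ()
serve listener = forever $ do
  (conn, _peer) <- accept listener
  forkIO (echoLoop conn `finally` close conn)

main :: IO ()
main = withSocketsDo $ do
  -- Port 3000 is an arbitrary choice for this sketch.
  let hints = defaultHints { addrFlags = [AI_PASSIVE], addrSocketType = Stream }
  addr : _ <- getAddrInfo (Just hints) Nothing (Just "3000")
  bracket (socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)) close $
    \listener -> do
      setSocketOption listener ReuseAddr 1
      bind listener (addrAddress addr)
      listen listener 128
      serve listener
```

The point of the sketch is only that recv and sendAll operate on the Socket itself, so no Handle buffering or Handle locking sits between the thread and the file descriptor.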

thanks,
Jeff

Simon Marlow <simonmarhaskell@gmail.com> wrote on 10/10/2008 09:23:31 AM:

> Jeff Polakow wrote:
> >
> > Don Stewart <dons@galois.com> wrote on 10/09/2008 02:56:02 PM:
> >
> > > jeff.polakow:
> > > > We have a server that accepts messages over a socket, spawning
> > > > threads to process them. Processing these messages may cause other,
> > > > outgoing connections, to be spawned. Under sufficient load, the main
> > > > server loop (i.e. the call to accept, followed by a forkIO), becomes
> > > > nonresponsive.
> > > >
> > > > A smaller distilled testcase reveals that when sufficient socket
> > > > activity is occurring, an incoming connection may not be responded to
> > > > until other connections have been cleared out of the way, despite the
> > > > fact that these other connections are being handled by separate
> > > > threads. One issue that we've been trying to figure out is where this
> > > > behavior arises from-- the GHC rts, the Network library, the
> > > > underlying C libraries.
> > > >
> > > > Have other GHC users doing applications with large amounts of socket
> > > > usage observed similar behavior and managed to trace back where it
> > > > originates from? Are there any particular architectural solutions
> > > > that people have found to work well for these situations?
> > >
> > > Hey Jeff,
> > >
> > > Can you say which GHC you used, and whether you used the threaded
> > > runtime or non-threaded runtime?
> > >
> > Oops, forgot about that...
> >
> > We used both ghc-6.8.3 and ghc-6.10.rc1 and we used the threaded
> > runtime. We are running on a 64 bit linux machine using openSUSE 10.
>
> The scheduler doesn't have a concept of priorities, so the accepting thread
> will get the same share of the CPU as the other threads. Another issue is
> that the accepting thread has to be woken up by the IO manager thread when
> a new connection is available, so we might have to wait for the IO manager
> thread to run too. But I wouldn't expect to see overly long delays. Maybe
> you could try network-alt which does its own IO multiplexing.
>
> If you have multiple cores, you might want to try fixing the thread
> affinity - e.g. put all the worker threads on one core, and the accepting
> thread on the other core. You can do this using GHC.Conc.forkOnIO, with
> the +RTS -qm -qw options.
>
> Other than that, I'm not sure what to try right now. We're hoping to get
> some better profiling for parallel/concurrent programs in the future, but
> it's not ready yet.
>
> Cheers,
> Simon
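The thread-affinity suggestion in the quoted reply can be sketched roughly as follows. This uses forkOn from Control.Concurrent, which is what current versions of base provide in place of GHC.Conc.forkOnIO. The helper name runPinned and the split of capability 0 for accepting and 1 for workers are assumptions made for illustration, and the threads here only report where they ran instead of doing any socket work.

```haskell
import Control.Concurrent (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)

-- Fork an action pinned to the given capability and report where it ran:
-- (capability index, whether the thread is pinned to that capability).
runPinned :: Int -> IO (Int, Bool)
runPinned cap = do
  result <- newEmptyMVar
  _ <- forkOn cap (myThreadId >>= threadCapability >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  n <- getNumCapabilities
  -- Hypothetical layout: capability 0 for the accept loop, 1 for workers.
  -- forkOn interprets its argument modulo the number of capabilities,
  -- so with a single capability everything ends up on capability 0.
  let acceptCap = 0
      workerCap = 1 `mod` n
  (a, _) <- runPinned acceptCap
  putStrLn ("accept thread ran on capability " ++ show a)
  forM_ [1 .. 3 :: Int] $ \i -> do
    (w, _) <- runPinned workerCap
    putStrLn ("worker " ++ show i ++ " ran on capability " ++ show w)
```

To see the two roles land on different capabilities, the program would need to be built with -threaded and run with at least two capabilities, e.g. +RTS -N2.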