Hello,
Just writing to let people know how this problem was resolved.
After much frustration and toil, we realized there was a bug in GHC's
Handle abstraction over sockets. We resolved our immediate problem by
having our code deal directly with the sockets, and we filed a bug
report, #2703, which has just been partially fixed by Simon Marlow.
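A minimal sketch of what working on the socket directly, rather than through a Handle, can look like. It assumes the Network.Socket and Network.Socket.ByteString modules from the network package; the port number, the helper names (openListener, echoLoop, serve) and the echo behaviour are illustrative only and are not taken from the actual server.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (bracket, finally)
import Control.Monad (forever, unless, void)
import qualified Data.ByteString as B
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)

-- Echo bytes back until the peer closes the connection.
-- All I/O goes through the Socket itself; no Handle is ever created.
echoLoop :: Socket -> IO ()
echoLoop conn = do
  chunk <- recv conn 4096
  unless (B.null chunk) $ do
    sendAll conn chunk
    echoLoop conn

-- Open a listening TCP socket on the given port (hypothetical helper).
openListener :: ServiceName -> IO Socket
openListener port = do
  let hints = defaultHints { addrFlags = [AI_PASSIVE], addrSocketType = Stream }
  addr:_ <- getAddrInfo (Just hints) Nothing (Just port)
  sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
  setSocketOption sock ReuseAddr 1
  bind sock (addrAddress addr)
  listen sock 128
  return sock

-- Main server loop: accept, then hand the connection to a new thread.
serve :: Socket -> IO ()
serve sock = forever $ do
  (conn, _peer) <- accept sock
  void $ forkIO (echoLoop conn `finally` close conn)

main :: IO ()
main = withSocketsDo $ bracket (openListener "3000") close serve
```

The loop has the same shape as the accept-then-forkIO server described in the quoted thread below; the only difference is that each worker reads and writes with recv and sendAll on the Socket instead of converting it to a Handle first.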
Thanks,
Jeff
Simon Marlow <simonmarhaskell@gmail.com> wrote on 10/10/2008 09:23:31 AM:
> Jeff Polakow wrote:
>
> > Don Stewart <dons@galois.com> wrote on 10/09/2008 02:56:02 PM:
> >
> > > jeff.polakow:
> > > > We have a server that accepts messages over a socket, spawning
> > > > threads to process them. Processing these messages may cause other,
> > > > outgoing connections, to be spawned. Under sufficient load, the main
> > > > server loop (i.e. the call to accept, followed by a forkIO), becomes
> > > > nonresponsive.
> > > >
> > > > A smaller distilled testcase reveals that when sufficient socket
> > > > activity is occurring, an incoming connection may not be responded
> > > > to until other connections have been cleared out of the way, despite
> > > > the fact that these other connections are being handled by separate
> > > > threads. One issue that we've been trying to figure out is where
> > > > this behavior arises from-- the GHC rts, the Network library, the
> > > > underlying C libraries.
> > > >
> > > > Have other GHC users doing applications with large amounts of socket
> > > > usage observed similar behavior and managed to trace back where it
> > > > originates from? Are there any particular architectural solutions
> > > > that people have found to work well for these situations?
> > >
> > > Hey Jeff,
> > >
> > > Can you say which GHC you used, and whether you used the threaded
> > > runtime or non-threaded runtime?
> > >
> > Oops, forgot about that...
> >
> > We used both ghc-6.8.3 and ghc-6.10.rc1 and we used the threaded
> > runtime. We are running on a 64 bit linux machine using openSUSE 10.
>
> The scheduler doesn't have a concept of priorities, so the accepting
> thread will get the same share of the CPU as the other threads. Another
> issue is that the accepting thread has to be woken up by the IO manager
> thread when a new connection is available, so we might have to wait for
> the IO manager thread to run too. But I wouldn't expect to see overly
> long delays. Maybe you could try network-alt which does its own IO
> multiplexing.
>
> If you have multiple cores, you might want to try fixing the thread
> affinity - e.g. put all the worker threads on one core, and the accepting
> thread on the other core. You can do this using GHC.Conc.forkOnIO, with
> the +RTS -qm -qw options.
>
> Other than that, I'm not sure what to try right now. We're hoping to get
> some better profiling for parallel/concurrent programs in the future, but
> it's not ready yet.
>
> Cheers,
> Simon
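
The thread-affinity suggestion in the quoted message can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses forkOn from Control.Concurrent, which is the name later GHC releases give to what the message calls GHC.Conc.forkOnIO; the helper runPinned and the choice of capabilities 0 and 1 are hypothetical; and the program is assumed to be built with -threaded and run with at least two capabilities (e.g. +RTS -N2). The -qm and -qw options mentioned in the message are from the GHC of that era, so check the user guide of your GHC version for which of them it still accepts.

```haskell
import Control.Concurrent
  (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run an action in a new thread pinned to the given capability and
-- return its result together with the capability it actually ran on.
-- forkOn interprets the capability number modulo the number of
-- capabilities, so with a single capability everything lands on 0.
-- (Hypothetical helper; an exception in the action is not handled.)
runPinned :: Int -> IO a -> IO (a, Int)
runPinned cap action = do
  done <- newEmptyMVar
  _ <- forkOn cap $ do
    result <- action
    (ranOn, _isPinned) <- threadCapability =<< myThreadId
    putMVar done (result, ranOn)
  takeMVar done

main :: IO ()
main = do
  caps <- getNumCapabilities
  -- Stand-ins for the accepting thread (capability 0) and a worker
  -- thread (capability 1); a real server would run its loops here.
  ((), acceptCap) <- runPinned 0 (return ())
  ((), workerCap) <- runPinned 1 (return ())
  putStrLn ("capabilities available: " ++ show caps)
  putStrLn ("accept thread ran on capability " ++ show acceptCap)
  putStrLn ("worker thread ran on capability " ++ show workerCap)
```

In a real server the accept loop would be forked with forkOn 0 and every per-connection worker with forkOn 1 (or spread over the remaining capabilities), so that busy workers do not compete with the accepting thread for the same core.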