
I'll be interested to know if the fix helps your application. The bug reported in #2703 results in the program just allocating memory endlessly until it dies, so it doesn't sound exactly like the symptoms you were originally describing.
Cheers, Simon
Jeff Polakow wrote:
Hello,
Just writing to let people know the resolution of this problem...
After much frustration and toil, we realized there was a bug in GHC's handle abstraction over sockets.
We resolved our immediate problem by having our code deal directly with the sockets, and we filed a bug report, #2703, which has just been (partially) fixed by Simon Marlow.
thanks, Jeff
Simon Marlow wrote on 10/10/2008 09:23:31 AM:
Jeff Polakow wrote:
Don Stewart wrote on 10/09/2008 02:56:02 PM:
jeff.polakow:
We have a server that accepts messages over a socket, spawning threads to process them. Processing these messages may cause other, outgoing connections, to be spawned. Under sufficient load, the main server loop (i.e. the call to accept, followed by a forkIO), becomes nonresponsive.
A smaller distilled testcase reveals that when sufficient socket activity is occurring, an incoming connection may not be responded to until other connections have been cleared out of the way, despite the fact that these other connections are being handled by separate threads. One issue that we've been trying to figure out is where this behavior arises from: the GHC rts, the Network library, the underlying C libraries.
Have other GHC users doing applications with large amounts of socket usage observed similar behavior and managed to trace back where it originates from? Are there any particular architectural solutions that people have found to work well for these situations?
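For readers who want something concrete to experiment with, the server shape described above (a loop that calls accept and then forks a thread per connection) might look roughly like the following. This is only a sketch, not the code from the original application: it assumes a recent version of the network package, and the port 9000, the helper names listenOn, acceptLoop and echo, and the echo behaviour itself are made up for illustration. The worker reads and writes the Socket directly rather than going through a Handle.

```haskell
import Control.Concurrent (forkFinally)
import Control.Exception (bracket)
import Control.Monad (forever, unless, void)
import qualified Data.ByteString as B
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)

-- | Open a listening TCP socket on the loopback interface.
-- Passing port 0 asks the OS to pick a free port.
listenOn :: PortNumber -> IO Socket
listenOn port = do
  sock <- socket AF_INET Stream defaultProtocol
  setSocketOption sock ReuseAddr 1
  bind sock (SockAddrInet port (tupleToHostAddress (127, 0, 0, 1)))
  listen sock 128
  return sock

-- | Main server loop: accept a connection, then hand it to a new thread.
-- The connection is closed when the worker finishes or throws.
acceptLoop :: Socket -> IO ()
acceptLoop sock = forever $ do
  (conn, _peer) <- accept sock
  void $ forkFinally (echo conn) (\_ -> close conn)

-- | Hypothetical per-connection worker: echo bytes back until the peer
-- closes. All I/O goes to the Socket directly, not through a Handle.
echo :: Socket -> IO ()
echo conn = do
  bytes <- recv conn 4096
  unless (B.null bytes) $ do
    sendAll conn bytes
    echo conn

main :: IO ()
main = bracket (listenOn 9000) close acceptLoop  -- 9000 is an arbitrary port
```

Building with -threaded and opening many concurrent client connections against a loop like this is one way to check whether accept stalls under load.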
Hey Jeff,
Can you say which GHC you used, and whether you used the threaded runtime or non-threaded runtime?
Oops, forgot about that...
We used both ghc-6.8.3 and ghc-6.10.rc1 and we used the threaded runtime. We are running on a 64 bit linux machine using openSUSE 10.
The scheduler doesn't have a concept of priorities, so the accepting thread will get the same share of the CPU as the other threads. Another issue is that the accepting thread has to be woken up by the IO manager thread when a new connection is available, so we might have to wait for the IO manager thread to run too. But I wouldn't expect to see overly long delays. Maybe you could try network-alt which does its own IO multiplexing.
If you have multiple cores, you might want to try fixing the thread affinity - e.g. put all the worker threads on one core, and the accepting thread on the other core. You can do this using GHC.Conc.forkOnIO, with the +RTS -qm -qw options.
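A rough sketch of the affinity idea follows. It assumes a newer base library than the one discussed in this thread, where the function referred to here as GHC.Conc.forkOnIO is exported from Control.Concurrent under the name forkOn. The helper runPinned and the placeholder actions are hypothetical; in a real server the first action would be the accept loop and the second the worker code.

```haskell
import Control.Concurrent
  ( forkOn, getNumCapabilities, myThreadId, threadCapability
  , newEmptyMVar, putMVar, takeMVar )

-- | Hypothetical helper: run an action on a thread pinned to the given
-- capability and report which capability it actually ran on.
-- forkOn takes the capability number modulo the number of capabilities.
runPinned :: Int -> IO a -> IO (Int, a)
runPinned cap action = do
  done <- newEmptyMVar
  _ <- forkOn cap $ do
    result <- action
    (ranOn, _pinned) <- myThreadId >>= threadCapability
    putMVar done (ranOn, result)
  takeMVar done

main :: IO ()
main = do
  n <- getNumCapabilities
  -- Placeholder actions: the accept loop would go on capability 0,
  -- the workers on capability 1.
  (acceptCap, _) <- runPinned 0 (putStrLn "accepting thread would run here")
  (workerCap, _) <- runPinned 1 (putStrLn "worker threads would run here")
  putStrLn $ "capabilities: " ++ show n
          ++ ", accept on " ++ show acceptCap
          ++ ", workers on " ++ show workerCap
```

To see two distinct capabilities the program has to be built with -threaded and run with at least two of them, e.g. +RTS -N2; with a single capability both threads end up on capability 0.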
Other than that, I'm not sure what to try right now. We're hoping to get some better profiling for parallel/concurrent programs in the future, but it's not ready yet.
Cheers, Simon