Long pauses / idle time in a Haskell TCP server

Hello,

I'm seeing long pauses in a server based on the 'scalable-server' package from Hackage, version 0.2.2 [1]. It normally performs very well, about 100-150 micro-secs client latency over the loopback interface, but a fair number of requests are much slower, 38-40 milli-secs, making the mean latency and throughput quite bad.

The server is a simple version of a memory cache, with operations to "put" a key-value mapping, and to "get" the value for a key, using a data map. I've tried the strict versions of 'unordered-containers' and the 'containers' package for the data map implementation, with similar results. I've tried it with ghc 7.4.1 and with 7.6.1, also with similar results.

The runtime info I got shows:

- GC report: GC time is much less than the total time of the slow responses, so it does not explain them.
- CPU profile: Idle time is very high, and would explain the slow responses. By idle time I mean the part of wall time which is not included in the profile report's "total time". For example, when running a test for 5 minutes, the profile report has total time = 0.05 secs. 'top' output is consistent: quite low CPU usage during such a test. Apart from this there is nothing notable in the profile; most time is spent in the network input and output processing (network-enumerator etc.).

One observation is that the pauses seem correlated with overwriting entries in the data map. If I'm mostly adding new keys (as well as reading), the throughput is much better, and the idle time in the CPU profile is lower (though the time spent in GC increases a lot).

I've removed all concurrency from the program (including my version of scalable-server) to eliminate that factor, but the problem persists. I tried it on two fairly different Linux machines, with similar results.

The client I use to benchmark is a simple C program, so I don't suspect it of causing the pauses. Looking at the timestamps on the network traffic confirms that the pauses are at the server.

Any idea what might be causing the idle time? The long pauses are consistently 38-40ms; maybe that points to some aspect of CPU scheduling, leaving the program idle for some time?

I can put the code on github if it would help.

Many thanks!
Alex

[1] http://hackage.haskell.org/package/scalable-server-0.2.2
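To make the notion of "idle time" above concrete (wall time that does not show up as CPU time), here is a minimal sketch that measures both around a single action. It is not taken from the server in question: the helper name `timed` is hypothetical, and `threadDelay` merely stands in for a request that blocks without using the CPU.

```haskell
import Control.Concurrent (threadDelay)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.CPUTime (getCPUTime)

-- | Run an action and return its result together with the elapsed
-- wall-clock seconds and the CPU seconds consumed by the process.
-- (Hypothetical helper, for illustration only.)
timed :: IO a -> IO (a, Double, Double)
timed act = do
  w0 <- getCurrentTime
  c0 <- getCPUTime
  r  <- act
  c1 <- getCPUTime
  w1 <- getCurrentTime
  let wall = realToFrac (diffUTCTime w1 w0) :: Double
      cpu  = fromIntegral (c1 - c0) / 1e12  -- getCPUTime reports picoseconds
  return (r, wall, cpu)

main :: IO ()
main = do
  -- Assumption: a 40 ms sleep stands in for one slow, mostly idle request.
  (_, wall, cpu) <- timed (threadDelay 40000)
  putStrLn ("wall = " ++ show wall ++ " s, cpu = " ++ show cpu
            ++ " s, idle = " ++ show (wall - cpu) ++ " s")
```

Wrapping each request handler this way and logging only the slow cases would show whether a 38-40 ms response is spent computing or waiting.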

On Mon, Oct 8, 2012 at 10:10 AM, Alex Iliev wrote:
> Hello,
> I'm seeing long pauses in a server based on the 'scalable-server' package from Hackage, version 0.2.2 [1]. It normally performs very well, about 100-150 micro-secs client latency over the loopback interface, but a fair number of requests are much slower, 38-40 milli-secs, making the mean latency and throughput quite bad.
You might get more information from running threadscope on your program.
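As a rough sketch of how that could look in practice: marking each request in the GHC eventlog makes the slow ones easier to find among the events ThreadScope loads. `traceEventIO` is from `Debug.Trace` in base; the `withEvent` wrapper and the `handleRequest` stand-in are hypothetical names, not part of scalable-server, and the build commands in the comments assume a GHC of that era and a source file called Server.hs.

```haskell
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)

-- Assumed build and run steps (file name Server.hs is hypothetical):
--   ghc -O2 -threaded -eventlog -rtsopts Server.hs
--   ./Server +RTS -ls -RTS        -- writes Server.eventlog
--   threadscope Server.eventlog

-- | Emit START/STOP user events around an action, so the gap between
-- them can be read off the eventlog timestamps.
withEvent :: String -> IO a -> IO a
withEvent label =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))

-- | Hypothetical stand-in for the server's real request handler.
handleRequest :: String -> IO String
handleRequest req = return ("ok: " ++ req)

main :: IO ()
main = do
  reply <- withEvent "put" (handleRequest "put k v")
  putStrLn reply
```

If the program is not linked with eventlog support, the markers are not recorded, so the wrapper can stay in place during normal runs.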
G
--
Gregory Collins
participants (2): Alex Iliev, Gregory Collins