
Simon, I don't think CPU usage is the issue. An individual thread takes a fraction of a second to deserialize a large packet. The issue is that, as you pointed out, you can get alerts even with 50 threads. Those fractions of a second add up in a way that is detrimental to the performance of the app.

The timeleak code pickles through Ptr Word8, which should be very efficient. I believe the delay comes from the way 'sequ' is compiled by GHC. I'll take the liberty of quoting Andrew Kennedy (your colleague from MS Research), who wrote the picklers:

--
My original pickler implementation was for SML. It was used in the MLj compiler, and is still used in the SML.NET compiler, and has acceptable performance (a few ms pickling/unpickling for typical intermediate language object files). I must admit that I've not used the Haskell variant in anger. Apart from the inherent slowdown associated with laziness, is there a particular reason for poor performance?
--

'sequ' by itself does not seem like a big deal, but when used to model records it builds a large nest of lambdas, and I don't think that nesting is compiled efficiently. I would appreciate it if you could look at that and issue a verdict, now that Andrew confirms the picklers have been used in a real-life environment without major problems.

Suppose I chose a different implementation of binary IO and disposed of pickler combinators, and suppose I gained a 2x speed-up by doing so. I would now be getting alerts with 100 threads instead of 50, no? That's still far from ideal.

Joel

On Jan 3, 2006, at 4:43 PM, Simon Marlow wrote:
The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear.
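To make the 'sequ' discussion above concrete, here is a minimal sketch of Kennedy-style pickler combinators over a String state, showing how pickling a record chains one nested closure per field. This follows the formulation in Kennedy's paper rather than the timeleak code itself (which works over Ptr Word8); the Rec type and field names are made up for illustration.

```haskell
-- A pickler pairs a serializer (appP) with a deserializer (appU).
data PU a = PU { appP :: (a, String) -> String
               , appU :: String -> (a, String) }

-- Inject a known value without consuming or producing any output.
lift :: a -> PU a
lift x = PU snd (\s -> (x, s))

-- 'sequ' sequences two picklers; the second is a function of the first
-- value, which is what forces a closure per chained field.
sequ :: (b -> a) -> PU a -> (a -> PU b) -> PU b
sequ f pa k =
  PU (\(b, s) -> let a  = f b
                     pb = k a
                 in appP pa (a, appP pb (b, s)))
     (\s -> let (a, s') = appU pa s
                pb      = k a
            in appU pb s')

-- A primitive pickler for single characters.
char :: PU Char
char = PU (\(c, s) -> c : s) (\(c:s) -> (c, s))

-- A record pickler is a chain of 'sequ's: three fields, three nested
-- lambdas, each capturing the values unpickled so far.
data Rec = Rec { f1 :: Char, f2 :: Char, f3 :: Char } deriving (Eq, Show)

rec :: PU Rec
rec = sequ f1 char (\a ->
      sequ f2 char (\b ->
      sequ f3 char (\c ->
      lift (Rec a b c))))

main :: IO ()
main = do
  let r = Rec 'x' 'y' 'z'
      s = appP rec (r, "")
  print s                    -- prints "xyz"
  print (fst (appU rec s))   -- prints Rec {f1 = 'x', f2 = 'y', f3 = 'z'}
```

With many fields (as in a large packet record), this nesting grows linearly, and whether GHC flattens those closures efficiently is exactly the question raised above.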