
On Wed, Sep 28, 2011 at 6:04 AM, Andreas Voellmy wrote:
> Sure. I'm writing a server that serves a number of long-lived TCP connections.
How many are you looking at? (ROFLSCALE http://www.youtube.com/watch?v=majbJoD6fzo ?) And how much activity? Do you need real-time responses?
> It seems that GC is the biggest obstacle to doing this. The problem seems to be that the current GC stops all the processors before performing a GC.
Each OS thread gets its own bump-pointer nursery; minor collections of this nursery do not result in whole-system pauses. The nursery should be small enough to fit into a core's cache (the default 512 kB is usually okay) so that the entire nursery stays in cache while it is being collected, keeping its cost close to that of stack allocation.

However, if you add an external pointer to large data or thunks in the nursery, e.g. by mutating a shared IORef, you can undermine the benefits of the nursery. It might be worth trying to do more work without mutation, and to force evaluation of data before writing it to a variable. The idea is to keep the nursery busy so that the second-generation collectors don't need to be.

Controlling memory is also important. Use iteratees to help make guarantees about memory consumption. Ideally, you can keep each TCP connection under some fixed live-space cost, e.g. 2-4 MB. This keeps GCs small and cheap, and also allows the entire thread to fit into the CPU's larger caches, thus reducing scheduling and evaluation costs.

Indeed, controlling memory is the most important thing you can do to reduce GC costs and improve performance. GC only touches live memory, so avoiding allocation is much less important than controlling the amount of live memory.

Regards,

Dave
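
P.S. A minimal sketch of the "force evaluation before writing it to a variable" advice, using only base. The names addBytesLazy, addBytesStrict and publishTotal are hypothetical, made up for illustration; they are not from your server or from any library. The lazy variant can leave a growing chain of thunks behind the shared IORef, while the strict variants store an already-evaluated Int.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.List (foldl')

-- Hypothetical per-connection byte counter (illustrative name only).
-- Lazy update: the IORef receives the unevaluated expression (old + n),
-- so repeated updates can build a chain of thunks behind the reference.
addBytesLazy :: IORef Int -> Int -> IO ()
addBytesLazy ref n = do
  old <- readIORef ref
  writeIORef ref (old + n)

-- Strict update: ($!) evaluates the sum before it is stored, so the
-- IORef only ever points at a plain Int.
addBytesStrict :: IORef Int -> Int -> IO ()
addBytesStrict ref n = do
  old <- readIORef ref
  writeIORef ref $! old + n

-- Summarise a chunk before publishing it: 'evaluate' forces the fold,
-- so the IORef does not retain a thunk that keeps the input list alive.
publishTotal :: IORef Int -> [Int] -> IO ()
publishTotal ref chunk = do
  total <- evaluate (foldl' (+) 0 chunk)
  writeIORef ref total

main :: IO ()
main = do
  ref <- newIORef 0
  mapM_ (addBytesStrict ref) [1 .. 1000]
  readIORef ref >>= print   -- prints 500500
  publishTotal ref [1, 2, 3]
  readIORef ref >>= print   -- prints 6
```

Both ($!) and evaluate only force to weak head normal form, which is enough for an Int. For nested structures you would probably want something deeper, such as force from the deepseq package.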