OS swapping and haskell data structures

If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit? -Alex-

On Tue, Jul 31, 2007 at 10:45:56PM -0700, Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit?
Data.{Set,Map} uses balanced binary trees. So if you have a 1 billion element data set which is so large that no significant fraction of it fits into cache, you can expect to access a random element in ~9 seeks, which is less than a second... good enough? This is a lot worse than it could be because memory access is too transparent. When GHC triggers a page fault, the Right Thing to do is for Linux to somehow put *that haskell thread* to sleep; but instead it will put the entire capability (or your whole program on the non-threading rts) to sleep. Linux's filesystems (which use various kinds of trees internally) avoid this issue by using asynchronous IO requests, but that's not an option for VM since there's only so much room for callbacks in a MOV instruction! There is much fertile territory for OS design space exploration here! Stefan

Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit?
Relying on the OS to page portions of your app in and out should always be the fallback of last resort. You are fairly guaranteed to get terrible performance because the VM subsystem can't anticipate your app's memory access patterns, and catastrophic death of either your app or other system processes is a strong possibility (Google for "OOM killer" if you want some horror stories). In many cases, you can't even rely on paging being possible.

On Wed, 2007-08-01 at 11:31 -0700, Bryan O'Sullivan wrote:
Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit?
Relying on the OS to page portions of your app in and out should always be the fallback of last resort. You are fairly guaranteed to get terrible performance because the VM subsystem can't anticipate your app's memory access patterns, and catastrophic death of either your app or other system processes is a strong possibility (Google for "OOM killer" if you want some horror stories). In many cases, you can't even rely on paging being possible.
Furthermore, as I understand it, GC does not interact well with paging since the GC has to traverse the data structures on major GCs it'll force it all to be kept in memory. Duncan

Ok, so for low throughput applications, you actually need a disk strategy. Got it. Ok, is there a standard interface to BerkleyDB or some other disk based store? -Alex- Duncan Coutts wrote:
On Wed, 2007-08-01 at 11:31 -0700, Bryan O'Sullivan wrote:
Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit? Relying on the OS to page portions of your app in and out should always be the fallback of last resort. You are fairly guaranteed to get terrible performance because the VM subsystem can't anticipate your app's memory access patterns, and catastrophic death of either your app or other system processes is a strong possibility (Google for "OOM killer" if you want some horror stories). In many cases, you can't even rely on paging being possible.
Furthermore, as I understand it, GC does not interact well with paging since the GC has to traverse the data structures on major GCs it'll force it all to be kept in memory.
Duncan

Alex Jacobson wrote:
Ok, so for low throughput applications, you actually need a disk strategy. Got it.
Ok, is there a standard interface to BerkleyDB or some other disk based store?
I would absolutely kvell if there were some way to use a disk-based store to back Haskell objects without going through the IO monad.

On Wed, 2007-08-01 at 12:32 -0700, Alex Jacobson wrote:
Ok, so for low throughput applications, you actually need a disk strategy. Got it.
Ok, is there a standard interface to BerkleyDB or some other disk based store?
Well on hackage there's anydbm and BerkeleyDB. The former is probably the older and more mature of the two. Duncan

On Tue, 2007-07-31 at 22:45 -0700, Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit?
It will die (catastrophically or not is in the eye of the beholder, I guess) when you exhaust heap or swap limits (set with +RTS options), or the maximum available memory (physical+swap or ulimit). In my experience, performance quickly becomes unbearable (and also Linux tends to be less than responsive) when you exceed physical memory, so I generally always use +RTS -Mx, with x about 80% of physical RAM. In fact, I often use a C file to set these parameters directly, I think I ripped it (or the idea) from darcs. See e.g. http://malde.org/~ketil/rbr/src/hooks.c -k
participants (6)
-
Alex Jacobson
-
Bryan O'Sullivan
-
Duncan Coutts
-
Ketil Malde
-
Seth Gordon
-
Stefan O'Rear