
On Tue, Jul 31, 2007 at 10:45:56PM -0700, Alex Jacobson wrote:
If you create a Data.Map or Data.Set larger than fits in physical memory, will OS level swapping enable your app to behave reasonably or will things just die catastrophically as you hit a memory limit?
Data.{Set,Map} uses balanced binary trees. So if you have a 1 billion element data set which is so large that no significant fraction of it fits into cache, you can expect to access a random element in ~9 seeks, which is less than a second... good enough? This is a lot worse than it could be because memory access is too transparent. When GHC triggers a page fault, the Right Thing to do is for Linux to somehow put *that haskell thread* to sleep; but instead it will put the entire capability (or your whole program on the non-threading rts) to sleep. Linux's filesystems (which use various kinds of trees internally) avoid this issue by using asynchronous IO requests, but that's not an option for VM since there's only so much room for callbacks in a MOV instruction! There is much fertile territory for OS design space exploration here! Stefan