
On 15/06/2010 06:09, braver wrote:
> In fact, the tag cafe2, when run on the full dataset, gets stuck at 11
> days, with RAM slowly getting into 50 GB; a previous version caused ghc
> 6.12.1 to segfault around day 12 -- -debug showing an assert failure in
> Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling
> crashed twice with a "strange closure" or a stack overflow. So
> allocation is a problem still.

I'd be happy to help you track this down, but I don't have a machine big
enough. Do you have any runs that display a problem with a smaller heap
(< 16GB)?

If the program is apparently hung, try connecting to it with
'gdb --pid=<pid>' and doing 'info thread' and 'where'. That might give me
enough clues to find out where the problem is.

Is this with -threaded, BTW? With residency on that scale, I'd expect the
parallel GC to help quite a lot. But obviously getting it to not
crash/hang is the first priority :)

Cheers,
Simon
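
The gdb steps suggested above can also be run non-interactively, which makes the output easy to paste into a reply. This is only a minimal sketch: the function name dump_threads and the GDB override variable are hypothetical, and it assumes a gdb that supports the -batch and -ex options.

```shell
#!/bin/sh
# Hypothetical helper: attach to a running process, list its threads and
# print a backtrace for each one, then detach and exit.
# Setting GDB to another command (e.g. echo) gives a dry run.
dump_threads() {
    # $1 is the PID of the apparently hung program
    "${GDB:-gdb}" --pid="$1" -batch -ex 'info threads' -ex 'thread apply all where'
}
```

For example, 'dump_threads 12345 > gdb-trace.txt 2>&1' would save the thread list and backtraces of process 12345 to a file.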