Re: [xmonad] Frequent xmonad crashes (SIGBUS)

26 Feb 2013

On Tue, Feb 26, 2013 at 12:21:40PM -0500, Brandon Allbery wrote:
> ...
> On Mon, Feb 25, 2013 at 5:10 PM, Zev Weiss wrote:
> > ...
> > For the record, in case anyone else happens to encounter this -- it
> > was pointed out to me by a helpful individual off-list that this is
> > actually a known problem when running binaries mmapped out of AFS,
> > where my xmonad binary happens to reside.  I've changed my xsession
> > script to run it out of a local filesystem instead and am no longer
> > seeing this behavior.
>
> Can you give me any more information about this?  Simply running
> executables out of AFS does not have any known issues; if it did, Carnegie
> Mellon University (my previous employer) would have run headlong into it
> long since, and it would have been fixed by now.

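
Zev's workaround was to start xmonad from a local filesystem instead of AFS. Below is a minimal Python sketch of one way to automate that: copy the binary to local disk once per version and run the copy. The `/afs/` prefix, the cache directory, and the helper name `localize_binary` are all hypothetical. This is not part of xmonad or any AFS tool.

```python
import hashlib
import os
import shutil
import tempfile
from typing import Optional

# Assumption: AFS is mounted under /afs (the usual convention); adjust per site.
AFS_PREFIX = "/afs/"


def localize_binary(path: str, cache_dir: Optional[str] = None,
                    afs_prefix: str = AFS_PREFIX) -> str:
    """Return a path from which `path` can be executed off local disk.

    Files outside `afs_prefix` are returned as-is (after resolving symlinks).
    Files under it are copied into `cache_dir` once per (path, size, mtime),
    and the path of that copy is returned.
    """
    real = os.path.realpath(path)
    if not real.startswith(afs_prefix):
        return real  # already on a local filesystem; run it in place

    if cache_dir is None:
        cache_dir = os.path.join(tempfile.gettempdir(), "afs-bin-cache")
    os.makedirs(cache_dir, exist_ok=True)

    # Key the copy on path, size and mtime so a rebuilt binary gets a new copy.
    st = os.stat(real)
    ident = f"{real}:{st.st_size}:{st.st_mtime_ns}".encode()
    key = hashlib.sha256(ident).hexdigest()[:16]
    dest = os.path.join(cache_dir, f"{os.path.basename(real)}-{key}")

    if not os.path.exists(dest):
        tmp = f"{dest}.{os.getpid()}.tmp"
        shutil.copy2(real, tmp)  # copies contents plus mode bits (keeps +x)
        os.replace(tmp, dest)    # atomic rename: never exec a half-written copy
    return dest
```

An xsession wrapper could then call `os.execv(local, [local])` on the returned path. Keying the copy on size and mtime means that a rebuilt binary on AFS gets a fresh local copy instead of silently reusing a stale one.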
This problem has annoyed me for a few years now, and I've had limited
success tracking it down.  It doesn't affect all binaries, seemingly
just Haskell binaries, and it gets worse the larger the Haskell binary
is.

The problem seems to be related somehow to the state of the AFS cache.
Just after a reboot with a cold cache, I have to run ghc (some of my GHC
installs are on AFS) 5+ times in a row to get it to do anything besides
die with a SIGBUS.  The same goes for pandoc.  Once the binary starts up
properly the first time, it seems to be in the cache and doesn't act up
again until it gets evicted.
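
If the crashes really do depend on whether the binary is already cached, one cheap experiment is to read the whole file through ordinary `read()` calls before launching it. The idea is that the cache manager then fetches it before the loader starts mapping pages. This is an untested assumption, not a known fix. Below is a minimal Python sketch; `prewarm` is a hypothetical helper name:

```python
def prewarm(path: str, chunk_size: int = 1 << 20) -> int:
    """Read `path` sequentially and discard the data; return the bytes read."""
    total = 0
    # Unbuffered binary reads: each read() goes straight to the filesystem.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    return total
```

Whether this actually avoids the SIGBUS would need to be checked against the cold-cache reproduction described above, for example by calling `prewarm` on the ghc or pandoc executable right after a reboot.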

Here is an old haskell-cafe thread where I tried to track this down.  Not
many other people reported the problem, but those who did seemed resigned
to it:

  https://groups.google.com/forum/?fromgroups=#!searchin/haskell-cafe/tristan$...

That post describes a separate but seemingly related problem: GHC fails
when it hits some Template Haskell (TH) code and has to load a few
libraries off disk during compilation.  I don't know exactly what the
GHCi linker does there, but it is preparing that code for execution and
crashes if the libraries it is loading are not in the cache.  In those
cases I have to keep rerunning 'cabal install'.  Each run makes forward
progress, loading a few more libraries successfully, until eventually
they are all cached and the build works.
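
While the root cause is unknown, the manual "keep rerunning until it works" loop can at least be automated. Below is a Python sketch that assumes a POSIX system. `subprocess` reports a child killed by a signal as a negative return code, so the loop retries only on `-signal.SIGBUS` and stops on success or on any other failure. The helper name `run_until_no_sigbus` and its `max_attempts` limit are hypothetical.

```python
import signal
import subprocess
from typing import Sequence, Tuple


def run_until_no_sigbus(cmd: Sequence[str],
                        max_attempts: int = 10) -> Tuple[int, int]:
    """Run `cmd` until it exits without SIGBUS; return (returncode, attempts)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        rc = subprocess.run(cmd).returncode
        # A return code of -N means the child was killed by signal N.
        if rc != -signal.SIGBUS:
            return rc, attempt
    return rc, max_attempts
```

For the TH case this would be called as `run_until_no_sigbus(["cabal", "install"])`. Stopping on any non-SIGBUS failure keeps ordinary compile errors from being retried pointlessly.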

My guess is that the problem is some bad interaction between AFS and
whatever the GHC RTS does for file I/O, but it is hard to figure out where
to start looking: I have never gotten a useful backtrace from any of
these crashes.  Most applications have no problems, so I imagine GHC is
involved somehow.  That said, I have seen similar crashes in non-Haskell
programs that use shared libraries living on AFS.  If some application
eats all of your memory and caches start getting evicted, those programs
sometimes crash in the same way.
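
Some background on why these crashes show up as SIGBUS rather than SIGSEGV. Executables and shared libraries are mapped into memory from their files. Touching a page of a file-backed mapping that lies beyond the end of the file is reported as SIGBUS, and on Linux a failed page-in typically is as well. The Python sketch below reproduces the first case locally by truncating a mapped temporary file in a child interpreter. It is only an analogy for the AFS case; whether the AFS cache manager fails page-ins this way is the open question in this thread.

```python
import signal
import subprocess
import sys
import textwrap

# Child program: map a two-page temporary file, shrink the file to zero bytes
# underneath the mapping, then read from the mapping.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    size = 2 * mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * size)
        f.flush()
        m = mmap.mmap(f.fileno(), size)
        os.ftruncate(f.fileno(), 0)  # the mapped pages lose their backing data
        print(m[0])  # this access faults; the kernel delivers SIGBUS
""")


def demo_mmap_sigbus() -> int:
    """Run CHILD in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


if __name__ == "__main__":
    rc = demo_mmap_sigbus()
    # subprocess reports "killed by signal N" as a return code of -N.
    print("child killed by SIGBUS" if rc == -signal.SIGBUS else f"exit status {rc}")
```

The crash happens in the child, so the caller can inspect the exit status. This is also why no useful backtrace appears: the fault happens on an ordinary memory access, not inside a failing system call.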

Any insight would definitely be appreciated, since this annoys me a few
times a day.

Tristan Ravitch