Gregory,
The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:
7fddc7bcd700: cap 2: waking up thread 7 on cap 2
7fddc7bcd700: cap 2: thread 4 stopped (yielding)
7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
7fddcaad6740: cap 2: requesting parallel GC
7fddc5ffe700: cap 0: starting GC
7fddc57fd700: cap 1: starting GC
7fdda77fe700: cap 3: starting GC
7fddcaad6740: cap 2: starting GC
I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.
As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.
What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.
So, I dug deeper into the log. A normal execution for sending a command looks like:
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: sendCommand: encoded
7f99e6ffd700: cap 0: sendCommand: size 4
7f99e6ffd700: cap 0: sendCommand: unpacked
7f99e6ffd700: cap 0: Sending command of size 4
7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
7f99e6ffd700: cap 0: sendCommand: sent
7f99e6ffd700: cap 0: sendCommand: flushed
7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
The thread is run, overflows, GC, runs, then blocks on an MVAr.
For a the hang case:
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (yielding)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (yielding)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
...
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: thread 45 stopped (yielding)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole
And that is the last activity for thread 7.
I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know.
Mike