
Recently I've been benchmarking my concurrent Gibbs sampler[1] on a largish multicore machine. I've been using GHC HEAD due to various GC-related fixes that are present. One thing that I've noticed in looking over the event logs is that there are large durations (tens of milliseconds) when HECs appear to be constantly bombarded with wake-ups from other capabilities. In these periods, the eventlog will be filled with blocks of messages such as this snippet from one benchmark run[2] (long repeated blocks marked with ellipses, a few irrelevant messages have been omitted yet not marked, see eventlog for full log), 28.320958s HEC 7: running thread 293 ... 28.391802s HEC 6: running thread 209 28.392070s HEC 6: waking up thread 284 on cap 7 28.392302s HEC 6: waking up thread 284 on cap 7 28.392579s HEC 6: waking up thread 284 on cap 7 28.392808s HEC 6: waking up thread 284 on cap 7 . . . 28.405971s HEC 16: waking up thread 284 on cap 7 28.406026s HEC 15: waking up thread 284 on cap 7 28.406080s HEC 12: waking up thread 284 on cap 7 28.406091s HEC 23: waking up thread 284 on cap 7 28.406095s HEC 6: waking up thread 284 on cap 7 28.406095s HEC 23: waking up thread 284 on cap 7 28.406225s HEC 16: waking up thread 284 on cap 7 . . . 28.490449s HEC 7: stopping thread 293 (thread yielding) 28.490454s HEC 7: waking up thread 284 on cap 7 28.490454s HEC 7: waking up thread 284 on cap 7 28.490455s HEC 7: waking up thread 284 on cap 7 28.490456s HEC 7: waking up thread 284 on cap 7 28.490456s HEC 7: waking up thread 284 on cap 7 28.490457s HEC 7: waking up thread 284 on cap 7 28.490458s HEC 7: waking up thread 284 on cap 7 . . . 28.491841s HEC 7: waking up thread 284 on cap 7 28.491850s HEC 7: migrating thread 284 to cap 0 28.491968s HEC 7: running thread 293 . . . In other words, it appears that HEC 7 took a vacation of more than a millisecond as it tried (ultimately unsuccessfully) to wake up thread 284. These blocks are easy to spot in this event log and quite difficult to explain. In this case I was being silly and spawned a few too many capabilities (30 capabilities whereas the machine has 24 real cores, 2 threads per core due to hyperthreading; I was running 24 threads). That being said, even under more reasonable conditions[3] (where I was running 24 capabilities with 20 threads on the same 24-core machine) I still see problems waking up. For example, from 32.293455s to 32.327349s see numerous attempts by many HECs to wake-up thread 517 on cap 12. In this period, cap 12 briefly runs thread 537 (32.326781s) but while running HEC 17 attempts to wake up 517 again. All-in-all, something seems a bit strange. Any ideas what might be going on here? I've CC'd Edward Yang (who I understand has recently been doing a rework on the scheduler) and Simon Marlow. Thanks, - Ben [1] https://github.com/bgamari/bayes-stack [2] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup.eventlog [3] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup-smaller.eventl...