Hi everyone,

I'm attempting to create benchmarks that include the cost of finalizers that the code creates.  To do this, I'm running the test, running performGC, and then waiting for finalizers to finish.  However, that last part is a bit of a beast.  I'm trying to determine if I can make an improvement to GHC that will better support this.

Currently, to wait for finalizers to finish, I do the following:
  * Run with -N1
  * Ensure (i.e., hope) that no finalizers in the program ever block
  * Repeatedly check whether rts_unsafeGetMyCapability()->run_queue_hd == END_TSO_QUEUE.  If the run queue is not yet empty, yield; if it is empty, terminate the program.
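
For illustration, here is a minimal Haskell sketch of the overall run / performGC / yield-until-done shape.  Note that it does not use the run-queue check described above: it swaps in a simpler technique, where each finalizer bumps a counter and the main thread yields until the counter reaches a count known in advance.  The names workload and waitForFinalizers are hypothetical, and the sketch assumes every finalizer becomes eligible after a single performGC and never blocks.

```haskell
import Control.Concurrent (yield)
import Control.Monad (replicateM_, unless)
import Data.IORef (IORef, atomicModifyIORef', mkWeakIORef, newIORef, readIORef)
import System.Mem (performGC)

-- Hypothetical workload: allocates n short-lived IORefs and attaches a
-- finalizer to each one.  Every finalizer increments the shared counter.
workload :: IORef Int -> Int -> IO ()
workload done n = replicateM_ n $ do
  ref <- newIORef ()
  _ <- mkWeakIORef ref (atomicModifyIORef' done (\c -> (c + 1, ())))
  pure ()

-- Yield to other Haskell threads (including finalizer threads) until the
-- counter reaches the expected value.  Assumption: the expected count is
-- known up front; if a finalizer never runs, this loops forever.
waitForFinalizers :: IORef Int -> Int -> IO ()
waitForFinalizers done expected = loop
  where
    loop = do
      c <- readIORef done
      unless (c >= expected) (yield >> loop)

main :: IO ()
main = do
  done <- newIORef 0
  workload done 1000          -- the code under test
  performGC                   -- major GC; schedules finalizers of dead objects
  waitForFinalizers done 1000 -- include finalizer cost before stopping the clock
  readIORef done >>= print
```

The counter approach only works when the benchmark controls every finalizer, which is exactly why it is not a general substitute for inspecting the run queue.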

In my testing, this appears to work.  However, since the Capability_ struct is not available in the RTS headers provided by GHC, I was only able to achieve this by pasting the relevant headers into my project.  I don't expect this kind of functionality to be very forward compatible, but copy/paste is a level of hackery that I'd prefer to avoid.

I'm not very experienced with GHC or RTS development, but I'd like to contribute something that would help make this code more maintainable.  A couple of things come to mind:
I'm also very open to other suggestions.

Thanks for reading, and I look forward to hearing your input.


Regards,
Ryan