Hi everyone,

I'm attempting to create benchmarks that include the cost of finalizers that the code creates. To do this, I'm running the test, running performGC, and then waiting for finalizers to finish. However, that last part is a bit of a beast. I'm trying to determine if I can make an improvement to GHC that will better support this.

Currently, to wait for finalizers to finish, I do the following:

* Run with -N1

* Ensure (i.e., hope) that no finalizers in the program ever block

* Repeatedly check whether rts_unsafeGetMyCapability()->run_queue_hd == END_TSO_QUEUE, i.e., whether the capability's run queue is empty. If it is not empty, yield; if it is, terminate the program.

In my testing, this appears to work. However, since the Capability_ struct is not available in the RTS headers provided by GHC, I was only able to achieve this by pasting the relevant headers into my project. I don't expect this kind of functionality to be very forward compatible, but copy/paste is a level of hackery that I'd prefer to avoid.

I'm not very experienced with GHC or RTS development, but I'd like to contribute something that would help make this code more maintainable. A couple of things come to mind:

* I could add a function that does the exact check I need (rts_unsafeGetMyCapability()->run_queue_hd == END_TSO_QUEUE) and expose it to userspace. This is probably the simplest approach, and would certainly be the easiest for me, but it also strikes me as a fairly inelegant addition to the RTS API.

* I could expose the private header files, rts/*.h, as <rts/private/*.h> to userspace C files. This seems like a generally useful approach, but I would want to ensure that it did not create maintenance headaches for anyone.
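
For concreteness, the first option might look roughly like the sketch below. It is only an illustration: the struct definitions, the END_TSO_QUEUE sentinel, and the body of rts_unsafeGetMyCapability() are simplified stand-ins so that the snippet compiles by itself (the real definitions live inside the RTS and look different), and the name hypothetical_runQueueIsEmpty is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for RTS-internal definitions so the sketch compiles on its own.
 * These are NOT the real GHC definitions. */
typedef struct StgTSO_ {
    struct StgTSO_ *next;       /* hypothetical link field, unused here */
} StgTSO;

typedef struct Capability_ {
    StgTSO *run_queue_hd;       /* head of this capability's run queue */
} Capability;

/* Mock sentinel playing the role of END_TSO_QUEUE (empty queue marker). */
static StgTSO mock_end_tso_queue;
#define END_TSO_QUEUE (&mock_end_tso_queue)

/* Mock of rts_unsafeGetMyCapability(): one capability, as when run with -N1. */
static Capability mock_capability = { END_TSO_QUEUE };

static Capability *rts_unsafeGetMyCapability(void)
{
    return &mock_capability;
}

/* Hypothetical helper (the name is an assumption, not an existing RTS API):
 * returns true when the current capability has no threads on its run queue. */
static bool hypothetical_runQueueIsEmpty(void)
{
    Capability *cap = rts_unsafeGetMyCapability();
    assert(cap != NULL);
    return cap->run_queue_hd == END_TSO_QUEUE;
}
```

The benchmark would then call this helper after performGC, yielding for as long as it returns false. As with my current approach, this assumes a single capability and finalizers that never block.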

I'm also very open to other suggestions.

Thanks for reading, and I look forward to hearing your input.

Regards,

Ryan