labeling MVar and STM data structures

Hello,

I'm looking for the source of a bug that makes my Warp-based server stop responding. "listThreads" shows that one IOManager thread is permanently blocked on an MVar:

  2 IOManager on cap 0: ThreadBlocked BlockedOnForeignCall
  3 IOManager on cap 1: ThreadBlocked BlockedOnMVar
  4 TimerManager: ThreadBlocked BlockedOnForeignCall

IOManager should have the BlockedOnForeignCall reason since it's calling epoll_wait().

It's quite hard to debug this bug since I can't tell which MVar is blocking IOManager thread 3. Users can register MVar operations with IOManager, while IOManager itself also uses MVars internally.

To make this kind of debugging easier, I would like to ask the GHC developers to extend MVar to hold a label and to provide "labelMVar", analogous to "labelThread". The same mechanism should be provided for STM as well. "BlockReason" should contain the target label.

What do you think?

Background: https://github.com/haskell/network/issues/590

--Kazu
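For readers who want to reproduce a listing like the one above, here is a minimal sketch. It assumes base >= 4.18 (GHC 9.6 or later), where "listThreads" and "threadLabel" are exported from GHC.Conc.Sync. The "dumpThreads" helper and the "waiter" label are made up for illustration, and the output format is not the one used in the original report.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)
import Data.Maybe (fromMaybe)
import GHC.Conc.Sync (labelThread, listThreads, threadLabel, threadStatus)

-- Hypothetical helper: print one line per live thread with its id,
-- its label (if one was set with labelThread), and its status.
dumpThreads :: IO ()
dumpThreads = do
  tids <- listThreads
  forM_ tids $ \tid -> do
    label  <- threadLabel tid
    status <- threadStatus tid
    putStrLn (show tid ++ " " ++ fromMaybe "<unlabelled>" label ++ ": " ++ show status)

main :: IO ()
main = do
  mv  <- newEmptyMVar :: IO (MVar ())
  tid <- forkIO (takeMVar mv)   -- blocks until someone fills mv
  labelThread tid "waiter"
  threadDelay 100000            -- give the forked thread time to block
  dumpThreads                   -- the waiter should show a BlockedOnMVar status
  putMVar mv ()                 -- release the waiter; also keeps mv reachable
```

The status says that the thread is blocked on an MVar, but not which one. That is the gap the proposal is about.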

Hi Kazu,

This sounds like something you could investigate with ghc-debug. Once the thread is blocked on the MVar, you can look at its stack using ghc-debug-brick, which should be enlightening.

Matt

On Wed, 20 Nov 2024, 01:17 Kazu Yamamoto (山本和彦) via ghc-devs <ghc-devs@haskell.org> wrote:
> Hello,
>
> I'm looking for the source of a bug that makes my Warp-based server stop responding. "listThreads" shows that one IOManager thread is permanently blocked on an MVar:
>
>   2 IOManager on cap 0: ThreadBlocked BlockedOnForeignCall
>   3 IOManager on cap 1: ThreadBlocked BlockedOnMVar
>   4 TimerManager: ThreadBlocked BlockedOnForeignCall
>
> IOManager should have the BlockedOnForeignCall reason since it's calling epoll_wait().
>
> It's quite hard to debug this bug since I can't tell which MVar is blocking IOManager thread 3. Users can register MVar operations with IOManager, while IOManager itself also uses MVars internally.
>
> To make this kind of debugging easier, I would like to ask the GHC developers to extend MVar to hold a label and to provide "labelMVar", analogous to "labelThread". The same mechanism should be provided for STM as well. "BlockReason" should contain the target label.
>
> What do you think?
>
> Background: https://github.com/haskell/network/issues/590
>
> --Kazu

Kazu Yamamoto (山本和彦) via ghc-devs wrote:
> Hello,
>
> I'm looking for the source of a bug that makes my Warp-based server stop responding. "listThreads" shows that one IOManager thread is permanently blocked on an MVar:
>
>   2 IOManager on cap 0: ThreadBlocked BlockedOnForeignCall
>   3 IOManager on cap 1: ThreadBlocked BlockedOnMVar
>   4 TimerManager: ThreadBlocked BlockedOnForeignCall
>
> IOManager should have the BlockedOnForeignCall reason since it's calling epoll_wait().
>
> It's quite hard to debug this bug since I can't tell which MVar is blocking IOManager thread 3. Users can register MVar operations with IOManager, while IOManager itself also uses MVars internally.
>
> To make this kind of debugging easier, I would like to ask the GHC developers to extend MVar to hold a label and to provide "labelMVar", analogous to "labelThread". The same mechanism should be provided for STM as well. "BlockReason" should contain the target label.
>
> What do you think?
This is generally in the direction I proposed in #21877; I agree that it would be quite useful. I don't think that we necessarily want to extend all MVars with labels, as this would impose a fixed memory cost on all users. However, I think the approach that I describe in the ticket, introducing an out-of-band map for labelling heap objects, would be a reasonable way forward.

As Matt suggests, ghc-debug is also a great tool for investigating issues like this. However, I still think there is value in having better support in the core libraries for understanding issues such as this.

Thoughts?

Cheers,

- Ben
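To make the "out-of-band map" idea concrete, here is a rough user-space sketch. It is not the design from #21877 and not an existing GHC API: "MVarLabels", "newMVarLabels", "labelMVar" and "mvarLabel" are hypothetical names. Labels live in a side table keyed by MVar identity, so unlabelled MVars pay no memory cost. Weak pointers keep the table from holding the MVars alive.

```haskell
import Control.Concurrent.MVar (MVar, mkWeakMVar, newEmptyMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.Mem.Weak (Weak, deRefWeak)

-- Hypothetical side table: (weak reference to an MVar, its label).
-- Restricted to MVars of a single element type to keep the sketch simple.
newtype MVarLabels a = MVarLabels (IORef [(Weak (MVar a), String)])

newMVarLabels :: IO (MVarLabels a)
newMVarLabels = MVarLabels <$> newIORef []

-- Hypothetical analogue of labelThread: record a label for an MVar.
-- The newest label is prepended, so it shadows any earlier one.
labelMVar :: MVarLabels a -> MVar a -> String -> IO ()
labelMVar (MVarLabels ref) mv label = do
  w <- mkWeakMVar mv (pure ())   -- weak reference with a no-op finalizer
  atomicModifyIORef' ref (\entries -> ((w, label) : entries, ()))

-- Look up the label of an MVar, comparing by MVar identity (its Eq instance).
-- Entries whose MVar has been collected are skipped but never pruned here.
mvarLabel :: MVarLabels a -> MVar a -> IO (Maybe String)
mvarLabel (MVarLabels ref) mv = readIORef ref >>= go
  where
    go [] = pure Nothing
    go ((w, label) : rest) = do
      alive <- deRefWeak w
      case alive of
        Just mv' | mv' == mv -> pure (Just label)
        _                    -> go rest

main :: IO ()
main = do
  labels <- newMVarLabels
  queue  <- newEmptyMVar :: IO (MVar Int)
  other  <- newEmptyMVar :: IO (MVar Int)
  labelMVar labels queue "request-queue"
  mvarLabel labels queue >>= print   -- labelled MVar
  mvarLabel labels other >>= print   -- unlabelled MVar
```

A table like this covers only the labelling side. Reporting the label in "BlockReason", as requested in the original message, would still need support from the runtime, which this sketch does not attempt.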

Hi Ben,

> This is generally in the direction I proposed in #21877; I agree that it would be quite useful. I don't think that we necessarily want to extend all MVars with labels as this would impose a fixed memory cost on all users. However, I think the approach that I describe in the ticket, introducing an out-of-band map for labelling heap objects, would be a reasonable way forward.

I support this. Thanks.

--Kazu
participants (3)
- Ben Gamari
- Kazu Yamamoto
- Matthew Pickering