
Hi, I'd like to know a bit about the STM implementation in GHC, specifically about how it tries to achieve fairness. I've been reading "Composable Memory Transactions" but it does not contain that much details on this specific matter. What I want to know boils down to this: what order are processes run which have been woken up from a call to retry? When programming with condition variables the standard behaviour is that the process which has waited the longest is the first one to get to run. But that doesn't seem to be the behaviour here. Consider the following program: \begin{code} module STMFair where import Control.Concurrent import Control.Concurrent.STM test n = do v <- newTVarIO 0 mapM_ (\n -> forkIO (process n v) >> threadDelay delay) [1..n] atomically (writeTVar v 1) threadDelay delay delay = 500000 process id var = do putStrLn ("Process " ++ show id ++ " started") atomically $ do v <- readTVar var if v == 0 then retry else return () putStrLn ("Process " ++ show id ++ " finished") \end{code} When I run 'test 2' I expect it to print: Process 1 started Process 2 started Process 1 finished Process 2 finished This would correspond to the oldest process being executed first. But that is not what happens instead I get this (ghci 6.8.2, Ubuntu Linux): Process 1 started Process 2 started Process 2 finished Process 1 finished This is certainly not the behaviour I would want. I discovered this behaviour when implementing the dining philosophers using STM and there one of the philosophers gets starved. Except, that he's not quite starved. When I run the simulation long enough he will eventually be able to eat but then for a long time there will be some other philosopher that is starved. I find this behaviour very mysterious and it would be nice to have some light shed on it. Apart from this mysterious behaviour it seems quite easy to improve the fairness of the implementation. From my examples above it seems that the wait queues for a transactional variable do contain the processes in the order they call retry (try running 'test n' for some large n). It just seems that they are given to the scheduler in the wrong order, so all that needs to be done is to reverse the list. Am I right? Thanks for reading, Josef