+1 for making readMVar atomic, but we should probably mention the change in the documentation for readMVar. Something like the following (perhaps in fewer words):

    -- | ...
    --
    -- /Compatibility note:/ prior to base 4.7, 'readMVar' was a combination
    -- of 'takeMVar' and 'putMVar', resulting in two drawbacks:
    --
    --  * 'readMVar' was not atomic in the presence of multiple producers.
    --    Between taking the value and putting it back, another thread
    --    calling 'putMVar' could fill the 'MVar' first, causing 'readMVar'
    --    to block after it had already retrieved a value.
    --
    --  * 'readMVar' was not multiple-wakeup: a single 'putMVar' woke only
    --    one blocked reader, so each consumer had to wake up the next.
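
For reference, the old behaviour described in the note can be sketched roughly as follows. This is a minimal illustration of a take-then-put read, not a verbatim copy of the library source, and the name `oldReadMVar` is made up for this comment:

```haskell
import Control.Concurrent.MVar
import Control.Exception (mask_)

-- Hypothetical name: a sketch of a non-atomic read built from
-- 'takeMVar' followed by 'putMVar', as the compatibility note describes.
oldReadMVar :: MVar a -> IO a
oldReadMVar m = mask_ $ do
  a <- takeMVar m  -- the MVar is empty from here on
  -- Window: another thread's 'putMVar' can fill the MVar at this point.
  putMVar m a      -- blocks if a producer got in first
  return a
```

With several readers blocked in `takeMVar`, a single `putMVar` wakes only one of them, and the rest are released one at a time as each reader puts the value back, which is the second drawback above.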