
On 02/18/2016 03:25 PM, Wojtek Narczyński wrote:
Dear list,
I create four Haskell threads, each performing disjoint STM transactions. The more system threads I add, the slower the program runs. With four system threads the performance is about 14% of the non-parallel run.
What is going on?
$ ghc -v
Glasgow Haskell Compiler, Version 7.10.3, stage 2 booted by GHC version 7.8.4
$ ghc -O2 -threaded test-stm.hs
$ time ./test-stm +RTS -N1
real    1.843s
$ time ./test-stm +RTS -N2
real    7.469s
$ time ./test-stm +RTS -N3
real    9.640s
$ time ./test-stm +RTS -N4
real    13.144s
$ cat test-stm.hs
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import Control.Concurrent.STM.Stats
import System.IO.Unsafe

-- Copied from GHC docs

children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])

waitForChildren :: IO ()
waitForChildren = do
  cs <- takeMVar children
  case cs of
    []   -> return ()
    m:ms -> do
      putMVar children ms
      takeMVar m
      waitForChildren

forkChild :: IO () -> IO ThreadId
forkChild io = do
  mvar <- newEmptyMVar
  childs <- takeMVar children
  putMVar children (mvar:childs)
  forkFinally io (\_ -> putMVar mvar ())

-- Test case

main = do
  forkChild $ newTVarIO 0 >>= incrManyTimes "thread1"
  forkChild $ newTVarIO 0 >>= incrManyTimes "thread2"
  forkChild $ newTVarIO 0 >>= incrManyTimes "thread3"
  forkChild $ newTVarIO 0 >>= incrManyTimes "thread4"
  waitForChildren
  dumpSTMStats -- Confirms no conflicts

incrManyTimes :: String -> TVar Int -> IO ()
incrManyTimes l = incrRec (1000000 :: Int)
  where
    incrRec n v
      | n == 0    = pure ()
      | otherwise = trackNamedSTM l (modifyTVar v (+1)) >> incrRec (n-1) v
While your code does not contain any conflicts, the stm-stats library seems to introduce some. Replacing trackNamedSTM with atomically speeds things up quite a bit, so most of the time is lost in the library.

A short look at the library's code shows that it uses atomicModifyIORef on a global IORef holding a map of the statistics, and your STM transactions are really short, so that IORef is under heavy contention. atomicModifyIORef works by reading the old value, computing the new one, and then atomically swapping it in only if the IORef still contains the old value; if another thread changed it in the meantime, the value has to be read and recomputed again, and so on. With four threads running tiny transactions and then all updating this one IORef, they constantly interfere with each other.

If I make the transactions longer by doing 1000 increments in one transaction and then doing only 1000 transactions per thread, the code scales like one would expect.
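For reference, a minimal sketch of that batched variant (the helper name incrManyTimes' and the exact batching are mine; the original post does not contain this code):

incrManyTimes' :: String -> TVar Int -> IO ()
incrManyTimes' l = incrRec (1000 :: Int)
  where
    incrRec n v
      | n == 0    = pure ()
      | otherwise = do
          -- one transaction now performs 1000 increments, so the global
          -- statistics IORef in stm-stats is updated only 1000 times per
          -- thread instead of 1000000 times
          trackNamedSTM l (replicateM_ 1000 (modifyTVar v (+1)))
          incrRec (n - 1) v

And if you do not need the statistics at all, replacing trackNamedSTM l with atomically here removes the shared IORef from the picture entirely.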