
Both readTVar and writeTVar are worse than O(1); they have to look up the TVar in the transaction log to see whether you have made local changes to it. Right now that lookup looks like it is O(n), where n is the number of TVars the transaction has accessed so far, so your big transaction, which is just accessing a ton of TVars, is likely doing O(n^2) work overall.
From ghc/HEAD/rts/STM.c:
static TRecEntry *get_entry_for(StgTRecHeader *trec, StgTVar *tvar,
                                StgTRecHeader **in) {
  TRecEntry *result = NULL;

  TRACE("%p : get_entry_for TVar %p", trec, tvar);
  ASSERT(trec != NO_TREC);

  do {
    /* Linear scan over every entry in this transaction record */
    FOR_EACH_ENTRY(trec, e, {
      if (e -> tvar == tvar) {
        result = e;
        if (in != NULL) {
          *in = trec;
        }
        BREAK_FOR_EACH;
      }
    });
    /* Not found here: fall back to the enclosing (nested) transaction's log */
    trec = trec -> enclosing_trec;
  } while (result == NULL && trec != NO_TREC);

  return result;
}
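To get a feel for that cost from the Haskell side, here is a minimal sketch (a toy example of mine, not code from this thread) of a single transaction that writes n distinct TVars; each writeTVar rescans the log built up by the earlier writes, so the whole atomically block does roughly O(n^2) work:

import Control.Concurrent.STM
import Control.Monad (forM, forM_)

-- One big transaction touching n TVars: every writeTVar triggers a
-- get_entry_for-style scan over the entries added by the earlier writes.
bigTransaction :: Int -> IO ()
bigTransaction n = do
  tvs <- atomically (forM [1 .. n] newTVar)        -- n TVars, each holding an Int
  atomically (forM_ tvs (\tv -> writeTVar tv 0))   -- quadratic in n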
STM performance is not really geared towards "big" transactions right now, in large part because big transactions are likely to starve under any real workload anyway. If you have a single-threaded startup phase that populates your data, followed by concurrent small mutations, you can run the startup in IO, using many small transactions to populate the data.
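A hedged sketch of that workaround (insertOne is hypothetical, standing in for whatever per-item STM insert the application defines): run one small atomically per item from IO, so no single transaction log ever grows large.

import Control.Concurrent.STM (STM, atomically)
import Control.Monad (forM_)

-- One small transaction per item instead of one giant transaction over
-- the whole list; each log stays short, so each commit is cheap.
populateFromIO :: (a -> STM ()) -> [a] -> IO ()
populateFromIO insertOne items = forM_ items (atomically . insertOne)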
-- ryan
On Sun, Dec 28, 2008 at 8:02 PM, Jim Snow wrote:
I decided to try to implement a graph algorithm using STM. Each node in the graph has a set of TVar-protected lists of the nodes it links to and the nodes that link to it. Also, there is a global TVar-protected Data.Map that contains all the nodes in the graph, indexed by name (which is polymorphic):
data Node k r = Node
  { fwdPos  :: TVar [Node k r]  -- forward links (nodes we like)
  , fwdNeg  :: TVar [Node k r]  -- we allow "negative" links, too (nodes we don't like)
  , revPos  :: TVar [Node k r]  -- backlinks (nodes that like us)
  , revNeg  :: TVar [Node k r]  -- negative backlinks (nodes that don't like us)
  , currRep :: r                -- extra user-defined data
  , name    :: k                -- node's unique identifier
  } deriving Show
data Network k r = Network
  { node    :: TVar (M.Map k (Node k r))  -- map of nodes by name
  , trusted :: TVar [Node k r]            -- a list of nodes we need to iterate over occasionally
  } deriving Show
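Even a single mutation touches several of these TVars at once. For example, adding a positive link from node a to node b might look like this (a sketch of the kind of operation I'm doing, using the stmcons helper shown further down), which reads and writes two TVars:

-- Record that node a likes node b: cons b onto a's forward list and
-- a onto b's backlink list.
addPosLink :: Node k r -> Node k r -> STM ()
addPosLink a b = do
  stmcons b (fwdPos a)
  stmcons a (revPos b)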
I tried loading a datafile of about 20,000 nodes into the graph in one big transaction, and found that it takes about 4 minutes. This seemed rather slow, so I replaced all the TVars with IORefs (and substituted STM with IO in the type signatures), and the same operation with the new version took about 6 seconds!
This is all with one thread, so there should be no contention for the TVars. Is there something about STM that makes it scale worse than linearly wrt the number of mutations in a transaction?
The above performance numbers are for ghc-6.10.1; with ghc-6.8.3, the STM version takes more than 9 minutes.
According to profiling, one of my trouble spots is this function, which just adds an entry onto a TVar [a]:
stmcons :: k -> TVar [k] -> STM ()
stmcons x tv = do
  xs <- readTVar tv
  writeTVar tv (x:xs)
This seems like it ought to be pretty innocuous, unless the whole list is getting evaluated each time I cons a new entry, or readTVar and writeTVar are much more expensive than they appear.
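One sanity check I could run on the laziness theory: a strict variant of stmcons (a sketch using seq) that forces each new element before writing it, to rule out a build-up of unevaluated thunks. Consing never walks the existing list, so if this changes nothing, the cost must be in readTVar and writeTVar themselves:

-- Like stmcons, but forces the new element to WHNF before writing,
-- so no thunk for x accumulates inside the TVar.
stmcons' :: k -> TVar [k] -> STM ()
stmcons' x tv = do
  xs <- readTVar tv
  x `seq` writeTVar tv (x : xs)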
-jim