Hi all,
Thanks so much for everyone's responses! I finally found the problem, so I thought I'd follow up and share what happened...
It turned out that the problem was not in the STM implementation, but rather in bad programming on my part. For some reason, I had a thread (thread #1) performing a transaction that blocked until any one of several TQueues became non-empty. On success, the thread sent a value onto another TQueue monitored by another thread (thread #2). Thread #2 would then process all the items in the queues monitored by the first thread.
This led to the following problem: when one of the TQueues became non-empty, the first thread would just go through its loop repeatedly, filling the other queue with values, and thread #2 wouldn't get a chance to run for a long time. Thread #1 only checked the queues and never removed anything from them, so its transaction kept succeeding until thread #2 finally got to run. This quickly led to huge amounts of memory being used, and the program would get totally bogged down. I finally found the problem when I noticed that I could make it less severe with -C0 and more severe with large values for -C. Large values let the first thread repeat the loop for a longer time before the second thread is scheduled and removes values from the queues.
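
In case it helps anyone who hits the same thing, here is a rough sketch of the pattern and of one possible way around it. This is not my actual code: the names (notifier, drainAny, the queue names) are made up for illustration, and it assumes a version of the stm package that provides flushTQueue. The idea behind the alternative is to let the consumer block on the queues directly and remove the items in the same transaction, so there is no separate notification queue that can fill up.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)

-- Reconstruction of the problematic thread #1 (hypothetical names).
-- It only *checks* the input queues and never removes anything, so once
-- any queue is non-empty this transaction succeeds on every iteration
-- and floods the wakeup queue until the consumer gets scheduled.
notifier :: [TQueue a] -> TQueue () -> IO ()
notifier inputs wakeups = forever $ atomically $ do
  empties <- mapM isEmptyTQueue inputs
  check (not (and empties))   -- retry while every input queue is empty
  writeTQueue wakeups ()

-- One possible alternative: block until at least one queue has items
-- and remove them in the same transaction. After it commits, the queues
-- are empty again, so the next call blocks instead of spinning.
drainAny :: [TQueue a] -> STM [a]
drainAny inputs = do
  items <- concat <$> mapM flushTQueue inputs
  check (not (null items))    -- retry if every queue was empty
  return items

main :: IO ()
main = do
  q1 <- newTQueueIO
  q2 <- newTQueueIO
  received <- newTVarIO (0 :: Int)
  -- Consumer: waits on both queues directly, no notifier thread needed.
  _ <- forkIO $ forever $ do
    items <- atomically (drainAny [q1, q2])
    atomically $ modifyTVar' received (+ length items)
  atomically $ mapM_ (writeTQueue q1) [1, 2, 3 :: Int]
  atomically $ mapM_ (writeTQueue q2) [4, 5]
  -- Wait until the consumer has seen all five items.
  atomically $ readTVar received >>= check . (== 5)
  putStrLn "all 5 items processed"
```

To see how the scheduler affects the notifier version, the context-switch interval can be varied from the command line, e.g. "./prog +RTS -C0 -RTS". Depending on the GHC version, the program may need to be linked with -rtsopts for that flag to be accepted.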
-Andi