Support for lock-free/wait-free programming?

Hello all,
Does GHC expose any primitives for things like atomic compare-and-swap?
I can't seem to find anything in the docs. I'm wondering if it's
possible, for example, to implement things like the wait-free concurrent
queue from [1] or a lock-free wait-free hash table like the Azul people
are doing [2].
In network servers, we often have a large number of clients fighting
over shared resources -- even simple stuff like "which threads are
scheduled to timeout?" and "which threads are currently active, so I can
kill them if I need to?". Shared mutable collections seem to be a fact
of life here.
Under load, pure data structures behind MVars don't perform well because
of lock contention, and in my experience atomicModifyIORef is marginally
better but still suffers from contention issues (probably related to
thunk blackholing).
Recently I got a ~35% speedup on a benchmark by switching one of these
tables from a Map behind an IORef to a hashmap using a striped-locking
scheme (see [3] if you're interested.) I think there's a need for
high-concurrency data structures like this, and I don't see much on
Hackage that addresses it. As much as we like to say that we have a
handle on concurrency issues, from what I can see java.util.concurrent.*
has us soundly thrashed in this department, especially as it relates to
exposing atomic wait-free CPU primitives like the ones in
java.util.concurrent.atomic.* [4]. Speaking as a partisan, this cannot
be allowed to stand, can it?
Cheers,
G.
------------------------------------------------------------------------------
[1] Maged M. Michael and Michael L. Scott, "Simple, Fast, and Practical
Non-Blocking and Blocking Concurrent Queue Algorithms":
http://www.cs.rochester.edu/u/michael/PODC96.html
[2] http://www.stanford.edu/class/ee380/Abstracts/070221_LockFreeHash.pdf
[3] http://github.com/snapframework/snap-server/blob/master/src/Data/HashMap/Con...
[4] http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/Ato...
--
Gregory Collins

On 17/08/2010 06:09, Gregory Collins wrote:
Does GHC expose any primitives for things like atomic compare-and-swap? I can't seem to find anything in the docs. I'm wondering if it's possible, for example, to implement things like the wait-free concurrent queue from [1] or a lock-free wait-free hash table like the Azul people are doing [2].
We could provide a compare-and-swap on IORefs, indeed I've been planning to try that so we could move the implementation of atomicModifyIORef into user-space, so to speak. Anecdotally it's hard to beat the performance of a good pure data structure in a mutable container. For example, the mutable list experiments we did a while back [1] were all at least a factor of 10 slower than IORef [a]. [1] http://www.haskell.org/~simonmar/bib/concurrent-data-08_abstract.html
In network servers, we often have a large number of clients fighting over shared resources -- even simple stuff like "which threads are scheduled to timeout?" and "which threads are currently active, so I can kill them if I need to?". Shared mutable collections seem to be a fact of life here.
Under load, pure data structures behind MVars don't perform well because of lock contention, and in my experience atomicModifyIORef is marginally better but still suffers from contention issues (probably related to thunk blackholing).
This is the specific case that we addressed in the GHC runtime recently, and you should find that 6.14 will perform a lot better than 6.12 with pure data structures in mutable containers, especially with atomicModifyIORef. In the meantime you'll probably get better performance using STM. Whether you should perform the update strictly inside the transaction or not is a matter for experimentation, it probably depends on the data structure.
Recently I got a ~35% speedup on a benchmark by switching one of these tables from a Map behind an IORef to a hashmap using a striped-locking scheme (see [3] if you're interested.) I think there's a need for high-concurrency data structures like this, and I don't see much on Hackage that addresses it. As much as we like to say that we have a handle on concurrency issues, from what I can see java.util.concurrent.* has us soundly thrashed in this department, especially as it relates to exposing atomic wait-free CPU primitives like the ones in java.util.concurrent.atomic.* [4]. Speaking as a partisan, this cannot be allowed to stand, can it?
Absolutely. Although the Java crowd have to grapple with a complicated memory model which makes building these data structures virtually impossible. At least in Haskell you can whip up a concurrent version of any pure data structure trivially, and have a lot of confidence that you got it right. Of course you could do that in Java, but they don't tend to build pure data structures by default, whereas we have them by the bucketload. Cheers, Simon
Cheers,
G.
------------------------------------------------------------------------------
[1] Maged M. Michael and Michael L. Scott, "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms": http://www.cs.rochester.edu/u/michael/PODC96.html
[2] http://www.stanford.edu/class/ee380/Abstracts/070221_LockFreeHash.pdf
[3] http://github.com/snapframework/snap-server/blob/master/src/Data/HashMap/Con...
[4] http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/Ato...

Simon Marlow
On 17/08/2010 06:09, Gregory Collins wrote:
We could provide a compare-and-swap on IORefs, indeed I've been planning to try that so we could move the implementation of atomicModifyIORef into user-space, so to speak.
For hash tables it would be nice to have a CAS indexing primitive on MutableArray#/MutableByteArray#/etc, also.
Under load, pure data structures behind MVars don't perform well because of lock contention, and in my experience atomicModifyIORef is marginally better but still suffers from contention issues (probably related to thunk blackholing).
This is the specific case that we addressed in the GHC runtime recently, and you should find that 6.14 will perform a lot better than 6.12 with pure data structures in mutable containers, especially with atomicModifyIORef.
In the meantime you'll probably get better performance using STM. Whether you should perform the update strictly inside the transaction or not is a matter for experimentation, it probably depends on the data structure.
In the specific benchmark in question STM was slightly slower than MVars
(but that's pretty impressive!).
G
--
Gregory Collins

On Tue, 17 Aug 2010 01:09:41 -0400, Gregory Collins wrote:
Hello all,
Does GHC expose any primitives for things like atomic compare-and-swap? I can't seem to find anything in the docs.
Hello Gregory, I have recently published experimental and low-level FFI bindings to GCC's atomic primitives including CAS as 'bits-extras' on Hackage [1]. Instances for Int and Word types are included. This is likely lower-level than what you are after, but might still be handy for experimentation. Kind regards Gabriel [1]: http://hackage.haskell.org/packages/archive/bits-extras/0.1.0/doc/ html/Data-Bits-Atomic.html

Gabriel Wicke
On Tue, 17 Aug 2010 01:09:41 -0400, Gregory Collins wrote:
Hello all,
Does GHC expose any primitives for things like atomic compare-and-swap? I can't seem to find anything in the docs.
Hello Gregory,
I have recently published experimental and low-level FFI bindings to GCC's atomic primitives including CAS as 'bits-extras' on Hackage [1]. Instances for Int and Word types are included.
This is likely lower-level than what you are after, but might still be handy for experimentation.
On OSX libgcc_s is only supplied as a dynamic library, which GHC doesn't
seem to be able to link with. Any ideas would be appreciated.
G
--
Gregory Collins

Gregory Collins
Gabriel Wicke
writes: On Tue, 17 Aug 2010 01:09:41 -0400, Gregory Collins wrote:
Hello all,
Does GHC expose any primitives for things like atomic compare-and-swap? I can't seem to find anything in the docs.
Hello Gregory,
I have recently published experimental and low-level FFI bindings to GCC's atomic primitives including CAS as 'bits-extras' on Hackage [1]. Instances for Int and Word types are included.
This is likely lower-level than what you are after, but might still be handy for experimentation.
On OSX libgcc_s is only supplied as a dynamic library, which GHC doesn't seem to be able to link with. Any ideas would be appreciated.
It works with this patch:
------------------------------------------------------------------------------
Remove cc-options field from .cabal; the flags given don't work on OSX
diff --git a/bits-extras.cabal b/bits-extras.cabal
--- a/bits-extras.cabal
+++ b/bits-extras.cabal
@@ -42,8 +42,6 @@
--CC-Options: -O3 -fomit-frame-pointer -march=native -Wall
-- Try link-time optimization (inlining) with gcc 4.5:
-- CC-Options: -fomit-frame-pointer -march=native -Wall -flto
- CC-Options: -fomit-frame-pointer -march=native -Wall
- Extra-Libraries: gcc_s
Include-Dirs: cbits
Install-Includes: bitops-gcc.h atomic-bitops-gcc.h
Extensions: ForeignFunctionInterface
------------------------------------------------------------------------------
G
--
Gregory Collins

On Sat, 28 Aug 2010 21:22:57 -0400, Gregory Collins wrote:
On OSX libgcc_s is only supplied as a dynamic library, which GHC doesn't seem to be able to link with. Any ideas would be appreciated.
It works with this patch: - CC-Options: -fomit-frame-pointer -march=native -Wall -
Hello Gregory, thanks for your patch. I have just uploaded bits-extras-0.1.1 [1] with tweaked CC-Options to only include -Wall on non-linux systems (plus minor documentation tweaks). Could you check if this compiles on OSX? I would prefer to keep -Wall enabled if possible. Gabriel [1] http://hackage.haskell.org/package/bits-extras-0.1.1

Gabriel Wicke
Hello Gregory,
thanks for your patch.
I have just uploaded bits-extras-0.1.1 [1] with tweaked CC-Options to only include -Wall on non-linux systems (plus minor documentation tweaks).
Could you check if this compiles on OSX? I would prefer to keep -Wall enabled if possible.
It doesn't work because you missed the other half of my patch, removing
the reference to libgcc_s -- programs link just fine without it. With it
in there I get:
Resolving dependencies...
Downloading bits-extras-0.1.1...
Configuring bits-extras-0.1.1...
cabal: Missing dependency on a foreign library:
* Missing C library: gcc_s
G
--
Gregory Collins

Gregory Collins
It doesn't work because you missed the other half of my patch, removing the reference to libgcc_s -- programs link just fine without it. With it in there I get:
...sigh... The "programs link fine without it" part is a partial
lie. Anything that the C linker links (i.e. executables) works fine
without an explicit -lgcc_s, but ghci and compilations using template
haskell fail with an "unknown symbol `___bswapdi2'", or equivalent.
Looking for a workaround now.
G
--
Gregory Collins

On Sun, 29 Aug 2010 13:08:48 -0400, Gregory Collins wrote:
Gregory Collins
writes: ...sigh... The "programs link fine without it" part is a partial lie. Anything that the C linker links (i.e. executables) works fine without an explicit -lgcc_s, but ghci and compilations using template haskell fail with an "unknown symbol `___bswapdi2'", or equivalent. Looking for a workaround now.
After some more off-list discussion with Gregory (thanks for your help!) I split out the atomic operations to the bits-atomic package [1] which no longer depends on libgcc_s. GCC produces native code for atomic operations, so libgcc_s is not needed for these operations. Additionally, a test suite covering the most important operations is now included. I am still looking for a solution to drop the libgcc dependency for Data.Bits.Extras (which will remain in bits-extras) as well, but this is harder as the decision whether to use fall-back software implementations depends on the CPU capabilities. Cheers, Gabriel [1] http://hackage.haskell.org/package/bits-atomic-0.1.0

Gabriel Wicke
On Sun, 29 Aug 2010 13:08:48 -0400, Gregory Collins wrote:
After some more off-list discussion with Gregory (thanks for your help!) I split out the atomic operations to the bits-atomic package [1] which no longer depends on libgcc_s. GCC produces native code for atomic operations, so libgcc_s is not needed for these operations.
Wonderful, works perfectly, thanks!
G
--
Gregory Collins
participants (4)
-
Gabriel Wicke
-
Gregory Collins
-
Serguey Zefirov
-
Simon Marlow