I'll add making that change to my inline primops task. (It does give nice reference asm for that task. )
So there's no problem in casMutVar#?
There's probably no good reason not to use the gcc intrinsics. Either they didn't exist when I wrote the inline asm, or they had been added to gcc since I last read the docs.
Cheers,
Simon
On 01/02/2014 19:35, Ryan Newton wrote:
Ok, my bad, sorry all.
This is NOT a problem that will crop up in 7.8. Rather, it's just a
problem with the duplicated bits of GHC RTS functionality that were
stuck into the atomic-primops library. It was a C preprocessor problem
that was causing the inline asm we were discussing in this thread to not
actually be called.
Still, I'd like to be reminded of the rational for all this conditional
inline asm rather than using the C compiler intrinsics! Anyone?
Best,
-Ryan
On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
<carter.schonwald@gmail.com <mailto:carter.schonwald@gmail.com>> wrote:
I got the test suite running on my (2 core) machine mac book air,
with 7.8
i've run it several times, not seeing any failures
On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
<carter.schonwald@gmail.com <mailto:carter.schonwald@gmail.com>> wrote:
hrmmmm I have a crazy idea
"Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
into r/m64. Else, clear ZF
and load r/m64 into RAX." is what the docs say for the cmpxchng
instruction
so RAX is the old values, (EAX in the 32bit case). And it
looks like we dont' set that explicitly when calling the asm ..
CMPXCHG r/m64, r64
hrmmm
On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton@gmail.com
<mailto:rrnewton@gmail.com>> wrote:
Ok, here's another experiment, on this commit:
https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
Here, if I use GCC's __sync_val_compare_and_swap instead of
GHC's version of cas(), the problem also goes away. I think
these two implementations should behave identically, and
that they don't perhaps indicates that there is something
off about the inline asm, as Carter was suggesting:
*#if i386_HOST_ARCH || x86_64_HOST_ARCH*
* __asm__ __volatile__ (*
* "lock\ncmpxchg %3,%1"*
* :"=a"(o), "=m" (*(volatile unsigned int *)p) *
* :"0" (o), "r" (n));*
* return o;*
The x86 CAS instruction must put the "return value" in the
accumulator register, and indeed this constrains "o" to be
allocated to the accumulator register, while the new value
"n" can be in any register.
So if there's a problem, I don't know what it is. Except
I'm not sure what the ramifications of "o" being a function
parameter AND having an "=a" constraint on it are...
-Ryan
On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
<carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>> wrote:
Hey Ryan, i've made the leap to using 7.8 on my machine,
so i'll first have to get some pull requests in on
atomic-primops before I can test it locally :), expect
those patches later today!
looks like gcc's inline ASM logic is pretty correct,
after testing it a bit locally, pardon my speculative
jumping the gun.
On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
<rrnewton@gmail.com <mailto:rrnewton@gmail.com>> wrote:
Hi Carter & others,
Carter, yes, this is CAS on pointers and in my next
mail I'll try to come up with some hypotheses as to
why we may have (remaining) problems there.
But first, I have been assured that on x86 there is
no failure mode in which doing a comparison on the
value read by CAS should not correctly diagnose
success or failure (same as directly reading the
Zero Flag) [1].
And yet, there's this discrepancy, where the
modified casMutVar that I linked to does not have
the failure. As for reproducing the failure, either
of the two following tests will currently show problems:
* Two threads try to casIORef False->True, both
succeed
* 120 threads try to read, increment, CAS until
they succeed. The total is often not 120
because multiple threads think the successfully
incremented, say, 33->34.
Here's a specific recipe for the latter test on GHC
7.6.3 Mac or Linux:
*git clone
git@github.com:rrnewton/haskell-lockfree-queue.git
*
*cd haskell-lockfree-queue/AtomicPrimops/*
*git checkout 1a1e7e55f6706f9e5754
*
*cabal sandbox init
*
*cabal install -f-withTH -fforeign ./ ./testing
--enable-tests
*
*./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
-t n_threads
*
You may have to run the last line several times to
see the failure.
Best,
-Ryan
[1] I guess the __sync_bool_compare_and_swap
intrinsic which reads ZF is there just to avoid the
extra comparison.
[2] P.S. I'd like to try this on GHC head, but the
RHEL 6 machine I usually use to build it is
currently not validating (below error, commit
65d05d7334). After I debug this gmp problem I'll
confirm that the bug under discussion applies on the
7.8 branch.
./sync-all checkout ghc-7.8
sh validate
...
/usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
relocation R_X86_64_32 against `__gmpz_sub' can not
be used when making a shared object; recompile with
-fPIC
libraries/integer-gmp/gmp/objs/aors.o: could not
read symbols: Bad value
On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
<carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>> wrote:
Ryan, is your benchmark using CAS on pointers,
or immediate words? trying to get atomic primops
to build on my 7.8 build on my mac
On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
<carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>> wrote:
https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
is the ticket
when i'm more awake i'll experiment some more
On Sat, Feb 1, 2014 at 2:33 AM, Carter
Schonwald <carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>> wrote:
i have a ticket for tracking this,
though i'm thinking my initial attempt
at a patch generates the same object
code as it did before.
@ryan, what CPU variant are you testing
this on? is this on a NUMA machine or
something?
On Sat, Feb 1, 2014 at 1:58 AM, Carter
Schonwald <carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>> wrote:
woops, i mean cmpxchgq
On Sat, Feb 1, 2014 at 1:36 AM,
Carter Schonwald
<carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>>
wrote:
ok, i can confirm that on my
64bit mac, both clang and gcc
use cmpxchgl rather than cmpxchg
i'll whip up a strawman patch on
head that can be cherrypicked /
tested out by ryan et al
On Sat, Feb 1, 2014 at 1:12 AM,
Carter Schonwald
<carter.schonwald@gmail.com
<mailto:carter.schonwald@gmail.com>>
wrote:
Hey Ryan,
looking at this closely
Why isn't CAS using
CMPXCHG8B on 64bit
architectures? Could that
be the culprit?
Could the issue be that
we've not had a good stress
test that would create
values that are equal on the
32bit range, but differ on
the 64bit range, and you're
hitting that?
Could you try seeing if
doing that change fixes
things up?
(I may be completely wrong,
but just throwing this out
as a naive "obvious" guess)
On Sat, Feb 1, 2014 at 12:58
AM, Ryan Newton
<rrnewton@gmail.com
<mailto:rrnewton@gmail.com>>
wrote:
Then again... I'm having
trouble seeing how the
spec on page 3-149 of
the Intel manual would
allow the behavior I'm
seeing:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
Nevertheless, this is
exactly the behavior
we're seeing with the
current Haskell primops.
Two threads
simultaneously
performing the same
CAS(p,a,b) can both
think that they succeeded.
On Sat, Feb 1, 2014 at
12:33 AM, Ryan Newton
<rrnewton@gmail.com
<mailto:rrnewton@gmail.com>>
wrote:
I commented on the
commit here:
https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
The problem is that
our "cas" routine in
SMP.h is similar to
the C compiler
intrinsic
__sync_val_compare_and_swap,
in that it returns
the old value. But
it seems we cannot
use a comparison
against that old
value to determine
whether or not the
CAS succeeded. (I
believe the CAS may
fail due to
contention, but the
old value may happen
to look like our old
value.)
Unfortunately, this
didn't occur to me
until it started
causing bugs [1]
[2]. Fixing
casMutVar# fixes
these bugs.
However, the way
I'm currently fixing
CAS in the
"atomic-primops"
package is by using
__sync_bool_compare_and_swap:
https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
What is the best fix
for GHC itself?
Would it be ok for
GHC to include a C
compiler intrinsic
like
__sync_val_compare_and_swap?
Otherwise we need
another big ifdbef'd
function like "cas"
in SMP.h that has
the
architecture-specific inline
asm across all
architectures. I
can write the x86
one, but I'm not
eager to try the others.
Best,
-Ryan
[1]
https://github.com/iu-parfunc/lvars/issues/70
[2]
https://github.com/rrnewton/haskell-lockfree/issues/15
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
<mailto:ghc-devs@haskell.org>
http://www.haskell.org/mailman/listinfo/ghc-devs