Post by jseigh
Post by Chris M. Thomasson
The membar version? That's a store/load membar so it is expensive.
I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.
I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to ensure those stores
are complete before any access from inside the critical section.
So you may need a store/load memory barrier.
Wrt acquiring a lock, the only class of mutex logic that comes to mind
that requires an explicit #StoreLoad style membar is Peterson's
algorithm, and some others along those lines, so to speak. This is for
the store-and-load version. Now, an RMW on x86 basically implies a
#StoreLoad via the LOCK prefix; XCHG aside, since it has an implied
LOCK prefix. For instance, the original SMR algo requires a #StoreLoad
as-is on x86/x64: MFENCE or a LOCK prefix.
Fwiw, my experimental pure C++ proxy works fine with XADD, aka atomic
fetch-add. It needs explicit membars (though no #StoreLoad) on SPARC in
RMO mode. On x86, the LOCK prefix handles that wrt the RMWs themselves.
This is a lot different than using plain stores and loads. The original
SMR and Peterson's algos need that "store followed by a load to a
different location" ordering to hold true, aka #StoreLoad...
Now, I don't think that a data-dependent load can act like a #StoreLoad.
I think they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data dependencies. Now,
the DEC Alpha is a different story... ;^)
Post by jseigh
For cmpxchg, it has full seq_cst. For other RMW atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.
Joe Seigh