Post by jseigh
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Using RCU to get around the #StoreLoad membar in SMR loads is
asymmetric in a sense... ?
RCU grace periods and quiescent periods can all be used in an asymmetric
realm. We can artificially initiate them, or detect them. That is an
implementation detail, right'ish?
I used the term RCU because RCU used context switches as quiescent points
and I was tracking them, since context switches are effectively a memory
barrier.
This would be, in my terms, a "passive" way of detecting quiescent periods.
Post by jseigh
Linux membarrier() uses context switches (effected by IPI,
interprocessor interrupts) in its implementation.
Yup. This is how FlushProcessWriteBuffers() on Windows works as well. I
call this an "active" method because we are artificially executing a
remote memory barrier, so to speak.
Post by jseigh
That being the case, if you were using RSEQ (restartable sequences) and
wanted to do something like work stealing, i.e. moving something from one
processor's local storage to another processor's (the one doing the
stealing) storage, you could use a call to membarrier() after the move to
make sure any ongoing operations by the former processor were complete
before you access the appropriated storage. I don't know how performant it
would be, though.
If work-stealing is rare, it should not be that bad. Also, humm... Is
there a way to do the IPI to a "list" of processors? Aka, processors in
an affinity mask? So, the IPI does not have to be used for all of them?
Well, this comment from the MS side:
Remarks
The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It guarantees
the visibility of write operations performed on one processor to the
other processors.
Seems to do it...
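
For comparison, here is a minimal sketch of the "active" remote barrier on the Linux side, assuming a kernel with MEMBARRIER_CMD_PRIVATE_EXPEDITED (4.14 or later, as far as I know). That command targets only the running threads of the calling process rather than every CPU in the system, which is roughly the "list of processors" idea. The wrapper and function names (membarrier_call, register_private_expedited, remote_barrier, after_steal_example) are made up for illustration; only the syscall and the command constants come from the kernel headers.

```cpp
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper around the raw syscall (hypothetical helper name).
// Assumption: no libc wrapper is available, so we go through syscall().
static int membarrier_call(int cmd, unsigned int flags) {
    return static_cast<int>(syscall(__NR_membarrier, cmd, flags, 0));
}

// One-time registration. The kernel rejects PRIVATE_EXPEDITED from a
// process that has not registered first.
bool register_private_expedited() {
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// The "active" barrier: when this returns true, every running thread of
// this process has executed a full memory barrier.
bool remote_barrier() {
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}

// Hypothetical call site: the thief has just moved an item out of another
// processor's local storage and wants the former owner's in-flight
// operations to be complete before touching the appropriated storage.
void after_steal_example() {
    static const bool registered = register_private_expedited();
    if (registered) {
        const bool ok = remote_barrier();
        assert(ok);
        (void)ok;  // silence unused warning when NDEBUG is defined
    }
    // If registration failed (old kernel, sandbox), a real implementation
    // would need a fallback, e.g. ordinary fences on both sides.
}
```

The registration step is done once per process; after that each remote_barrier() call costs one syscall plus the IPIs, so it only makes sense if stealing is rare.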
Dmitry and I talked a lot about work stealing queues back on
comp.programming.threads. I had this one idea called work-requesting. He
wrote about it here:
https://www.1024cores.net/home/parallel-computing/concurrent-skip-list/work-stealing-vs-work-requesting
Need to try to find the old thread on c.p.t where I told him about it.
I remember doing some experiment where each thread had a hash table of
single-producer/multi-consumer LIFOs. A thread could push work into one
of its local LIFOs. Another thread that had no local work to do could
flush the LIFO of another thread that had some work. It's basically a
lock/wait-free stack using a single atomic swap to flush all of the
work. Iirc, it worked pretty well!
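
A minimal sketch of that kind of flushable LIFO, assuming C++11 atomics. This is not the original experiment's code; the names WorkNode and FlushStack and the int payload are made up for illustration. The owner pushes with a CAS loop, and any other thread grabs the entire list with one atomic exchange.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical work item; a real one would carry a task, not an int.
struct WorkNode {
    int payload;
    WorkNode* next;
};

class FlushStack {
    std::atomic<WorkNode*> head_{nullptr};

public:
    // Called by the owning (producer) thread. A CAS loop is still needed
    // because a flusher may swap the head to nullptr between our load and
    // our update. Since consumers only ever take the whole list, there is
    // no pop of a single node and so no ABA hazard on this CAS.
    void push(WorkNode* n) {
        assert(n != nullptr);
        WorkNode* old = head_.load(std::memory_order_relaxed);
        do {
            n->next = old;
        } while (!head_.compare_exchange_weak(old, n,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Called by any thread that has run out of local work. One atomic
    // swap detaches everything; the caller then owns the returned list
    // (newest item first) and can walk it without further atomics.
    WorkNode* flush() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }
};
```

The flush side is a single exchange, so it never loops; the push side can retry only if a flush lands in the middle of it. The per-thread hash table of these stacks is left out here.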