"New features in C++26" By Daroc Alden

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we have
a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU). Even though C++26 is not yet a standard, many of
the proposed features are already available to experiment with in GCC or
Clang."
Lynn

Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from. I think I know but I
haven't found any independent confirmation of it. I know these are
coming from the folly library but I got no answers from them.

There isn't a lot of attribution in open source so you can't tell where
something came from or if it was independently developed. I know "split
reference counting" is now folklore according to a cppcon talk.

Joe Seigh

Chris M. Thomasson

2024-08-05 20:41:11 UTC

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers and
user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn

Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.

Afaict, from you, Joe. Back on comp.programming.threads.

Post by jseigh
I think I know but I
haven't found any independent confirmation of it. I know these are
coming from the folly library but I got no answers from them.
There isn't a lot of attribution in open source so you can't tell where
something came from or if it was independently developed. I know "split
reference counting" is now folklore according to a cppcon talk.

I know you have prior art for it.

Chris M. Thomasson

2024-08-05 22:56:32 UTC

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn

Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.

Afaict, from you, Joe. Back on comp.programming.threads.

asymmetric memory barriers was a "feature" of using RCU for getting rid
of the nasty #StoreLoad in the original SMR: aka your RCU+SMR... Now
there are passive and active types of asymmetric membars. A Passive one
is trying to detect when a quiescent period has occured. An active one
would be to artificially cause a quiescent period to occur... Make any
sense to you?

Post by jseigh
I think I know but I haven't found any independent confirmation of
it. I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed. I
know "split reference counting" is now folklore according to a cppcon
talk.

I know you have prior art for it.

Chris M. Thomasson

2024-08-06 02:38:32 UTC

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn

Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.

Afaict, from you, Joe. Back on comp.programming.threads.

Post by jseigh
I think I know but I haven't found any independent confirmation of
it. I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed. I
know "split reference counting" is now folklore according to a cppcon
talk.

I know you have prior art for it.

Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric
in a sense... ?

Chris M. Thomasson

2024-08-06 02:40:18 UTC

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence;
now that it's been more than a year since the finalization of C++23,
we have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet
a standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn

Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.

Afaict, from you, Joe. Back on comp.programming.threads.

Post by jseigh
I think I know but I haven't found any independent confirmation of
it. I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed. I
know "split reference counting" is now folklore according to a cppcon
talk.

I know you have prior art for it.

Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric
in a sense... ?

RCU grace periods, quiescent periods can all be used in an asymmetric
realm. We can artificially initiate them, or detect them. That is an
implementation detail, right'ish?

jseigh

2024-08-06 18:06:16 UTC

Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric in a sense... ?

RCU grace periods, quiescent periods can all be used in an asymmetric realm. We can artificially initiate them, or detect them. That is an implementation detail, right'ish?

I used term RCU because RCU used context switches as quiesce points and I was tracking them since context switches are effectively a memory barrier.

Linux membarrier() uses context switches (effected by IPI, interprocessor interrupts) in its implementation. That being the case,
if you were using RSEQ (restartable sequences) and wanted to do something like work stealing, i.e. moving something
from one processor's local storage to another processor's (the one doing the stealing) storage, you could use a call to
membarrier() after the move to make sure any ongoing operations by the former processor were complete before you access
the appropriated storage. I don't know how performant it would be, though.

Joe Seigh

Chris M. Thomasson

2024-08-06 20:13:56 UTC

Post by Chris M. Thomasson
Using RCU to get around the #StoreLoad membar in SMR loads is
asymmetric in a sense... ?

RCU grace periods, quiescent periods can all be used in an asymmetric
realm. We can artificially initiate them, or detect them. That is an
implementation detail, right'ish?

I used term RCU because RCU used context switches as quiesce points and
I was tracking them since context switches are effectively a memory
barrier.

This would be, in my terms, a "passive" way of detecting quiescent periods.

Post by jseigh
Linux membarrier() uses context switches (effected by IPI,
interprocessor interrupts) in its implementation.

Yup. This is how FlushProcessWriteBuffers() on Windows works as well. I
call this an "active" method because we are artificially executing a
remote memory barrier, so to speak.

Post by jseigh
That being the case,
if you were using RSEQ (restartable sequences) and wanted to do
something like work stealing, i.e. moving something
from one processor's local storage to another processor's (the one doing
the stealing) storage, you could use a call to
membarrier() after the move to make sure any ongoing operations by the
former processor were complete before you access
the appropriated storage. I don't know how performant it would be, though.

If work-stealing is rare, it should not be that bad. Also, humm... Is
there a way to do the IPI to a "list" of processors? Aka, processors in
an affinity mask? So, the IPI does not have to be used for all of them?

Well, this comment from the MS side:

Remarks
The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It guarantees
the visibility of write operations performed on one processor to the
other processors.

Seems to do it...

Dmitry and I talked a lot about work stealing queues back on
comp.programming.threads. I had this one idea called work-requesting. He
wrote about it here:

https://www.1024cores.net/home/parallel-computing/concurrent-skip-list/work-stealing-vs-work-requesting

Need to try to find the old thread on c.p.t where I told him about it.

I remember doing some experiment where each thread had a hash table of
single producer multi consumer lifo's. A thread could push work into one
of its local lifo's. Another thread that had no local work to do could
flush the lifo of another thread that had some work. It's basically a
lock/wait-free stack using a single atomic swap to flush all of the
work. Iirc, it worked pretty good!

jseigh

2024-08-06 18:29:13 UTC

There isn't a lot of attribution in open source so you can't tell where something came from or if it was independently developed. I know "split reference counting" is now folklore according to a cppcon talk.

I tried to find that "folklore" comment. I thought it was in the cppcon 2023 talk
'Lock-free Atomic Shared Pointers Without a Split Reference Count? It Can Be Done! by Daniel Anderson'
but I couldn't find it again (it's an hour long talk). I did notice the split reference counting
technique described was slightly different than what I called differential reference counting.
It looks to me like split reference counting as described has an ABA problem.

Joe Seigh

jseigh

2024-08-06 22:58:14 UTC

Or maybe not. I took a quick look at the folly code. You really can't easily read c++ code in that style
so I basically guessed a bit. It would be better described as cached local references, rather than split
reference counting. You increment and decrement the local ref count in the atomic shared pointer and just
update the global ref count if the atomic shared pointer is changed. When the local ref is dropped and
the atomic shared ref is different or the local ref count is zero, the global ref count is decremented instead.

It's all about which cache lines you are hitting and how often. If you have a lot of atomic shared pointers to
an object, then maybe "split" reference counting is better since you are splitting across more cache lines. If
you have just one atomic shared pointer then differential reference counting might be better. At any rate,
reference counting performs rather poorly compared to other methods of deferred reclamation.

Joe Seigh

Chris M. Thomasson

2024-08-06 23:41:00 UTC

Post by jseigh
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed. I
know "split reference counting" is now folklore according to a cppcon
talk.

I tried to find that "folklore" comment. I thought it was in the cppcon 2023 talk
'Lock-free Atomic Shared Pointers Without a Split Reference Count? It
Can Be Done! by Daniel Anderson'
but I couldn't find it again (it's an hour long talk). I did notice
the split reference counting
technique described was slightly different than what I called
differential reference counting.
It looks to me like split reference counting as described has an ABA problem.

I should have some more time tonight to take a look. Wrt local
references, are they per-thread? I remember an old experiment that used
all local references. Iirc, even for strong reference counting. It used
hashing so collision was an interesting issue. This is when two
unrelated objects could use the same local counters when a thread hashed
them into its local reference table.

It's all about which cache lines you are hitting and how often. If you
have a lot of atomic shared pointers to
an object, then maybe "split" reference counting is better since you are
splitting across more cache lines. If
you have just one atomic shared pointer then differential reference
counting might be better. At any rate,
reference counting performs rather poorly compared to other methods of
deferred reclamation.

Agreed.

Bonita Montero

2024-08-05 15:42:54 UTC

Post by Lynn McGuire
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU). Even though C++26 is not yet a standard, many
of the proposed features are already available to experiment with in
GCC or Clang."
Lynn

There will be updates from CppCon for sure.

But for C++23 this is a very good introduction to all new features:

Chris M. Thomasson

2024-08-05 20:39:29 UTC

That would be really interesting to me! hazard pointers aka (SMR) and
RCU in the standard? Wow. That would be neat. Just make sure that the
SMR has a possibility to remove the nasty #StoreLoad membar is has to
use in its original form, so to speak. Fwiw, Joe Seigh created SMR+RCU
to get rid of it... :^)

Post by Lynn McGuire
Even though C++26 is not yet a standard, many of
the proposed features are already available to experiment with in GCC or
Clang."
Lynn

Bonita Montero

2024-08-06 11:27:10 UTC

Post by Chris M. Thomasson
That would be really interesting to me! hazard pointers aka (SMR) and
RCU in the standard? Wow. That would be neat. Just make sure that the
SMR has a possibility to remove the nasty #StoreLoad membar is has to
use in its original form, so to speak. Fwiw, Joe Seigh created SMR+RCU
to get rid of it... :^)

I don't like hazard pointers since you've to know the maximum number
of participating threads in advance to specify the number of hazard
pointers in the list. I'll stick with an atomic<shared_ptr<>> whose
lesser efficiency actually doesn't count.

Chris M. Thomasson

2024-08-06 19:52:49 UTC

No. You have to know the maximum number of hazard pointers an algorithm
has to use. Some use only one hazard pointer, some use more. It's not
about the number threads at all.

Bonita Montero

2024-08-06 21:00:31 UTC

No. You have to know the maximum number of hazard pointers an algorithm
has to use. Some use only one hazard pointer, some use more. It's not
about the number threads at all.

Each thread occupies one hazard pointer and more threads than pre
-allocated hazard pointers aren't possible.

Chris M. Thomasson

2024-08-06 23:34:01 UTC