Discussion:
"New features in C++26" By Daroc Alden
Lynn McGuire
2024-07-23 03:11:16 UTC
"New features in C++26" By Daroc Alden
https://lwn.net/Articles/979870/

"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we have
a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU). Even though C++26 is not yet a standard, many of
the proposed features are already available to experiment with in GCC or
Clang."

Lynn
jseigh
2024-07-29 15:15:46 UTC
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
   https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we have
a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU). Even though C++26 is not yet a standard, many of
the proposed features are already available to experiment with in GCC or
Clang."
Lynn
Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from? I think I know, but I
haven't found any independent confirmation of it. I know they are
coming from the folly library, but I got no answers from them.

There isn't a lot of attribution in open source, so you can't tell
where something came from or whether it was independently developed. I
know "split reference counting" is now folklore, according to a CppCon
talk.

Joe Seigh
Chris M. Thomasson
2024-08-05 20:41:11 UTC
Post by jseigh
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
    https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers and
user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn
Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.
Afaict, from you, Joe. Back on comp.programming.threads.
Post by jseigh
  I think I know but I
haven't found any independent confirmation of it.  I know these are
coming from the folly library but I got no answers from them.
There isn't a lot of attribution in open source so you can't tell where
something came from or if it was independently developed.  I know "split
reference counting" is now folklore according to a cppcon talk.
I know you have prior art for it.
Chris M. Thomasson
2024-08-05 22:56:32 UTC
Post by Chris M. Thomasson
Post by jseigh
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
    https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn
Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.
Afaict, from you, Joe. Back on comp.programming.threads.
Asymmetric memory barriers were a "feature" of using RCU to get rid
of the nasty #StoreLoad in the original SMR: aka your RCU+SMR... Now
there are passive and active types of asymmetric membars. A passive
one tries to detect when a quiescent period has occurred. An active
one artificially causes a quiescent period to occur... Make any sense
to you?
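
Something like this rough sketch is what I have in mind; it's only my
own illustration (made-up names, not folly's or anybody's actual code):

#include <atomic>

struct node;  // placeholder element type for this sketch

// one hazard slot per thread, for a single-hazard-pointer algorithm
thread_local std::atomic<node*> hp_slot{nullptr};

node* protect(std::atomic<node*>& shared)
{
    node* n;
    do {
        n = shared.load(std::memory_order_relaxed);
        hp_slot.store(n, std::memory_order_relaxed);
        // The original SMR needs a #StoreLoad right here, e.g.
        //   std::atomic_thread_fence(std::memory_order_seq_cst);
        // The asymmetric versions drop it from this hot path:
        //  - "active":  the reclaimer issues a process-wide barrier
        //    (membarrier() / FlushProcessWriteBuffers()) before it
        //    scans the hazard slots;
        //  - "passive": the reclaimer waits until every reader has
        //    passed a quiescent point (a context switch), RCU-style.
    } while (n != shared.load(std::memory_order_acquire));
    return n;
}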
Post by Chris M. Thomasson
Post by jseigh
  I think I know but I haven't found any independent confirmation of
it.  I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed.  I
know "split reference counting" is now folklore according to a cppcon
talk.
I know you have prior art for it.
Chris M. Thomasson
2024-08-06 02:38:32 UTC
Post by Chris M. Thomasson
Post by jseigh
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
    https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet a
standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn
Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.
Afaict, from you, Joe. Back on comp.programming.threads.
Post by jseigh
  I think I know but I haven't found any independent confirmation of
it.  I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed.  I
know "split reference counting" is now folklore according to a cppcon
talk.
I know you have prior art for it.
Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric
in a sense... ?
Chris M. Thomasson
2024-08-06 02:40:18 UTC
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by jseigh
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
    https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence;
now that it's been more than a year since the finalization of C++23,
we have a good idea of what features could be adopted for C++26 —
although proposals can still be submitted until January 2025. Of
particular interest is the addition of support for hazard pointers
and user-space read-copy-update (RCU). Even though C++26 is not yet
a standard, many of the proposed features are already available to
experiment with in GCC or Clang."
Lynn
Speaking of hazard pointers, does anyone know where the idea of using
asymmetric memory barriers in them came from.
Afaict, from you, Joe. Back on comp.programming.threads.
Post by jseigh
  I think I know but I haven't found any independent confirmation of
it.  I know these are coming from the folly library but I got no
answers from them.
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed.  I
know "split reference counting" is now folklore according to a cppcon
talk.
I know you have prior art for it.
Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric
in a sense... ?
RCU grace periods and quiescent periods can both be used in an
asymmetric realm: we can artificially initiate them, or detect them.
That is an implementation detail, right-ish?
jseigh
2024-08-06 18:06:16 UTC
Using RCU to get around the #StoreLoad membar in SMR loads is asymmetric in a sense... ?
RCU grace periods, quiescent periods can all be used in an asymmetric realm. We can artificially initiate them, or detect them. That is an implementation detail, right'ish?
I used the term RCU because RCU used context switches as quiesce
points, and I was tracking them since a context switch is effectively
a memory barrier.

Linux membarrier() uses context switches (effected by IPIs,
inter-processor interrupts) in its implementation. That being the
case, if you were using RSEQ (restartable sequences) and wanted to do
something like work stealing, i.e. moving something from one
processor's local storage to another processor's (the one doing the
stealing), you could call membarrier() after the move to make sure any
ongoing operations by the former processor were complete before you
access the appropriated storage. I don't know how performant it would
be, though.
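
Roughly, the membarrier() half of that would look something like this
(just a sketch; sys_membarrier, register_expedited and after_steal are
made-up names, and the rseq bookkeeping itself is left out):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

void register_expedited()
{
    // once per process, before the first expedited barrier
    sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);
}

void after_steal()
{
    // After moving the item out of the victim processor's per-CPU
    // storage: IPI every cpu currently running a thread of this
    // process so it executes a memory barrier, i.e. any access the
    // victim had in flight is ordered before we touch the stolen
    // storage. (There is also MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
    // which additionally restarts ongoing rseq critical sections.)
    sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0);
}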

Joe Seigh
Chris M. Thomasson
2024-08-06 20:13:56 UTC
Post by jseigh
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Using RCU to get around the #StoreLoad membar in SMR loads is
asymmetric in a sense... ?
RCU grace periods, quiescent periods can all be used in an asymmetric
realm. We can artificially initiate them, or detect them. That is an
implementation detail, right'ish?
I used term RCU because RCU used context switches as quiesce points and
I was tracking them since context switches are effectively a memory
barrier.
This would be, in my terms, a "passive" way of detecting quiescent periods.
Post by jseigh
Linux membarrier() uses context switches (effected by IPI,
interprocessor interrupts) in its implementation.
Yup. This is how FlushProcessWriteBuffers() on Windows works as well. I
call this an "active" method because we are artificially executing a
remote memory barrier, so to speak.
Post by jseigh
That being the case,
if you were using RSEQ (restartable sequences) and wanted to do
something like work stealing, i.e. moving something
from one processor's local storage to another processor's (the one doing
the stealing) storage, you could use a call to
membarrier() after the move to make sure any ongoing operations by the
former processor were complete before you access
the appropriated storage.  I don't know how performant it would be, though.
If work-stealing is rare, it should not be that bad. Also, humm... Is
there a way to do the IPI to a "list" of processors? Aka, processors in
an affinity mask? So, the IPI does not have to be used for all of them?

Well, this comment from the MS side:

Remarks
The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It guarantees
the visibility of write operations performed on one processor to the
other processors.

Seems to do it...
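
As a tiny sketch of how I'd use that call on the reclaim side (my
naming, not any real library's code):

#include <windows.h>

// reclaim side of a hazard-pointer scheme, "active" flavor:
void reclaim_scan()
{
    // IPI all processors in the process affinity mask so that the
    // readers' plain hazard-slot stores are globally visible...
    FlushProcessWriteBuffers();

    // ... then scan every thread's hazard slots and free whatever
    // retired node is not protected by any of them.
}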


Dmitry and I talked a lot about work stealing queues back on
comp.programming.threads. I had this one idea called work-requesting. He
wrote about it here:

https://www.1024cores.net/home/parallel-computing/concurrent-skip-list/work-stealing-vs-work-requesting

Need to try to find the old thread on c.p.t where I told him about it.

I remember doing some experiment where each thread had a hash table
of single-producer/multi-consumer LIFOs. A thread could push work into
one of its local LIFOs. Another thread that had no local work to do
could flush the LIFO of another thread that had some work. It's
basically a lock/wait-free stack using a single atomic swap to flush
all of the work. Iirc, it worked pretty well!
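
From memory, the shape of it was something like this (simplified
sketch, names made up, and the hash-table-of-LIFOs part omitted):

#include <atomic>

struct work
{
    work* next = nullptr;
    // ... payload ...
};

struct spmc_lifo
{
    std::atomic<work*> head{nullptr};

    // owner thread: push one work item
    void push(work* w)
    {
        work* h = head.load(std::memory_order_relaxed);
        do {
            w->next = h;
        } while (!head.compare_exchange_weak(
                     h, w,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }

    // an idle thread: grab the whole list with one atomic swap
    work* flush_all()
    {
        return head.exchange(nullptr, std::memory_order_acquire);
    }
};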
jseigh
2024-08-06 18:29:13 UTC
There isn't a lot of attribution in open source so you can't tell where something came from or if it was independently developed.  I know "split reference counting" is now folklore according to a cppcon talk.
I tried to find that "folklore" comment. I thought it was in the
CppCon 2023 talk 'Lock-free Atomic Shared Pointers Without a Split
Reference Count? It Can Be Done!' by Daniel Anderson, but I couldn't
find it again (it's an hour-long talk). I did notice that the split
reference counting technique described was slightly different from
what I called differential reference counting. It looks to me like
split reference counting as described has an ABA problem.

Joe Seigh
jseigh
2024-08-06 22:58:14 UTC
There isn't a lot of attribution in open source so you can't tell where something came from or if it was independently developed.  I know "split reference counting" is now folklore according to a cppcon talk.
I tried to find that "folklore" comment.  I thought it was in the cppcon 2023 talk
'Lock-free Atomic Shared Pointers Without a Split Reference Count? It Can Be Done! by Daniel Anderson'
but I couldn't find it again (it's an hour long talk).  I did notice the split reference counting
technique described was slightly different than what I called differential reference counting.
It looks to me like split reference counting as described has an ABA problem.
Or maybe not. I took a quick look at the folly code. You really can't
easily read C++ code in that style, so I basically guessed a bit. It
would be better described as cached local references rather than split
reference counting. You increment and decrement the local ref count in
the atomic shared pointer and only update the global ref count if the
atomic shared pointer is changed. When the local ref is dropped and
the atomic shared pointer is different, or the local ref count is
zero, the global ref count is decremented instead.
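
In toy form I read it roughly like this (my guess at the general idea,
not folly's actual code; it needs a double-width CAS to stay
lock-free, and the reader's release path, where the subtle cases live,
is omitted):

#include <atomic>
#include <cstdint>

struct object
{
    std::atomic<std::intptr_t> global{1};  // +1 for the owning pointer
    // ... payload, and a destroy-when-zero path not shown ...
};

struct counted_ptr
{
    object*       ptr   = nullptr;
    std::intptr_t local = 0;   // acquisitions since ptr was installed
};

struct atomic_shared
{
    std::atomic<counted_ptr> ctl;   // lock-free only with DWCAS

    object* acquire()
    {
        counted_ptr cur = ctl.load(std::memory_order_relaxed);
        counted_ptr next;
        do {
            next = cur;
            ++next.local;   // count the reference locally,
                            // without touching object::global
        } while (!ctl.compare_exchange_weak(
                     cur, next,
                     std::memory_order_acquire,
                     std::memory_order_relaxed));
        return next.ptr;
    }

    void store(object* desired)   // caller transfers one reference in
    {
        counted_ptr old =
            ctl.exchange(counted_ptr{desired, 0},
                         std::memory_order_acq_rel);
        if (old.ptr) {
            // fold the cached local acquisitions into the global
            // count; the -1 drops the reference the pointer held
            old.ptr->global.fetch_add(old.local - 1,
                                      std::memory_order_acq_rel);
        }
    }
};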

It's all about which cache lines you are hitting and how often. If
you have a lot of atomic shared pointers to an object, then maybe
"split" reference counting is better since you are splitting across
more cache lines. If you have just one atomic shared pointer then
differential reference counting might be better. At any rate,
reference counting performs rather poorly compared to other methods of
deferred reclamation.

Joe Seigh
Chris M. Thomasson
2024-08-06 23:41:00 UTC
Post by jseigh
There isn't a lot of attribution in open source so you can't tell
where something came from or if it was independently developed.  I
know "split reference counting" is now folklore according to a cppcon
talk.
I tried to find that "folklore" comment.  I thought it was in the cppcon 2023 talk
'Lock-free Atomic Shared Pointers Without a Split Reference Count? It
Can Be Done! by Daniel Anderson'
but I couldn't find it again (it's an hour long talk).  I did notice
the split reference counting
technique described was slightly different than what I called
differential reference counting.
It looks to me like split reference counting as described has an ABA problem.
Or maybe not.  I took a quick look at the folly code.  You really can't
easily read c++ code in that style
so I basically guessed a bit.  It would be better described as cached
local references, rather than split
reference counting.  You increment and decrement the local ref count in
the atomic shared pointer and just
update the global ref count if the atomic shared pointer is changed.
When the local ref is dropped and
the atomic shared ref is different or the local ref count is zero, the
global ref count is decremented instead.
I should have some more time tonight to take a look. Wrt local
references, are they per-thread? I remember an old experiment that
used all local references, iirc even for strong reference counting. It
used hashing, so collisions were an interesting issue: two unrelated
objects could end up using the same local counters when a thread
hashed them into its local reference table.
It's all about which cache lines you are hitting and how often.  If you
have a lot of atomic shared pointers to
an object, then maybe "split" reference counting is better since you are
splitting across more cache lines.  If
you have just one atomic shared pointer then differential reference
counting might be better.  At any rate,
reference counting performs rather poorly compared to other methods of
deferred reclamation.
Agreed.

Bonita Montero
2024-08-05 15:42:54 UTC
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
   https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we
have a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU). Even though C++26 is not yet a standard, many
of the proposed features are already available to experiment with in
GCC or Clang."
Lynn
There will be updates from CppCon for sure.

But for C++23 this is a very good introduction to all new features:


Chris M. Thomasson
2024-08-05 20:39:29 UTC
Post by Lynn McGuire
"New features in C++26" By Daroc Alden
   https://lwn.net/Articles/979870/
"ISO releases new C++ language standards on a three-year cadence; now
that it's been more than a year since the finalization of C++23, we have
a good idea of what features could be adopted for C++26 — although
proposals can still be submitted until January 2025. Of particular
interest is the addition of support for hazard pointers and user-space
read-copy-update (RCU).
That would be really interesting to me! Hazard pointers (aka SMR) and
RCU in the standard? Wow, that would be neat. Just make sure that the
SMR has a way to remove the nasty #StoreLoad membar it has to use in
its original form, so to speak. Fwiw, Joe Seigh created SMR+RCU to get
rid of it... :^)
Post by Lynn McGuire
Even though C++26 is not yet a standard, many of
the proposed features are already available to experiment with in GCC or
Clang."
Lynn
Bonita Montero
2024-08-06 11:27:10 UTC
Post by Chris M. Thomasson
That would be really interesting to me! hazard pointers aka (SMR) and
RCU in the standard? Wow. That would be neat. Just make sure that the
SMR has a possibility to remove the nasty #StoreLoad membar it has to
use in its original form, so to speak. Fwiw, Joe Seigh created SMR+RCU
to get rid of it... :^)
I don't like hazard pointers since you have to know the maximum
number of participating threads in advance to specify the number of
hazard pointers in the list. I'll stick with an atomic<shared_ptr<>>,
whose lower efficiency doesn't really matter.
Chris M. Thomasson
2024-08-06 19:52:49 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
That would be really interesting to me! hazard pointers aka (SMR) and
RCU in the standard? Wow. That would be neat. Just make sure that the
SMR has a possibility to remove the nasty #StoreLoad membar it has to
use in its original form, so to speak. Fwiw, Joe Seigh created SMR+RCU
to get rid of it... :^)
I don't like hazard pointers since you have to know the maximum number
of participating threads in advance to specify the number of hazard
pointers in the list. I'll stick with an atomic<shared_ptr<>> whose
lesser efficiency actually doesn't count.
No. You have to know the maximum number of hazard pointers an
algorithm has to use. Some use only one hazard pointer, some use more.
It's not about the number of threads at all.
Bonita Montero
2024-08-06 21:00:31 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
That would be really interesting to me! hazard pointers aka (SMR) and
RCU in the standard? Wow. That would be neat. Just make sure that the
SMR has a possibility to remove the nasty #StoreLoad membar it has to
use in its original form, so to speak. Fwiw, Joe Seigh created
SMR+RCU to get rid of it... :^)
I don't like hazard pointers since you have to know the maximum number
of participating threads in advance to specify the number of hazard
pointers in the list. I'll stick with an atomic<shared_ptr<>> whose
lesser efficiency actually doesn't count.
No. You have to know the maximum number of hazard pointers an algorithm
has to use. Some use only one hazard pointer, some use more. It's not
about the number of threads at all.
Each thread occupies one hazard pointer, and more threads than
pre-allocated hazard pointers aren't possible.
Chris M. Thomasson
2024-08-06 23:34:01 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
That would be really interesting to me! hazard pointers aka (SMR)
and RCU in the standard? Wow. That would be neat. Just make sure
that the SMR has a possibility to remove the nasty #StoreLoad membar
it has to use in its original form, so to speak. Fwiw, Joe Seigh
created SMR+RCU to get rid of it... :^)
I don't like hazard pointers since you have to know the maximum number
of participating threads in advance to specify the number of hazard
pointers in the list. I'll stick with an atomic<shared_ptr<>> whose
lesser efficiency actually doesn't count.
No. You have to know the maximum number of hazard pointers an
algorithm has to use. Some use only one hazard pointer, some use more.
It's not about the number of threads at all.
Each thread occupies one hazard pointer, and more threads than
pre-allocated hazard pointers aren't possible.
Huh? That is a very odd thing to say. Humm... Let's say I wanted to use
a lock-free algorithm that uses three hazard pointers. This means that
each thread needs to have at least three hazard pointers. The number of
threads does not matter.
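
A toy illustration of the point (made-up names; a real implementation
would pool and reuse the slots instead of leaking them):

#include <atomic>
#include <cstddef>

// K is fixed by the *algorithm* (say, protecting prev/cur/next during
// a list traversal), not by how many threads will ever run it.
constexpr std::size_t K = 3;

struct hazard_slots
{
    std::atomic<void*> slot[K] = {};
    hazard_slots*      next    = nullptr;  // global list, for the scanner
};

std::atomic<hazard_slots*> g_slots{nullptr};

// Each thread that shows up links in its own K slots on first use, so
// the total number of slots grows with the threads; nothing has to be
// sized for a maximum thread count up front.
hazard_slots* my_slots()
{
    thread_local hazard_slots* mine = [] {
        hazard_slots* s = new hazard_slots{};   // never reclaimed here
        hazard_slots* h = g_slots.load(std::memory_order_relaxed);
        do {
            s->next = h;
        } while (!g_slots.compare_exchange_weak(
                     h, s,
                     std::memory_order_release,
                     std::memory_order_relaxed));
        return s;
    }();
    return mine;
}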