Why volatile may make sense for parallel code today.

Post by Bonita Montero
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
        {
            while( r )
                SleepEx( INFINITE, TRUE );
        } );
    for( size_t r = ROUNDS; r--; )
        QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
            thr.native_handle(), (ULONG_PTR)&r );
}

std::atomic<size_t> r
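A portable sketch of the countdown with `r` declared as `std::atomic<size_t>`, as suggested. Standard C++ has no APCs, so this assumes the decrements come from a different thread, which is the case where a plain `volatile` counter would be a data race. Only `ROUNDS` comes from the original code; `countdown()` and the `waiter` thread are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

constexpr std::size_t ROUNDS = 1'000'000;

// Hypothetical portable analogue: a waiter thread polls an atomic counter
// that the calling thread decrements. Returns the value the waiter saw
// when it left its loop.
std::size_t countdown(std::size_t rounds = ROUNDS)
{
    std::atomic<std::size_t> r{rounds};
    std::size_t seen = rounds;
    std::thread waiter([&] {
        // Every iteration calls load(), i.e. performs an atomic read of r.
        while (r.load() != 0)
            std::this_thread::yield();
        seen = r.load();
    });
    // Distinct loop variable, so nothing shadows r.
    for (std::size_t i = rounds; i--; )
        r.fetch_sub(1);
    waiter.join();   // join() synchronizes, so reading 'seen' here is safe
    return seen;
}
```

Calling `countdown()` returns 0 once all decrements have been observed.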

red floyd

2023-11-23 00:17:40 UTC

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

Chris M. Thomasson

2023-11-23 05:05:13 UTC

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

Chris M. Thomasson

2023-11-23 05:06:25 UTC

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

Afaict, std::atomic should imply volatile? Right? If not, please correct me!

Marcel Mueller

2023-12-01 07:50:04 UTC

Post by Chris M. Thomasson
Afaict, std::atomic should imply volatile? Right? If not, please correct me!

In practice, yes. But is this required by the standard? I could not find
any hint. Strictly speaking, then, volatile is still required.

In fact, memory ordering does not guarantee any particular time at which a
change becomes visible to another thread, so there is always some delay.
But could that delay be infinite? Could the compiler then cache anything,
as long as the code generates no other memory access that the memory
barrier orders against? This applies to reads and writes alike.
But I think it is almost impossible to write any reasonable code that
causes no other memory access forcing the atomic value to be read or
written.

Marcel

David Brown

2023-11-23 09:02:58 UTC

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

You will probably find that compilers in practice will re-read "r" each
round of the loop, regardless of the memory order. I am not convinced
this would be required for "relaxed", but compilers generally do not
optimise atomics as much as they are allowed to. They are, as far as I
have seen in my far from comprehensive testing, treated as though
"atomic" implied "volatile".

But as far as I can tell from the C and C++ standards, "atomic" does not
imply "volatile". There are situations where atomics can be "optimised"
- re-ordered with respect to other code, or simplified - while volatile
atomics cannot. I can see no reason why adjacent non-volatile relaxed
atomic reads of the same object cannot be combined, even if separated by
other code (with no volatile or atomic accesses). The same goes for
writes. If you have:

std::atomic<int> ax = 100;

...

ax = 1;
ax += 2;
ax = ax * ax;

then you are guaranteed that any other thread reading "ax" will see
either the old value (100, if it was not changed), or the final value of
9. It /might/ also see values of 1 or 3 along the way, but there is no
requirement for the code to produce these intermediate values or for
them to be visible to other threads.
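David's example, made runnable with a second thread watching `ax`. This is only a sketch: the observer thread, the `done` flag and `observe_updates()` are illustrative additions. It checks just what the post says is guaranteed: the final value is 9, and anything the observer reads is one of 100, 1, 3 or 9. Whether 1 or 3 are ever caught depends on the compiler and on timing, so nothing here requires them.

```cpp
#include <atomic>
#include <cassert>
#include <set>
#include <thread>

// The writer performs the three updates from the example; an observer
// records every distinct value of ax it happens to read meanwhile.
std::set<int> observe_updates(int& final_value)
{
    std::atomic<int> ax{100};
    std::atomic<bool> done{false};
    std::set<int> seen;

    std::thread observer([&] {
        while (!done.load(std::memory_order_acquire))
            seen.insert(ax.load(std::memory_order_relaxed));
        // done was seen as true, so the store of 9 is visible now.
        seen.insert(ax.load(std::memory_order_relaxed));
    });

    ax = 1;
    ax += 2;          // atomic fetch_add
    ax = ax * ax;     // two loads, then a store: fine, only this thread writes ax
    done.store(true, std::memory_order_release);

    observer.join();
    final_value = ax.load();
    return seen;
}
```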

At least, that is how I interpret things. And I believe the fact that
the C and C++ standards make a distinction between atomics and volatile
atomics indicates that the standard authors do not see "atomic" as
implying the semantics of "volatile" - even if compiler writers choose
to act that way.

I personally think it was a terrible mistake to mix sequencing and
ordering with atomics when multi-threading was introduced to the C and
C++ standards. Atomics would have been simpler, more efficient, and
consistent with their naming if their semantics had not included any
kind of synchronisation. Synchronisation and ordering are very
different concepts from atomic access, and should be covered differently
(by fences of various sorts).
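For comparison, C++ can already express the split David prefers: relaxed atomics for the access itself, and `std::atomic_thread_fence` for the ordering. A minimal message-passing sketch; the function and variable names are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Relaxed flag plus explicit fences: the atomic does the access,
// the fences do the ordering.
int message_passing()
{
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // only relaxed accesses below
    int received = 0;

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed))
            std::this_thread::yield();
        // Pairs with the release fence in the producer.
        std::atomic_thread_fence(std::memory_order_acquire);
        received = payload;
    });

    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);

    consumer.join();
    return received;
}
```

Once the consumer reads `true` from `ready`, the release fence before the store synchronizes with the acquire fence after the load, so `payload` is guaranteed to be 42. The relaxed operations alone would not give that guarantee.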

Bonita Montero

2023-11-23 09:35:36 UTC

I am not convinced ....
... but compilers generally do not optimise atomics as much as they are allowed to.

Is there some kind of contradiction?

Scott Lurndal

2023-11-23 19:55:27 UTC

Post by David Brown

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

Linux tends to apply the volatile qualifier on the access, rather
than the definition.

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

while (ACCESS_ONCE(r)) {
}

Makes it rather obvious when reading the code what the intent
is, and won't be affected if someone accidentally removes the
volatile qualifier from the declaration of r.

Works just fine in c++, too.
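`__typeof__` is a GCC/Clang extension. A sketch of the same per-access idea in standard C++, using a hypothetical helper `access_once()` (not a standard or kernel function). As with the macro, this only forces the compiler to perform each access; in ISO C++ it does not make a cross-thread access race-free. For that, C++20's `std::atomic_ref` applies atomic access to an ordinary object. (Current Linux kernels spell the idiom `READ_ONCE()`/`WRITE_ONCE()`.)

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ counterpart of ACCESS_ONCE: the object stays
// non-volatile; only accesses made through the returned reference
// are volatile accesses.
template <typename T>
inline volatile T& access_once(T& x) noexcept
{
    return static_cast<volatile T&>(x);
}

// Mirrors `while (ACCESS_ONCE(r))`: each test performs a real read of r.
// Illustrative only: r is changed in the same thread here, not by
// another thread.
std::size_t spins_while_nonzero(std::size_t& r)
{
    std::size_t spins = 0;
    while (access_once(r)) {
        ++spins;
        --r;   // stand-in for an update made by a handler
    }
    return spins;
}
```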

David Brown

2023-11-24 09:08:35 UTC

Post by David Brown

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

Linux tends to apply the volatile qualifier on the access, rather
than the definition.
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
while (ACCESS_ONCE(r)) {
}
Makes it rather obvious when reading the code what the intent
is, and won't be affected if someone accidentally removes the
volatile qualifier from the declaration of r.
Works just fine in c++, too.

That is often my preference too, since it is the access that is
"volatile" - a "volatile object" is simply one for which all accesses
are "volatile".

For the pedants, it might be worth noting that the "cast to pointer to
volatile" technique of ACCESS_ONCE is not actually guaranteed to be
treated as a volatile access in C until C17/C18 when the wording was
changed to talk about accesses via "volatile lvalues" rather than
accesses to objects declared as volatile. (When the topic was discussed
by the committee, everyone agreed that all known compiler vendors
treated "cast to pointer to volatile" accesses as volatile, so the
change was a formality rather than any practical difference.) I don't
know if and when this change was added to C++.

Richard Damon

2023-11-23 16:07:31 UTC

Post by Bonita Montero

std::atomic<size_t> r

I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.

std::atomic should honor a read when you read it, even with
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

My understanding is that std::atomic needs to honor a read in the sense
that it will get the most recent value that has happened "before" the
read (as determined by memory order).

So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. SleepEx could (and should) establish time order, so the
compiler can't, in this case, optimize the read away.

Bonita Montero

2023-11-23 16:20:33 UTC

Post by Richard Damon
My understanding is that std::atomic needs to honor a read in the sense
that it will get the most recent value that has happened "before" the
read (as determined by memory order).

... and the read is atomic - even if the trivial object is 1kB in size.

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Richard Damon

2023-11-23 17:50:50 UTC

... and the read is atomic - even if the trivial object is 1kB in size.

Yes, which has nothing to do with the question.

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.

If there can't be a write "before" the second read that wasn't also
"after" the first read, then there is no requirement to refetch the
data. In relaxed memory orders, just being physically before isn't
enough to be "before"; you need some explicit "barrier" to establish it.

I will admit this isn't an area I consider myself an expert in, but I
find no words that prohibit the optimization. The implementation does
need to consider possible action by other "threads", but only as far as
constrained by memory order, so two reads in the same ordering "slot"
are not forced.

Bonita Montero

2023-11-24 07:57:38 UTC

Post by Richard Damon
Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.

I don't believe that; think about an atomic flag that is periodically
polled. The compiler shouldn't cache that value.

Chris M. Thomasson

2023-11-24 08:16:07 UTC

I don't believe that; think about an atomic flag that is periodically
polled. The compiler shouldn't cache that value.

std::atomic is going to work for such a flag. Depending on your setup,
it should be using std::memory_order_relaxed for the polling.
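A minimal sketch of such a polled stop flag using relaxed loads; the function name and the 10 ms delay are illustrative. On Marcel's "could it be infinite?" question above: the standard says an implementation *should* ensure that the last value stored by an atomic operation becomes visible to all other threads in a finite period of time ([intro.progress]), and *should* make atomic stores visible to atomic loads within a reasonable amount of time ([atomics.order]). Both are recommendations rather than hard requirements; in practice, as David notes, compilers re-read the flag.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Worker polls a stop flag with relaxed loads. Relaxed is enough here
// because the flag only means "stop"; it publishes no other data.
long run_until_stopped()
{
    std::atomic<bool> stop{false};
    long iterations = 0;
    std::thread worker([&] {
        while (!stop.load(std::memory_order_relaxed))
            ++iterations;   // stand-in for real work
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop.store(true, std::memory_order_relaxed);
    worker.join();          // join() makes 'iterations' safe to read here
    return iterations;
}
```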

Bonita Montero

2023-11-24 08:30:15 UTC

Post by Chris M. Thomasson
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.

There's also atomic_flag, but it has limitations compared with atomic_bool,
so I've never used it. You can set it only in conjunction with an
atomic read, and I never had a use for that. And this relies on an atomic
exchange, which costs a lot more than just a byte write.
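To make the difference concrete: before C++20, `std::atomic_flag` could only be set through `test_and_set()`, an atomic read-modify-write that returns the old value, or cleared with `clear()`; C++20 added a plain `test()`. `std::atomic<bool>` allows a plain store and a plain load. A small sketch; the globals and function names are illustrative.

```cpp
#include <atomic>
#include <cassert>

// Pre-C++20 default construction left atomic_flag's state
// unspecified, hence ATOMIC_FLAG_INIT.
std::atomic_flag flag = ATOMIC_FLAG_INIT;
std::atomic<bool> done{false};

// Setting the flag is an atomic exchange that also yields the old value.
bool set_flag_and_get_old()
{
    bool was_set = flag.test_and_set(std::memory_order_acq_rel);
    flag.clear(std::memory_order_release);   // reset for the next call
    return was_set;
}

// atomic<bool> supports a plain store, with no read involved.
bool set_bool_and_read()
{
    done.store(true, std::memory_order_release);
    return done.load(std::memory_order_acquire);
}
```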

Chris M. Thomasson

2023-11-24 09:04:50 UTC

Post by Chris M. Thomasson
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.

Fwiw, this flag should be aligned on an L2 cache line boundary, and
padded up to an L2 cache line size.

Chris M. Thomasson

2023-11-24 09:05:44 UTC

Post by Chris M. Thomasson
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.

Fwiw, this flag should be aligned on an L2 cache line boundary, and
padded up to an L2 cache line size.

You can stuff a cache line with words, as long as you do not straddle a
cache line boundary... YIKES!
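A sketch of the alignment and padding Chris describes, assuming 64-byte cache lines (typical for current x86-64). C++17 names this value `std::hardware_destructive_interference_size`, but not every standard library version provides it, so the constant below is an assumption. `PaddedFlag` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// alignas puts each PaddedFlag at the start of a line, and because sizeof
// is always a multiple of alignof, it also fills the rest of that line:
// no other variable shares it (no false sharing), and the flag can never
// straddle a line boundary.
struct alignas(kCacheLine) PaddedFlag
{
    std::atomic<bool> value{false};
};

static_assert(alignof(PaddedFlag) == kCacheLine, "starts on a line boundary");
static_assert(sizeof(PaddedFlag) == kCacheLine, "padded to a full line");
```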

David Brown

2023-11-24 09:23:00 UTC

Post by Richard Damon

Post by Richard Damon
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).

... and the read is atomic - even if the trivial object is 1kB in size.

Yes, which has nothing to do with the question.

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.
If there can't be a write "before" the second read that wasn't also
"after" the first read, then there is no requirement to refetch the
data. In relaxed memory orders, just being physically before isn't
enough to be "before"; you need some explicit "barrier" to establish it.
I will admit this isn't an area I consider myself an expert in, but I
find no words that prohibit the optimization. The implementation does
need to consider possible action by other "threads", but only as far as
constrained by memory order, so two reads in the same ordering "slot"
are not forced.

That is exactly how I see it (I also do not consider myself an expert in
this area). I cannot see any requirement in the description of the
execution, covering sequencing, ordering, "happens before", and all the
rest, that suggests that the number of atomic accesses, or their order
amongst each other, or their order with respect to volatile accesses or
non-volatile accesses, is forced to follow the source code except where
the atomics have specific sequencing. Atomic accesses are not
"volatile" - they are not, in themselves, "observable behaviour".

Because the sequencing requirements for atomics depend partly on
things happening in other threads, compilers are much more limited in
how they can re-order or otherwise optimise atomic accesses than they
are for normal accesses (unless the compiler knows all about the other
threads too!). Compilers must be pessimistic about optimisation. But
for certain simple cases, such as multiple neighbouring atomic reads of
the same address or multiple neighbouring writes to the same address, I
can't see any reason why they cannot be combined.

(Again, I am not an expert here - and I will be happy to be corrected.
They say the best way to learn something on the internet is not by
asking questions, but by writing something that is wrong!)

Chris M. Thomasson

2023-11-24 06:05:39 UTC

... and the read is atomic - even if the trivial object is 1kB in size.

humm.. Say, the read is from a word in memory. Define your trivial
object, POD, L2 cache line sized, and aligned on an L2 cache line
boundary? Are you referring to how a certain arch works?

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Chris M. Thomasson

2023-11-24 06:53:24 UTC

Post by Richard Damon
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).

... and the read is atomic - even if the trivial object is 1kB in size.

humm.. Say, the read is from a word in memory. Define your trivial
object, POD, L2 cache line sized, and aligned on an L2 cache line
boundary? Are you referring to how a certain arch works?

How many words in your cache lines, say L2?

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Bonita Montero

2023-11-24 07:53:49 UTC

Post by Chris M. Thomasson
humm.. Say, the read is from a word in memory. Define your trivial
object, POD, L2 cache line sized, and aligned on an L2 cache line
boundary? Are you referring to how a certain arch works?

Read that:
https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

Chris M. Thomasson

2023-11-24 09:06:47 UTC

https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

I know. Btw, what the hell happened to std::is_pod? ;^)

Bonita Montero

2023-11-24 09:10:13 UTC

https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

I know. Btw, what the hell happened to std::is_pod? ;^)

PODs are also trivial but go beyond since you can copy them
with a memcpy():
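On `std::is_pod`: it was deprecated in C++20. It meant "trivial and standard-layout", whereas what makes copying with `memcpy` valid is being trivially copyable, which is also what `std::atomic<T>` requires of `T`. A sketch with illustrative types and a hypothetical `is_pod_like` trait:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Plain { int a; double b; };               // trivial + standard-layout (a "POD")
struct WithCtor { WithCtor() : a(0) {} int a; }; // user-provided ctor: not trivial
struct Mixed {                                   // mixed access control: not standard-layout
public:
    int a;
    int b_value() const { return b; }
private:
    int b;
};

// Spelled-out equivalent of the deprecated std::is_pod.
template <typename T>
constexpr bool is_pod_like = std::is_trivial_v<T> && std::is_standard_layout_v<T>;

static_assert(is_pod_like<Plain>);
static_assert(!is_pod_like<WithCtor> && std::is_trivially_copyable_v<WithCtor>);
static_assert(!is_pod_like<Mixed> && std::is_trivially_copyable_v<Mixed>);

// Copying the object representation with memcpy needs only trivial copyability.
template <typename T>
T copy_bytes(const T& src)
{
    static_assert(std::is_trivially_copyable_v<T>, "memcpy needs a trivially copyable type");
    T dst;
    std::memcpy(&dst, &src, sizeof(T));
    return dst;
}
```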

Chris M. Thomasson

2023-11-24 09:14:04 UTC

... and the read is atomic - even if the trivial object is 1kB in size.

How is that read atomic with 1kB of data? On what arch?

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Chris M. Thomasson

2023-11-24 23:32:54 UTC

Post by Richard Damon
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).

... and the read is atomic - even if the trivial object is 1kB in size.

How is that read atomic with 1kB of data? On what arch?

Unless you atomically read a pointer that points to 1kB of memory.
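A sketch of both options with a hypothetical 1 kB `Block` type. `std::atomic<Block>` is allowed for any trivially copyable type, but on mainstream implementations an object that large is not lock-free: the library falls back to a lock, which is how such a read can still be "atomic". (With GCC, actually loading or storing such an object typically also requires linking libatomic; the sketch only queries the compile-time property.) Chris's alternative publishes a pointer to the block instead; the pointer is lock-free on common platforms, and release/acquire ordering makes the whole kilobyte visible along with it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical 1 kB trivially copyable payload.
struct Block { std::array<unsigned char, 1024> bytes; };

// Legal, but not lock-free on mainstream implementations.
// Only the compile-time property is queried; no load/store is performed.
constexpr bool big_atomic_always_lock_free = std::atomic<Block>::is_always_lock_free;

// Publish the block through an atomic pointer instead.
bool publish_and_read()
{
    Block block{};
    std::atomic<const Block*> ptr{nullptr};
    bool all_set = false;

    std::thread reader([&] {
        const Block* p = nullptr;
        while ((p = ptr.load(std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        // The acquire load pairs with the release store below, so every
        // byte written before publication is visible here.
        all_set = true;
        for (unsigned char b : p->bytes)
            all_set = all_set && b == 0xAB;
    });

    block.bytes.fill(0xAB);
    ptr.store(&block, std::memory_order_release);
    reader.join();
    return all_set;
}
```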

Post by Richard Damon
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...

An atomic doesn't cache repeatable reads. The order memory-consistency
parameter is just for the ordering of other reads and writes.

Bonita Montero

2023-11-23 07:08:13 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code is that the APC function is executed in the same
thread context as the function repeatedly probing r as an end-indicator,
so I don't need atomic here. You should have known better.

Kaz Kylheku

2023-11-23 08:26:39 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code is that the APC function is executed in the same
thread context as the function repeatedly probing r as an end-indicator,

So what is "parallel" doing in your subject line?

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

Bonita Montero

2023-11-23 08:31:21 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code is that the APC function is executed in the same
thread context as the function repeatedly probing r as an end-indicator,

So what is "parallel" doing in your subject line?

The parallel part is injecting the APC with QueueUserAPC().
Interestingly, there's some framework code calling the function object
that I use as the thread body which brings the thread into an alertable
state, so the loop never loops because r is already zero.

Bonita Montero

2023-11-23 14:39:07 UTC

Post by Bonita Montero
Interestingly, there's some framework code calling the function object
that I use as the thread body which brings the thread into an alertable
state, so the loop never loops because r is already zero.

It's definitely not framework code: I've checked the code with a Win32
thread created through CreateThread(), and my APCs are consumed before
the thread's main function runs. Really strange.

Scott Lurndal

2023-11-23 19:56:27 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code

That's enough to fail a job interview....

Bonita Montero

2023-11-24 04:26:11 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code

That's enough to fail a job interview....

... with a nerd like you.

Chris M. Thomasson

2023-11-24 05:53:27 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code

That's enough to fail a job interview....

... with a nerd like you.

Huh? What does that even mean? Really, humm... ;^o

Chris M. Thomasson

2023-11-24 05:59:17 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code

That's enough to fail a job interview....

... with a nerd like you.

Do you secretly like nerds?

lol!

Chris M. Thomasson

2023-11-24 05:47:03 UTC

Post by Chris M. Thomasson
std::atomic<size_t> r

The trick with my code

That's enough to fail a job interview....

Wow, no shit Scott. Yikes!

Kaz Kylheku

2023-11-23 20:32:33 UTC

This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?

Post by Bonita Montero
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );

Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?

BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".

I believe local functions in Pascal from 1971 can do this.
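A portable sketch of the corrected loop. `queue_callback()` and `run_pending()` are hypothetical stand-ins for `QueueUserAPC()` and the alertable wait, not the Win32 API, and here the queued calls simply run later in the same thread. It also suggests why the APC does not just say `--r`: a lambda that captures `r` can access it lexically, but it cannot be converted to a plain function pointer such as `PAPCFUNC`; only a capture-less lambda can, so the state has to travel through the pointer-sized argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical C-style API: a plain function pointer plus one
// pointer-sized argument, like PAPCFUNC/ULONG_PTR.
using Callback = void (*)(std::uintptr_t);
struct PendingCall { Callback fn; std::uintptr_t arg; };
std::vector<PendingCall> pending;

void queue_callback(Callback fn, std::uintptr_t arg) { pending.push_back({fn, arg}); }

// Stand-in for the alertable wait: runs the queued calls in this thread.
void run_pending()
{
    for (const PendingCall& c : pending) c.fn(c.arg);
    pending.clear();
}

std::size_t countdown(std::size_t rounds)
{
    std::size_t volatile r = rounds;
    // Loop counter renamed to i, so &r below is the volatile counter.
    for (std::size_t i = rounds; i--; )
        // Capture-less lambda: converts implicitly to Callback.
        queue_callback([](std::uintptr_t p) { --*reinterpret_cast<std::size_t volatile*>(p); },
                       reinterpret_cast<std::uintptr_t>(&r));
    run_pending();
    return r;
}
```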

Bonita Montero

2023-11-24 04:27:24 UTC

This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?

Post by Bonita Montero
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );

Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.

I had already corrected that in my code, and I guessed no one here would
notice; usually I'm right about that.

Chris M. Thomasson

2023-11-24 05:55:12 UTC

This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?

Post by Bonita Montero
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );

Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.

I had already corrected that in my code, and I guessed no one here would
notice; usually I'm right about that.

Usually wrong, or always right? humm...

David Brown

2023-11-24 09:35:49 UTC

Post by Bonita Montero
I had already corrected that in my code, and I guessed no one here would
notice; usually I'm right about that.

You really believe that?

I think one of the (many) reasons people don't take you seriously is
that you never check your work. You invariably post code that is badly
wrong, followed by multiple replies to yourself making corrections and
improvements. Every time you claim your code is bug-free, we know you
will follow up shortly with a bug fix. Every time you claim it is
"perfect", we know that you will follow it with an "improved" version
("perfect" and "improved" being in your opinion only).

Yes, people have noticed. Yes, people will continue to notice.

It's nice that you post code, however, as it can start some interesting
discussions - before descending into a pantomime farce. But it might
make things a little better if you bothered to re-read your code before
posting, or even try testing it.

Kaz Kylheku

2023-11-24 18:27:37 UTC

This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?

Post by Bonita Montero
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );

Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.

I had already corrected that in my code, and I guessed no one here would
notice; usually I'm right about that.

Anyway, this APC mechanism is quite similar to signal handling.
Particularly asynchronous signal handling. Just like POSIX signals, it
makes the execution abruptly call an unrelated function and then resume
at the interrupted point.

The main difference is that the signal has a number, which selects a
registered handler, rather than specifying a function directly.

Why I bring this up is that ISO C (since 1990, I think) has specified a
use of a "volatile sig_atomic_t" type in regard to asynchronous signal
handlers. (Look it up.)

The use of volatile with interrupt-like mechanisms is nothing new.
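The classic form of what Kaz describes, as a portable sketch: the handler only sets a `volatile std::sig_atomic_t` flag, which the interrupted code checks afterwards. `std::raise()` is used so the example is deterministic, since `raise()` does not return until the handler has returned. The function names are illustrative.

```cpp
#include <cassert>
#include <csignal>

// The kind of object an asynchronous signal handler has long been
// allowed to write: a volatile std::sig_atomic_t with static storage.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) { got_signal = 1; }

bool raise_and_check()
{
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // handler runs before raise() returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal == 1;
}
```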

Chris M. Thomasson

2023-11-24 23:37:26 UTC

This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?

Post by Bonita Montero
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );

Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.

I had already corrected that in my code, and I guessed no one here would
notice; usually I'm right about that.

Anyway, this APC mechanism is quite similar to signal handling.
Particularly asynchronous signal handling. Just like POSIX signals, it
makes the execution abruptly call an unrelated function and then resume
at the interrupted point.
The main difference is that the signal has a number, which selects a
registered handler, rather than specifying a function directly.
Why I bring this up is that ISO C (since 1990, I think) has specified a
use of a "volatile sig_atomic_t" type in regard to asynchronous signal
handlers. (Look it up.)
The use of volatile with interrupt-like mechanisms is nothing new.

After reading this, for some reason I am now thinking about signal safe
sync primitives in POSIX. Fwiw, certain pure lock/wait free algorithms
in signal handlers are okay.
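Since C++17 the standard also lets a signal handler use "plain lock-free atomic operations", which fits Chris's point about lock/wait-free code in handlers. A sketch with illustrative names that checks lock-freedom at compile time rather than assuming it:

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

std::atomic<int> signal_count{0};
static_assert(std::atomic<int>::is_always_lock_free,
              "the handler below relies on a lock-free atomic");

extern "C" void count_signal(int)
{
    // fetch_add on a lock-free atomic is a plain lock-free atomic operation.
    signal_count.fetch_add(1, std::memory_order_relaxed);
}

int raise_n(int n)
{
    signal_count.store(0);
    for (int k = 0; k < n; ++k) {
        // Re-install each time: some platforms reset the handler to
        // SIG_DFL when the signal is delivered.
        std::signal(SIGTERM, count_signal);
        std::raise(SIGTERM);
    }
    std::signal(SIGTERM, SIG_DFL);
    return signal_count.load();
}
```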

Bonita Montero

2023-11-25 12:11:22 UTC

Post by Kaz Kylheku
Anyway, this APC mechanism is quite similar to signal handling.
Particularly asynchronous signal handling. Just like POSIX signals,
it makes the execution abruptly call an unrelated function and then
resume at the interrupted point.

I don't think so, because APCs can only interrupt threads in an alertable
mode. Signals can interrupt nearly any code, and they have implications
for the compiler's ABI by defining the size of the red zone. So compared
to signals, APCs are rather clean. Nevertheless, you can do a lot of
interesting things with signals, as I learned recently when I was informed
that mutexes in glibc rely on signals; I guess it's the same when
a thread waits for a condition variable and a mutex at once.

Post by Kaz Kylheku
The main difference is that the signal has a number, which selects
a registered handler, rather than specifying a function directly.

The ugly thing with synchronous signals is that the signal handler is
global for all threads. You can chain handlers, but the next signal
handler in the chain may be in a shared object that has already been
unloaded. I think this should be corrected by making synchronous signals'
handlers thread-specific.

Post by Kaz Kylheku
The use of volatile with interrupt-like mechanisms is nothing new.

I think this pattern doesn't happen very often since it's rare that
a signal shares state with the interrupted code.

Chris M. Thomasson

2023-11-25 22:46:18 UTC