Discussion:
Signalling a condvar from inside and outside a locked mutex
Bonita Montero
2024-07-19 16:26:26 UTC
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
#include <sys/resource.h>

using namespace std;

int main()
{
    constexpr size_t ROUNDS = 10'000;
    bool outside = true;
    struct state
    {
        bool signalled = false;
        mutex mtx;
        condition_variable cv;
    } a, b;
    a.signalled = true; // the thread waiting on a takes the first turn
    int64_t last = 0;   // switch count of the "outside" pass, baseline for the percentage
    do
    {
        atomic_uint64_t sumSwitches( 0 );
        auto thr = [&]( state &me, state &you )
        {
            // voluntary context switches of the calling thread
            // (RUSAGE_THREAD is Linux-specific)
            auto switches = [&]()
            {
                rusage ru;
                if( getrusage( RUSAGE_THREAD, &ru ) )
                    terminate();
                return ru.ru_nvcsw;
            };
            uint64_t before = switches();
            for( size_t r = ROUNDS; r--; )
            {
                // wait for my turn
                unique_lock myLock( me.mtx );
                while( !me.signalled )
                    me.cv.wait( myLock );
                me.signalled = false;
                myLock.unlock();
                // hand the turn over, notifying either under the lock
                // or after unlocking
                unique_lock yoursLock( you.mtx );
                you.signalled = true;
                if( !outside )
                    you.cv.notify_one();
                yoursLock.unlock();
                if( outside )
                    you.cv.notify_one();
            }
            sumSwitches.fetch_add( switches() - before, memory_order_relaxed );
        };
        vector<jthread> threads;
        threads.emplace_back( thr, ref( a ), ref( b ) );
        threads.emplace_back( thr, ref( b ), ref( a ) );
        threads.resize( 0 ); // destroying the jthreads joins both threads
        int64_t ss = sumSwitches.load( memory_order_relaxed );
        cout << (outside ? "out" : "in") << "side: " << ss;
        if( outside )
            last = ss;
        else
            cout << " " << (100.0 * (double)ss / (double)last * 100 + 0.5) / 100
                 << "%";
        cout << endl;
    } while( !(outside = !outside) ); // the second pass runs with outside == false
    return 0;
}

outside: 19937
inside: 20030 100.471%

Result: signalling from inside and from outside the locked mutex is
about equally efficient with Linux and glibc; signalling from inside
causes only about half a percent more context switches.
Chris M. Thomasson
2024-07-19 18:37:27 UTC
On 7/19/2024 9:26 AM, Bonita Montero wrote:
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)

Generally, signalling a condvar while the mutex is locked is not all
that efficient...
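
For reference, the two placements under discussion can be sketched roughly as below. This is only an illustration: `mailbox`, `push_notify_inside`, `push_notify_outside` and `transfer_sum` are hypothetical names and are not taken from the posted benchmark. Both variants are correct; which one is cheaper depends on the condvar implementation and the load.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical mailbox, used only to show the two notify placements.
struct mailbox
{
    std::mutex mtx;
    std::condition_variable cv;
    std::queue<int> items;

    // Variant 1: notify while the mutex is still held.
    void push_notify_inside( int v )
    {
        std::lock_guard lock( mtx );
        items.push( v );
        cv.notify_one();
    }

    // Variant 2: release the mutex first, then notify.
    void push_notify_outside( int v )
    {
        {
            std::lock_guard lock( mtx );
            items.push( v );
        }
        cv.notify_one();
    }

    // Blocks until an item is available and returns it.
    int pop()
    {
        std::unique_lock lock( mtx );
        cv.wait( lock, [this] { return !items.empty(); } );
        int v = items.front();
        items.pop();
        return v;
    }
};

// Pushes 1..n with the chosen variant and returns the sum the consumer saw.
long long transfer_sum( int n, bool notify_inside )
{
    mailbox box;
    long long sum = 0;
    std::thread consumer( [&] {
        for( int i = 0; i < n; ++i )
            sum += box.pop();
    } );
    for( int i = 1; i <= n; ++i )
    {
        if( notify_inside )
            box.push_notify_inside( i );
        else
            box.push_notify_outside( i );
    }
    consumer.join(); // after the join, reading sum is safe
    return sum;
}
```

Calling `transfer_sum( 1000, true )` and `transfer_sum( 1000, false )` exercises both placements; the sketch says nothing about their relative cost.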
Scott Lurndal
2024-07-19 19:05:02 UTC
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. It really
depends on what percentage of the time there is contention on the
mutex, how many waiters there are on average and the maturity of the
host thread scheduler.
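
A harness that varies those parameters might look roughly like the sketch below. It is an assumption about how such a test could be structured, not code from anyone in this thread: `run_load`, `waiters`, `items` and `notify_inside` are hypothetical names. It only checks that every item is consumed; timing or getrusage() calls would have to be wrapped around it to compare the two placements.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical harness: one producer, `waiters` consumers, one shared queue.
// Returns how many items the consumers processed in total.
std::size_t run_load( std::size_t waiters, std::size_t items, bool notify_inside )
{
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::size_t> queue;
    bool done = false;
    std::atomic<std::size_t> consumed( 0 );

    auto consumer = [&]
    {
        for( ;; )
        {
            std::unique_lock lock( mtx );
            cv.wait( lock, [&] { return done || !queue.empty(); } );
            if( queue.empty() )
                return; // producer is done and the queue is drained
            queue.pop_front();
            lock.unlock();
            consumed.fetch_add( 1, std::memory_order_relaxed );
        }
    };

    std::vector<std::thread> threads;
    for( std::size_t i = 0; i < waiters; ++i )
        threads.emplace_back( consumer );

    for( std::size_t i = 0; i < items; ++i )
    {
        std::unique_lock lock( mtx );
        queue.push_back( i );
        if( notify_inside )
            cv.notify_one(); // signal under the lock
        lock.unlock();
        if( !notify_inside )
            cv.notify_one(); // signal after unlocking
    }

    {
        std::lock_guard lock( mtx );
        done = true;
    }
    cv.notify_all(); // wake every remaining waiter so it can exit

    for( auto &t : threads )
        t.join();
    return consumed.load( std::memory_order_relaxed );
}
```

Varying `waiters` changes the number of threads that can be blocked on the condvar at once, which the two-thread ping-pong test does not exercise.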
Bonita Montero
2024-07-20 04:34:33 UTC
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
Scott Lurndal
2024-07-20 14:30:12 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
Bonita Montero
2024-07-20 18:24:20 UTC
Post by Scott Lurndal
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
I've correctly shown that glibc is capable of waking up a condvar waiter
and releasing the mutex in one step.
Chris M. Thomasson
2024-07-20 19:01:24 UTC
Post by Bonita Montero
Post by Scott Lurndal
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
I've correctly shown that glibc is capable of waking up a condvar-waiter
and release the mutex in one step.
So?
Bonita Montero
2024-07-20 19:22:21 UTC
Post by Bonita Montero
Post by Scott Lurndal
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
I've correctly shown that glibc is capable of waking up a condvar-waiter
and release the mutex in one step.
So?
He claims that it needs more complex contention to prove that
signalling from outside the locked phase is beneficial. I don't
know why this can't be proven by the code I've shown. So he
should show some code for his idea.
Chris M. Thomasson
2024-07-20 19:50:18 UTC
Post by Bonita Montero
Post by Bonita Montero
Post by Scott Lurndal
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
I've correctly shown that glibc is capable of waking up a condvar-waiter
and release the mutex in one step.
So?
He claims that it needs more complex contention to prove that
signalling from outside the locked phase is beneficial. I don't
know why this can't be proven by the code I've shown. So he
should show some code for his idea.
You do know what wait morphing is, right?
Bonita Montero
2024-07-21 06:46:05 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Bonita Montero
Post by Scott Lurndal
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. ...
Show me the code that shows a difference.
I have far better things to do with my time than argue with you.
I've correctly shown that glibc is capable of waking up a
condvar-waiter
and release the mutex in one step.
So?
He claims that it needs more complex contention to prove that
signalling from outside the locked phase is beneficial. I don't
know why this can't be proven by the code I've shown. So he
should show some code for his idea.
You do know what wait morphing is, right?
Ask this Scott.
Chris M. Thomasson
2024-07-21 20:12:56 UTC
On 7/20/2024 11:46 PM, Bonita Montero wrote:
[...]
Post by Bonita Montero
Post by Chris M. Thomasson
You do know what wait morphing is, right?
Ask this Scott.
Oh, I am quite sure that Scott knows what wait morphing is.
Bonita Montero
2024-07-22 04:59:30 UTC
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
Post by Chris M. Thomasson
You do know what wait morphing is, right?
Ask this Scott.
Oh, I am quite sure that Scott knows what wait morphing is.
Of course he doesn't know that, otherwise he wouldn't
think that more complex code would be necessary here.
Chris M. Thomasson
2024-07-22 19:12:04 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
Post by Chris M. Thomasson
You do know what wait morphing is, right?
Ask this Scott.
Oh, I am quite sure that Scott knows what wait morphing is.
Of course he doesn't know that, otherwise he wouldn't
think that more complex code would be necessary here.
Are you 100% sure about that? Scott Lurndal has a lot of experience, indeed.
Bonita Montero
2024-07-23 03:58:41 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
Post by Chris M. Thomasson
You do know what wait morphing is, right?
Ask this Scott.
Oh, I am quite sure that Scott knows what wait morphing is.
Of course he doesn't know that, otherwise he wouldn't
think that more complex code would be necessary here.
Are you 100% sure about that? Scott Lurndal has a lot of experience, indeed.
His answer was simply silly.
Chris M. Thomasson
2024-07-25 18:24:50 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
Post by Chris M. Thomasson
You do know what wait morphing is, right?
Ask this Scott.
Oh, I am quite sure that Scott knows what wait morphing is.
Of course he doesn't know that, otherwise he wouldn't
think that more complex code would be necessary here.
Are you 100% sure about that? Scott Lurndal has a lot of experience, indeed.
His answer was simply silly.
I would not put it quite that way. However, the experiments are
interesting, to me at least.

Chris M. Thomasson
2024-07-20 04:43:13 UTC
Post by Scott Lurndal
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example. It really
depends on what percentage of the time there is contention on the
mutex, how many waiters there are on average and the maturity of the
host thread scheduler.
Right. It can be tested under simulated and/or recorded real load. I
used to do this in some of my older server code.
Chris M. Thomasson
2024-07-20 04:45:42 UTC
Post by Chris M. Thomasson
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
One cannot draw any conclusions from that toy example.  It really
depends on what percentage of the time there is contention on the
mutex, how many waiters there are on average and the maturity of the
host thread scheduler.
Right. It can be tested under simulated and/or recorded real load. I
used to do this in some of my older server code.
Turned out that some special algorithms were used during specific types
of load, so to speak... This was a long time ago, back when I won
the first part of the CoolThreads Contest. I got a T2000 SunFire server.
Fun Times!
Bonita Montero
2024-07-20 08:04:10 UTC
One cannot draw any conclusions from that toy example. ...
Of course you can. glibc is capable of waking a waiter and unlocking
the mutex in one step.
Chris M. Thomasson
2024-07-25 00:23:26 UTC
Post by Bonita Montero
One cannot draw any conclusions from that toy example. ...
Of course you can. glibc is capable to being awakened and unlocked
in one step.
I agree that certain tests that were designed to maximize load in
interesting ways are useful, indeed.
Chris M. Thomasson
2024-07-25 00:36:11 UTC
Post by Bonita Montero
One cannot draw any conclusions from that toy example. ...
Of course you can. glibc is capable to being awakened and unlocked
in one step.
That is what wait morphing strives to do anyway?
Bonita Montero
2024-07-20 04:33:35 UTC
Post by Chris M. Thomasson
[...]
Post by Bonita Montero
outside: 19937
inside: 20030 100.471%
Result: locking from inside and outside is as efficient with
Linux and  glibc, only half a percent more contextswitches
with locking from inside.
Well, wait morphing should have something to do with it. ;^)
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
I've shown that it is as efficient as signalling from outside.
Marcel Mueller
2024-07-21 12:14:57 UTC
Post by Chris M. Thomasson
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
Is it?

At least Java and AFAIK .NET throw an exception when notify is called
w/o owning the mutex. pthread and std::condition_variable allow this
scenario.
Why this difference?

Are the implementations of notify_one really likely to be lock-free?


Marcel
Bonita Montero
2024-07-21 12:19:08 UTC
Post by Marcel Mueller
At least Java and AFAIK .NET throw an exception when notify is called
w/o owning the mutex. pthread and std::condition_variable allow this
scenario.
Why this difference?
C++ condvars (std::condition_variable, since C++11) and pthread condvars
can be signalled without holding the mutex. And some people think this is
more efficient without having proved that.
Monitors can be implemented more efficiently than a mutex-condvar
combination, but as a result a monitor can't be signalled without
holding the "mutex" part of the monitor.
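
The API constraint can be sketched in C++ as below. This is only an illustration under stated assumptions: `monitor`, `guard`, `enter` and `handshake` are hypothetical names, and the sketch is built on std::mutex plus std::condition_variable, so it shows the "notify only while holding the lock" rule and not the more efficient implementation a real monitor could have.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical monitor wrapper: wait and notify exist only on the guard,
// so they cannot be called without owning the lock.
template<typename T>
class monitor
{
    std::mutex mtx_;
    std::condition_variable cv_;
    T data_;

public:
    class guard
    {
        monitor &m_;
        std::unique_lock<std::mutex> lock_;

    public:
        // Constructing a guard acquires the monitor's mutex.
        explicit guard( monitor &m ) : m_( m ), lock_( m.mtx_ ) {}
        T &operator*() { return m_.data_; }
        void notify_one() { m_.cv_.notify_one(); }
        void notify_all() { m_.cv_.notify_all(); }
        // Blocks until pred( data ) is true; the lock is released while blocked.
        template<typename Pred>
        void wait( Pred pred )
        {
            m_.cv_.wait( lock_, [&] { return pred( m_.data_ ); } );
        }
    };

    explicit monitor( T init = T() ) : data_( init ) {}
    guard enter() { return guard( *this ); }
};

// Usage: a waiter blocks until the flag is set; the setter notifies under the lock.
bool handshake()
{
    monitor<bool> flag( false );
    std::thread waiter( [&] {
        auto g = flag.enter();
        g.wait( []( bool f ) { return f; } );
    } );
    {
        auto g = flag.enter();
        *g = true;
        g.notify_one();
    } // the guard's destructor releases the lock here
    waiter.join();
    auto g = flag.enter();
    return *g;
}
```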
Chris M. Thomasson
2024-07-21 20:08:25 UTC
Post by Bonita Montero
Post by Marcel Mueller
At least Java and AFAIK .NET throw an exception when notify is called
w/o owning the mutex. pthread and std::condition_variable allow this
scenario.
Why this difference?
C++20 condvars and pthread condvars can be signalled without holding
a mutex. And some people think this is more efficient without having
proved that.
Huh? This is really old stuff here. Are you now just learning about it?
Post by Bonita Montero
Monitors can be implemented more efficiently than a mutex-condvar
combination, but as a result a monitor can't be signalled without
holding the "mutex" part of the monitor.
Chris M. Thomasson
2024-07-21 20:07:27 UTC
Post by Marcel Mueller
Post by Chris M. Thomasson
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
Is it?
At least Java and AFAIK .NET throw an exception when notify is called
w/o owning the mutex. pthread and std::condition_variable allow this
scenario.
Why this difference?
Are the implementations of notify_one really likely to be lock-free?
There was a nice debate about it way back in comp.programming.threads.
Dave Butenhof was in the mix. PThreads allows for signalling outside of
the protection of the mutex. Afaict, it always has. Wait morphing is an
interesting way to try to make things more efficient when signalling
from inside of the mutex locked region.
Bonita Montero
2024-07-22 05:00:30 UTC
Post by Chris M. Thomasson
Post by Marcel Mueller
Post by Chris M. Thomasson
Generally, signalling a condvar while the mutex is locked is not all
that efficient...
Is it?
At least Java and AFAIK .NET throw an exception when notify is called
w/o owning the mutex. pthread and std::condition_variable allow this
scenario.
Why this difference?
Are the implementations of notify_one really likely to be lock-free?
There was a nice debate about it way back in comp.programming.threads.
Dave Butenhof was in the mix. PThreads allows for signalling outside of
the protection of the mutex. Afaict, it always has. Wait morphing is an
interesting way to try to make things more efficient when signalling
from inside of the mutex locked region.
Answer his question and not what you think he asked.
Bonita Montero
2024-07-22 05:17:36 UTC
Post by Chris M. Thomasson
There was a nice debate about it way back in comp.programming.threads.
Dave Butenhof was in the mix. PThreads allows for signalling outside of
the protection of the mutex. Afaict, it always has. Wait morphing is an
interesting way to try to make things more efficient when signalling
from inside of the mutex locked region.
Morphing has nothing to do with whether you signal the condvar
from the outside or the inside, it works in both cases.
Chris M. Thomasson
2024-07-22 19:13:26 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
There was a nice debate about it way back in comp.programming.threads.
Dave Butenhof was in the mix. PThreads allows for signalling outside
of the protection of the mutex. Afaict, it always has. Wait morphing
is an interesting way to try to make things more efficient when
signalling from inside of the mutex locked region.
Morphing has nothing to do with whether you signal the condvar
from the outside or the inside,
Yes, it does...
Post by Bonita Montero
it works in both cases.
It was "basically" meant to improve performance of all of those programs
that signal under the protection of the mutex.
Bonita Montero
2024-07-23 08:39:28 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
There was a nice debate about it way back in
comp.programming.threads. Dave Butenhof was in the mix. PThreads
allows for signalling outside of the protection of the mutex. Afaict,
it always has. Wait morphing is an interesting way to try to make
things more efficient when signalling from inside of the mutex locked
region.
Morphing has nothing to do with whether you signal the condvar
from the outside or the inside,
Yes, it does...
Post by Bonita Montero
it works in both cases.
It was "basically" meant to improve performance of all of those programs
that signal under the protection of the condvar.
I just wanted to prove that signalling from outside isn't necessary.
But some developers have rather esoteric habits while developing.
To me it seemed unlikely that the glibc developers hadn't handled
that optimally, since I've seen a lot of excellent ideas from
glibc. E.g. I suggested an improvement regarding fmod() on the glibc
mailing list and they adopted my idea, but added some further
improvements which lifted my idea to a totally new level.
Chris M. Thomasson
2024-07-23 19:49:29 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
There was a nice debate about it way back in
comp.programming.threads. Dave Butenhof was in the mix. PThreads
allows for signalling outside of the protection of the mutex.
Afaict, it always has. Wait morphing is an interesting way to try to
make things more efficient when signalling from inside of the mutex
locked region.
Morphing has nothing to do with whether you signal the condvar
from the outside or the inside,
Yes, it does...
Post by Bonita Montero
it works in both cases.
It was "basically" meant to improve performance of all of those
programs that signal under the protection of the condvar.
I just wanted to prove that signalling from outside isn't necessary.
Humm... Actually, I need to come up with some tests to get on par with
you. Might have some time later on tonight. The last time I benchmarked
it was around 2002-2009 ish iirc. So, a while back. I am hoping that the
new condvar impls perform a lot better than they used to... The thing is
that signalling from inside would automatically make waiters block,
broadcasting was pretty bad. But, wait morphing is meant to help out
with that. Iirc, futex impls even had some internal wait morphing
"techniques" wrt swapping wait queues and such...

If I remember correctly, some of my old tests were comparing different
things akin to:

<brief pseudo-code>
___________________
lock();
signal();
do_some_work();
unlock();
___________________


vs:
___________________
lock();
do_some_work();
signal();
unlock();
___________________


vs:
___________________
lock();
if (do_some_work()) signal();
unlock();
___________________


vs:
___________________
bool need_to_signal = false;

lock();
need_to_signal = do_some_work();
unlock();

if (need_to_signal) signal();
___________________


vs:
___________________
lock();
do_some_work();
unlock();

signal();
___________________
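
As a rough C++ rendering of the fourth variant above (decide under the lock, signal after unlocking), something like the following sketch could serve as a starting point. The names `work_queue`, `produce`, `take_all` and `drain_count` are hypothetical, and `do_some_work` here is just a stand-in that reports an empty to non-empty transition; it is not the code from the old tests.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical queue illustrating "decide under the lock, signal outside".
struct work_queue
{
    std::mutex mtx;
    std::condition_variable cv;
    std::vector<int> pending;

    // Stand-in for do_some_work(); must be called with mtx held.
    // Returns true only when the queue went from empty to non-empty.
    bool do_some_work( int v )
    {
        bool was_empty = pending.empty();
        pending.push_back( v );
        return was_empty;
    }

    void produce( int v )
    {
        bool need_to_signal = false;
        {
            std::lock_guard lock( mtx );
            need_to_signal = do_some_work( v );
        }
        if( need_to_signal )
            cv.notify_one(); // signalled after the mutex was released
    }

    // Blocks until something is pending, then takes everything at once.
    std::vector<int> take_all()
    {
        std::unique_lock lock( mtx );
        cv.wait( lock, [this] { return !pending.empty(); } );
        std::vector<int> out;
        out.swap( pending );
        return out;
    }
};

// Produces n items and returns how many the consumer received.
int drain_count( int n )
{
    work_queue q;
    int seen = 0;
    std::thread consumer( [&] {
        while( seen < n )
            seen += static_cast<int>( q.take_all().size() );
    } );
    for( int i = 0; i < n; ++i )
        q.produce( i );
    consumer.join();
    return seen;
}
```

Skipping the signal is only safe here because `take_all` drains the whole queue: a thread can only be blocked while the queue is empty, so every push that could matter to a waiter is an empty to non-empty transition.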


And other ones using broadcast. It was fairly elaborate. It was great
fun to test all of these on the SunFire T2000 that I won in the
CoolThreads contest from Sun, right before they got acquired by Oracle.
Btw, you are making me reminisce about those times! Thanks for that, Bonita.

:^)
Post by Bonita Montero
But some developers have rather some esoteric habits while developing.
For me it seemed unlikely that the glibc developers haven't handled
that optimally since I've seen a lot of excellent ideas from the
glibc. F.e. I suggested some improvement on the glibc mailing list
regarding fmod() and they adopted my idea, but added some further
improvements which lifted my idea to a totally new level.
Cool with me. :^)