Discussion:
Idea for spin-wait loops
Bonita Montero
2024-03-23 16:53:40 UTC
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
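For illustration only, the proposed semantics can be modelled in plain C. This is a software sketch, so it still polls, which is exactly what a hardware version would avoid. The names wait_while_equal and now_ns are hypothetical, and CLOCK_MONOTONIC stands in for the timestamp counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; an assumed stand-in for the TSC. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical software model of the proposed instruction.
 * Waits while *addr == expected. Returns true once a different value
 * is observed, false if timeout_ns elapses first.
 */
static bool wait_while_equal(_Atomic uint32_t *addr, uint32_t expected,
                             uint64_t timeout_ns)
{
    assert(addr != NULL);
    const uint64_t deadline = now_ns() + timeout_ns;
    for (;;) {
        if (atomic_load_explicit(addr, memory_order_acquire) != expected)
            return true;   /* word changed: caller re-examines it */
        if (now_ns() >= deadline)
            return false;  /* timeout: caller may fall back to blocking */
    }
}
```

A caller would typically follow a false return with a kernel-level wait, and treat a true return as a hint to retry taking the lock.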
Chris M. Thomasson
2024-03-23 20:52:49 UTC
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
Chris M. Thomasson
2024-03-23 20:58:02 UTC
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
Bonita Montero
2024-03-24 06:38:02 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Chris M. Thomasson
2024-03-26 02:48:27 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout... You
are referring to user space, right?
Bonita Montero
2024-03-26 10:12:07 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout... You
are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
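To make "limited spinning" concrete, here is a minimal sketch of a bounded spin phase that gives up after a fixed number of attempts so the caller can block instead. It is not glibc's implementation. The names try_lock_bounded_spin and spin_unlock, the 0/1 lock-word encoding, and the max_spins parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical lock word: 0 = unlocked, 1 = locked.
 * Spins at most max_spins times. Returns true if the lock was taken,
 * false if the spin budget ran out and the caller should block.
 */
static bool try_lock_bounded_spin(atomic_int *lock, int max_spins)
{
    assert(lock != NULL);
    for (int i = 0; i < max_spins; ++i) {
        int expected = 0;
        /* Attempt the read-modify-write only when the lock looks free. */
        if (atomic_load_explicit(lock, memory_order_relaxed) == 0 &&
            atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return true;
    }
    return false; /* spin budget exhausted */
}

static void spin_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

An MWAIT with a timeout would replace the loop body's repeated loads with a single monitored wait bounded by a deadline.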
Chris M. Thomasson
2024-03-26 20:02:47 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout... You
are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
Bonita Montero
2024-03-26 20:23:07 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout... You
are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
MWAIT could replace polling / spinning on a mutex for a limited
time if it had a timeout.
Chris M. Thomasson
2024-03-26 20:30:45 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout...
You are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
MWAIT could replace polling / spinning on a mutex for a limited
time if it had a timeout.
So, you timeout, check some other stuff, then wait again. Still sounds
like polling?
Chris M. Thomasson
2024-03-26 20:31:24 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout...
You are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
MWAIT could replace polling / spinning on a mutex for a limited
time if it had a timeout.
So, you timeout, check some other stuff, then wait again. Still sounds
like polling?
Sounds like you want a hardware based futex.
Bonita Montero
2024-03-27 09:18:47 UTC
Post by Chris M. Thomasson
So, you timeout, check some other stuff, then wait again.
Still sounds like polling?
The checks would only occur if the cacheline containing the
word actually was modified.
Michael S
2024-03-27 15:09:57 UTC
On Tue, 26 Mar 2024 13:02:47 -0700
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout...
You are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
I don't know what you mean by 'get around'.
The main point of the original Monitor/MWAIT is to allow one SMT thread
to poll a memory address in a way that consumes almost none of the core's
execution resources, thus allowing the other SMT thread(s) of the
same core to run faster. A sort of more intelligent PAUSE.
In the absence of other SMT threads, the main advantage of a polling
loop with Monitor/MWAIT vs. a simple tight polling loop (STPL) is reduced
power consumption.
As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
polling loop provides virtually no advantage relative to an STPL. Both
are quite efficient from the CCT perspective, at least as long as the
programmer does not do anything stupid.

Later on Intel invented 'MWAIT for Power Management' that has slightly
different objectives. But that is O.T.
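As a rough sketch of the simple tight polling loop (STPL) mentioned above: the loop below polls with plain loads only, so the line can stay in the local cache until another core writes it, and it issues PAUSE between polls where the compiler provides it. The names cpu_relax and poll_until_changed and the max_polls budget are hypothetical. Polling with a read-modify-write instead of a load would be one example of "doing something stupid" from the CCT point of view.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Spin-loop hint: x86 PAUSE under GCC/Clang, otherwise a no-op. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#endif
}

/*
 * Polls *addr with loads only until it differs from expected or the
 * poll budget is used up. Returns true if a change was observed.
 */
static bool poll_until_changed(_Atomic uint32_t *addr, uint32_t expected,
                               uint64_t max_polls)
{
    assert(addr != NULL);
    for (uint64_t i = 0; i < max_polls; ++i) {
        if (atomic_load_explicit(addr, memory_order_acquire) != expected)
            return true;
        cpu_relax(); /* yield pipeline resources to the sibling SMT thread */
    }
    return false;
}
```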
Chris M. Thomasson
2024-03-27 19:58:50 UTC
Post by Michael S
On Tue, 26 Mar 2024 13:02:47 -0700
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Bonita Montero
Post by Chris M. Thomasson
MWAIT?
MWAIT has no timeout.
Not sure how important it would be for MWAIT to have a timeout...
You are referring to user space, right?
MWAIT could be used for limited spinning of the kind glibc's pthread_mutex
is capable of. The advantage of an MWAIT with a timeout would be much less
interconnect traffic compared to polling.
MWAIT is meant to get around polling?
I don't know what you mean by 'get around'.
Turning a "hot" spin wait into a cooler one...

;^)
Post by Michael S
The main point of the original Monitor/MWAIT is to allow one SMT thread
to poll a memory address in a way that consumes almost none of the core's
execution resources, thus allowing the other SMT thread(s) of the
same core to run faster. A sort of more intelligent PAUSE.
In the absence of other SMT threads, the main advantage of a polling
loop with Monitor/MWAIT vs. a simple tight polling loop (STPL) is reduced
power consumption.
As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
polling loop provides virtually no advantage relative to an STPL. Both
are quite efficient from the CCT perspective, at least as long as the
programmer does not do anything stupid.
Later on Intel invented 'MWAIT for Power Management' that has slightly
different objectives. But that is O.T.
Indeed.

Bonita Montero
2024-03-24 06:37:33 UTC
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
Not all kinds of mutexes can be done with a futex.
Chris M. Thomasson
2024-03-24 19:33:42 UTC
Post by Bonita Montero
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
the cacheline containing the word is modified or there's a timeout
according to the timestamp-counter's value.
This would eliminate active spinning and polling a value in memory.
Polling would occur only if the cacheline would be modified by
another thread.
futex
Not all kinds of mutexes can be done with a futex.
Have you ever heard of an asymmetric mutex?
Scott Lurndal
2024-03-24 20:43:37 UTC
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
A processor which doesn't own (or have a shared copy of) the
cacheline which would contain that word in memory will never know
if it was modified, as it won't see the invalidate messages in
a directory-based cache subsystem (leaving aside noncacheable
accesses to the word in memory, of course).

This sounds like a solution to a problem that doesn't exist,
and there would be no incentive for a processor designer
to include the substantial additional complexity required
to support your feature.
Bonita Montero
2024-03-25 06:23:14 UTC
Post by Scott Lurndal
This sounds like a solution to a problem that doesn't exist,
and there would be no incentive for a processor designer
to include the substantial additional complexity required
to support your feature.
MONITOR / MWAIT is nearly the same except for the timeout.
Michael S
2024-03-25 12:34:50 UTC
On Sun, 24 Mar 2024 20:43:37 GMT
Post by Scott Lurndal
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
A processor which doesn't own (or have a shared copy of) the
cacheline which would contain that word in memory will never know
if it was modified, as it won't see the invalidate messages in
a directory-based cache subsystem (leaving aside noncacheable
accesses to the word in memory, of course).
It seems, I didn't understand the idea.
Of course, the waiting thread/core has the word in question in its
L1D cache when it enters the wait loop.
Of course, it is awakened if/when the word is evicted from the cache
for an unrelated reason, i.e. in practice because of a capacity conflict
caused by the activity of other threads that are running on the same
core. There is nothing wrong with spurious awakenings as long as they
are rare.
Post by Scott Lurndal
This sounds like a solution to a problem that doesn't exist,
and there would be no incentive for a processor designer
to include the substantial additional complexity required
to support your feature.
The problem does exist and the primitive proposed by Bonita is not new. It
is a minor modification of Monitor/Mwait.
For current Intel and AMD processors this sort of thing is
relatively unattractive because, at 2 threads per core and with rather
measurable throughput gains achieved by running 2 threads instead of
one (for AMD up to 30%, for Intel a little less, but often measurable),
each thread is a valuable resource, so you don't really want to keep it
paused for too long. And the whole point of Bonita's amendment of the
existing mechanism is that the software has more control over long waits.

On IBM POWER and on a few Sun/Oracle chips there are up to 8 threads
per core, so each thread is not that valuable. It means that a longer
uninterrupted wait makes more sense and control over the duration of the
timeout is more desirable. But maybe IBM's and Oracle's variants of
MWAIT already have it?
Michael S
2024-03-25 17:11:22 UTC
On Mon, 25 Mar 2024 14:34:50 +0200
Post by Michael S
On Sun, 24 Mar 2024 20:43:37 GMT
Post by Scott Lurndal
Post by Bonita Montero
I've got a nice idea for a new processor extension for spin-wait
loops. The idea is that a thread of a processor enters a sleep
state if a word in memory is equal to a certain register until
A processor which doesn't own (or have a shared copy of) the
cacheline which would contain that word in memory will never know
if it was modified, as it won't see the invalidate messages in
a directory-based cache subsystem (leaving aside noncacheable
accesses to the word in memory, of course).
It seems, I didn't understand the idea.
I meant to say 'you' instead of 'I'.
Bonita Montero
2024-03-25 17:53:52 UTC
Post by Michael S
The problem does exist and the primitive proposed by Bonita is not new.
It is a minor modification of Monitor/Mwait.
Functionally the modification is minor, but the effect would be
major since the cache-interconnect traffic would be minimized.
Chris M. Thomasson
2024-03-26 20:13:58 UTC
Post by Bonita Montero
Post by Michael S
The problem does exist and the primitive proposed by Bonita is not new.
It is a minor modification of Monitor/Mwait.
Functionally the modification is minor, but the effect would be
major since the cache-interconnect traffic would be minimized.
Ask over in comp.arch