Discussion:
xchg based thing...
Chris M. Thomasson
2023-11-27 06:14:52 UTC
Permalink
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.

I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.

Speaking of pointer values, take note of CT_PENDING.

The fun part about this scheme is that it only uses atomic exchange for
its logic. It also has wait-free loopless push and flush:


void push(work* w)
{
// this completes the pending logic...
w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
work* head = m_head.exchange(w, rl::mo_release, $);
w->m_next.store(head, rl::mo_release, $);
}


work* flush()
{
return m_head.exchange(nullptr, rl::mo_acquire, $);
}
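For reference, the same push/flush pair ports to standard C++ `std::atomic` almost mechanically. A minimal sketch of such a port (the `ct_work`/`ct_xchg_lifo` names and the sentinel constant are my placeholders, not from the Relacy unit; I kept the conservative release/acquire orderings here):

```cpp
#include <atomic>
#include <cstdint>

struct ct_work
{
    std::atomic<ct_work*> m_next{nullptr};
    unsigned long m_payload{0};
};

// sentinel meaning "this node's link is not published yet"
inline ct_work* const CT_PENDING_SENTINEL =
    reinterpret_cast<ct_work*>(std::uintptr_t(0xDEADBEEF));

struct ct_xchg_lifo
{
    std::atomic<ct_work*> m_head{nullptr};

    // wait-free, loopless push: a single exchange, no CAS loop
    void push(ct_work* w)
    {
        w->m_next.store(CT_PENDING_SENTINEL, std::memory_order_relaxed);
        ct_work* head = m_head.exchange(w, std::memory_order_release);
        w->m_next.store(head, std::memory_order_release);
    }

    // wait-free flush: grab the whole stack at once
    ct_work* flush()
    {
        return m_head.exchange(nullptr, std::memory_order_acquire);
    }

    // the pending node logic: spin until the link is published
    static ct_work* get_next(ct_work* w)
    {
        ct_work* next = w->m_next.load(std::memory_order_acquire);

        while (next == CT_PENDING_SENTINEL)
        {
            next = w->m_next.load(std::memory_order_acquire);
        }

        return next;
    }
};
```

Single-threaded, the flushed list can be walked immediately, since each push publishes its link before returning.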


Here is my current Relacy test unit:

https://github.com/dvyukov/relacy
_______________________________
#include <iostream>
#include <relacy/relacy.hpp>


#define CT_PRODUCERS_N 2
#define CT_CONSUMERS_N 3
#define CT_THREAD_N (CT_PRODUCERS_N + CT_CONSUMERS_N)


// Notes:
// Using spin backoff for waits as of now
// Need to slow-path it with kernel waits
// Need to refine the per-node backoff


// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))

struct ct_xchg_lifo_ver_001
{
    // simulate some work...
    struct work
    {
        rl::atomic<work*> m_next;
        VAR_T(unsigned long) m_work;

        work()
        :   m_next(CT_PENDING),
            m_work(0)
        {
        }

        ~work()
        {
            RL_ASSERT(VAR(m_work) == 1);
        }

        // no data races on m_work!
        void execute_work()
        {
            VAR(m_work) += 1;
        }

        // this is the pending node logic...
        work* get_next()
        {
            work* next = m_next.load(rl::mo_consume, $);

            while (next == CT_PENDING)
            {
                rl::backoff();
                next = m_next.load(rl::mo_consume, $);
            }

            return next;
        }
    };


    rl::atomic<work*> m_head;

    ct_xchg_lifo_ver_001()
    :   m_head(nullptr)
    {
    }

    ~ct_xchg_lifo_ver_001()
    {
        RL_ASSERT(! m_head.load(rl::mo_relaxed, $));
    }

    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
    }

    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }

    void process(work* cur)
    {
        // process work nodes...
        while (cur)
        {
            // do real work _before_ the pending logic
            // this is highly beneficial.
            cur->execute_work();

            // get the next work node.
            cur = cur->get_next();
        }
    }

    // dump worked on nodes...
    void destroy(work* cur)
    {
        while (cur)
        {
            work* next = cur->m_next.load(rl::mo_relaxed, $);

            // no pending work shall be destroyed!
            RL_ASSERT(next != CT_PENDING);

            delete cur;
            cur = next;
        }
    }
};


struct ct_relacy_test_fun : rl::test_suite<ct_relacy_test_fun, CT_THREAD_N>
{
    ct_xchg_lifo_ver_001 m_work_lifo;

    void before()
    {
    }

    void after()
    {
        ct_xchg_lifo_ver_001::work* w = m_work_lifo.flush();
        m_work_lifo.process(w);
        m_work_lifo.destroy(w);
    }

    void producer(unsigned int tidx)
    {
        ct_xchg_lifo_ver_001::work* w = new ct_xchg_lifo_ver_001::work();
        m_work_lifo.push(w);
    }

    void consumer(unsigned int tidx)
    {
        ct_xchg_lifo_ver_001::work* w = m_work_lifo.flush();
        m_work_lifo.process(w);
        m_work_lifo.destroy(w);
    }

    void thread(unsigned int tidx)
    {
        if (tidx < CT_PRODUCERS_N)
        {
            // producer threads
            producer(tidx);
        }
        else
        {
            // consumer threads
            consumer(tidx);
        }
    }
};


int
main()
{
    std::cout << "Exchange LIFO Container Experiment ver:0.0.1\n";
    std::cout << "Relacy Unit Test ver:0.0.1\n";
    std::cout << "by: Chris M. Thomasson\n";
    std::cout << "__________________________________\n" << std::endl;

    {
        rl::test_params p;

        p.iteration_count = 400000;
        //p.execution_depth_limit = 33333;
        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        std::cout << "Executing Relacy Unit Test...\n";
        std::cout << "__________________________________" << std::endl;

        rl::simulate<ct_relacy_test_fun>(p);
    }

    return 0;
}
_______________________________
Chris M. Thomasson
2023-11-27 06:20:31 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange for
    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
    }
    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }
https://github.com/dvyukov/relacy
_______________________________
#include <iostream>
#include <relacy/relacy.hpp>
#define CT_PRODUCERS_N 2
#define CT_CONSUMERS_N 3
#define CT_THREAD_N (CT_PRODUCERS_N + CT_CONSUMERS_N)
// Using spin backoff for waits as of now
// Need to slow-path it with with kernel waits
// Need to refine the per-node backoff
// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))
struct ct_xchg_lifo_ver_001
{
[...]
        // this is the pending node logic...
        work* get_next()
        {
            work* next = m_next.load(rl::mo_consume, $);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, I think I might be able to get away with using a relaxed membar
here instead of consume, because of the acquire/release relationship
between the push and flush functions. This is where Relacy can come in
handy. I will have some more time to work on it later tonight, and I
need to test that. I don't think a consume membar is needed here;
relaxed should work fine. Not exactly sure right now.
Post by Chris M. Thomasson
            while (next == CT_PENDING)
            {
                rl::backoff();
                next = m_next.load(rl::mo_consume, $);
            }
            return next;
        }
    };
[...]
Chris M. Thomasson
2023-11-27 06:25:47 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange for
    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
    }
    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }
https://github.com/dvyukov/relacy
_______________________________
[...]
Post by Chris M. Thomasson
    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I also think I can use relaxed for that store! Humm... I just need to
test it. Tonight is going to be fun. :^)
Post by Chris M. Thomasson
    }
[...]
Chris M. Thomasson
2023-11-28 02:32:46 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange for
[...]

Fwiw, here is just a little exploration into distributing it. I will
make a home for it on GitHub. Check it out: my first simple venture into
distributing the work across n threads in this experiment. Take careful
notice of local work vs. foreign work.


I managed to streamline the memory barriers down to the weakest
possible: I got rid of a release membar and a consume membar. I don't
think it can get any weaker without being incorrect.

___________________________
#include <iostream>
#include <cstddef>
#include <relacy/relacy.hpp>


#define CT_THREAD_N 5
#define CT_WORK_N 3
#define CT_PRODUCER_WORK_N 5
#define CT_CONSUMER_WORK_N 2
#define CT_DISTRIBUTE_WORK_N CT_THREAD_N


// Notes:
// Using spin backoff for waits as of now
// Need to slow-path it with kernel waits
// Need to refine the per-node backoff


// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))

static unsigned long g_stats_local_work = 0;
static unsigned long g_stats_foreign_work = 0;


struct ct_xchg_lifo_ver_001
{
    // simulate some work...
    struct work
    {
        rl::atomic<work*> m_next;
        VAR_T(unsigned long) m_work;
        VAR_T(unsigned long) m_tidx;

        work(unsigned long tidx)
        :   m_next(CT_PENDING),
            m_work(0),
            m_tidx(tidx)
        {
        }

        ~work()
        {
            RL_ASSERT(VAR(m_work) == 1);
        }

        // no data races on m_work!
        void execute_work(unsigned long tidx)
        {
            VAR(m_work) += 1;

            unsigned long local_tidx = VAR(m_tidx);

            if (local_tidx != tidx)
            {
                g_stats_foreign_work += 1;
                /*
                std::cout << "foreign work detected "
                    << local_tidx
                    << " -> "
                    << tidx
                    << std::endl;
                */
            }
            else
            {
                g_stats_local_work += 1;
                /*
                std::cout << "local work detected "
                    << local_tidx
                    << " -> "
                    << tidx
                    << std::endl;
                */
            }
        }

        // this is the pending node logic...
        work* get_next()
        {
            work* next = m_next.load(rl::mo_relaxed, $);

            while (next == CT_PENDING)
            {
                rl::backoff();
                next = m_next.load(rl::mo_relaxed, $);
            }

            return next;
        }

        static void process(work* cur, unsigned long tidx)
        {
            // must not be pending!
            RL_ASSERT(cur != CT_PENDING);

            // process work nodes...
            while (cur)
            {
                // do real work _before_ the pending logic
                // this is highly beneficial... Big time!
                cur->execute_work(tidx);

                // get the next work node.
                cur = cur->get_next();
            }
        }

        // dump worked on nodes...
        static void destroy(work* cur, unsigned long tidx)
        {
            // no pending work shall be destroyed!
            RL_ASSERT(cur != CT_PENDING);

            while (cur)
            {
                work* next = cur->m_next.load(rl::mo_relaxed, $);

                // no pending work shall be destroyed!
                RL_ASSERT(next != CT_PENDING);

                delete cur;
                cur = next;
            }
        }
    };


    rl::atomic<work*> m_head;

    ct_xchg_lifo_ver_001()
    :   m_head(nullptr)
    {
    }

    ~ct_xchg_lifo_ver_001()
    {
        RL_ASSERT(! m_head.load(rl::mo_relaxed, $));
    }

    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_relaxed, $);
    }

    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }
};


// A test of a simple way to distribute work
// Using the ct_xchg_lifo_ver_001 API
template<typename T, std::size_t T_N>
struct ct_distribute_ver_001
{
    T m_table[T_N];
    typedef typename T::work work;

    void push(work* w, unsigned int tidx)
    {
        T& t = m_table[tidx % T_N];
        t.push(w);
    }

    work* pop(unsigned int tidx)
    {
        for (std::size_t i = 0; i < T_N; ++i)
        {
            T& t = m_table[(i + tidx) % T_N];
            work* w = t.flush();
            if (w) return w;
        }

        return nullptr;
    }
};


// Relacy Test Unit...
struct ct_relacy_test_fun
    : rl::test_suite<ct_relacy_test_fun, CT_THREAD_N>
{
    typedef ct_distribute_ver_001<
        ct_xchg_lifo_ver_001,
        CT_DISTRIBUTE_WORK_N
    > distribute;

    distribute m_work;

    void thread(unsigned int tidx)
    {
        for (unsigned long wi = 0; wi < CT_WORK_N; ++wi)
        {
            for (unsigned long i = 0; i < CT_PRODUCER_WORK_N; ++i)
            {
                distribute::work* w = new distribute::work(tidx);
                m_work.push(w, tidx);
            }

            for (unsigned long i = 0; i < CT_CONSUMER_WORK_N; ++i)
            {
                distribute::work* w = m_work.pop(tidx);
                distribute::work::process(w, tidx);
                distribute::work::destroy(w, tidx);
            }
        }
    }
};


int
main()
{
    std::cout << "Exchange LIFO Container Experiment ver:0.0.2\n";
    std::cout << "Relacy Unit Test ver:0.0.2\n";
    std::cout << "by: Chris M. Thomasson\n";
    std::cout << "__________________________________\n" << std::endl;

    {
        rl::test_params p;

        p.iteration_count = 1000000;
        //p.execution_depth_limit = 33333;
        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        std::cout << "Executing Relacy Unit Test...\n";
        std::cout << "__________________________________" << std::endl;

        rl::simulate<ct_relacy_test_fun>(p);


        std::cout << "\nwork load "
            << g_stats_local_work
            << ", "
            << g_stats_foreign_work
            << std::endl;
    }

    return 0;
}
___________________________
Chris M. Thomasson
2023-11-28 02:34:22 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based
lifo work queue thing. Not sure how to classify it quite yet. I am
going to use it as a work "queue" for my parts of my CPU based
rendering engine, no need for it wrt my shader work. However, I needed
to do it. It's passing Relacy tests. That's a good thing. Now I need
to add in a slow path so it can wait in the kernel during periods of
no work. I think an eventcount should work for it. Or perhaps I might
integrate some state in the pointer values for futexes, or whatever. I
have not made up my mind yet.
I will port my Relacy test units over to standard C++. It's not all
that hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange
[...]
Fwiw, here is just a little exploration into distributing it. I will
make a home for it on GitHub. Check it out, my simple first venture into
distributing the work across this experiment wrt n-threads. Take careful
notice of local work vs. foreign work.
I managed to streamline the memory barriers to the weakest possible. Got
rid of a release membar, and a consume membar. I don't think it can get
any weaker without being incorrect.
___________________________
#include <iostream>
#include <cstddef>
#include <relacy/relacy.hpp>
#define CT_THREAD_N 5
#define CT_WORK_N 3
#define CT_PRODUCER_WORK_N 5
#define CT_CONSUMER_WORK_N 2
#define CT_DISTRIBUTE_WORK_N CT_THREAD_N
// Using spin backoff for waits as of now
// Need to slow-path it with with kernel waits
// Need to refine the per-node backoff
// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))
static unsigned long g_stats_local_work = 0;
static unsigned long g_stats_foreign_work = 0;
struct ct_xchg_lifo_ver_001
{
    // simulate some work...
    struct work
    {
        rl::atomic<work*> m_next;
        VAR_T(unsigned long) m_work;
        VAR_T(unsigned long) m_tidx;
        work(unsigned long tidx)
        :   m_next(CT_PENDING),
            m_work(0),
            m_tidx(tidx)
        {
        }
        ~work()
        {
            RL_ASSERT(VAR(m_work) == 1);
        }
        // no data races on m_work!
        void execute_work(unsigned long tidx)
        {
            VAR(m_work) += 1;
            unsigned long local_tidx = VAR(m_tidx);
            if (local_tidx != tidx)
            {
                g_stats_foreign_work += 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fwiw, this is an out of bound stat that I am using. If I were keeping
stats in a real C++ impl, I would use split counters.
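For the record, a split counter here just means one cache-line-padded slot per thread, summed lazily by the reader; a rough sketch of that idea (the slot count, padding width, and names are my assumptions, not from the post):

```cpp
#include <atomic>
#include <cstddef>

struct ct_split_counter
{
    static constexpr std::size_t SLOT_N = 8;

    // one counter per slot, padded to avoid false sharing
    struct alignas(64) slot
    {
        std::atomic<unsigned long> m_count{0};
    };

    slot m_slots[SLOT_N];

    // writers hash their thread index to a slot, so threads on
    // different slots never contend on the same cache line
    void add(std::size_t tidx, unsigned long n = 1)
    {
        m_slots[tidx % SLOT_N].m_count.fetch_add(
            n, std::memory_order_relaxed);
    }

    // readers sum all slots; the result is a snapshot, not exact
    // while writers are still running
    unsigned long read() const
    {
        unsigned long sum = 0;

        for (std::size_t i = 0; i < SLOT_N; ++i)
        {
            sum += m_slots[i].m_count.load(std::memory_order_relaxed);
        }

        return sum;
    }
};
```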
[...]
Chris M. Thomasson
2023-11-29 23:29:47 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based
lifo work queue thing. Not sure how to classify it quite yet. I am
going to use it as a work "queue" for my parts of my CPU based
rendering engine, no need for it wrt my shader work. However, I
needed to do it. It's passing Relacy tests. That's a good thing. Now
I need to add in a slow path so it can wait in the kernel during
periods of no work. I think an eventcount should work for it. Or
perhaps I might integrate some state in the pointer values for
futexes, or whatever. I have not made up my mind yet.
I will port my Relacy test units over to standard C++. It's not all
that hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange
[...]
Fwiw, here is just a little exploration into distributing it. I will
make a home for it on GitHub. Check it out, my simple first venture
into distributing the work across this experiment wrt n-threads. Take
careful notice of local work vs. foreign work.
I managed to streamline the memory barriers to the weakest possible.
Got rid of a release membar, and a consume membar. I don't think it
can get any weaker without being incorrect.
___________________________
#include <iostream>
#include <cstddef>
#include <relacy/relacy.hpp>
#define CT_THREAD_N 5
#define CT_WORK_N 3
#define CT_PRODUCER_WORK_N 5
#define CT_CONSUMER_WORK_N 2
#define CT_DISTRIBUTE_WORK_N CT_THREAD_N
// Using spin backoff for waits as of now
// Need to slow-path it with with kernel waits
// Need to refine the per-node backoff
// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))
static unsigned long g_stats_local_work = 0;
static unsigned long g_stats_foreign_work = 0;
struct ct_xchg_lifo_ver_001
{
     // simulate some work...
     struct work
     {
         rl::atomic<work*> m_next;
         VAR_T(unsigned long) m_work;
         VAR_T(unsigned long) m_tidx;
         work(unsigned long tidx)
         :   m_next(CT_PENDING),
             m_work(0),
             m_tidx(tidx)
         {
         }
         ~work()
         {
             RL_ASSERT(VAR(m_work) == 1);
         }
         // no data races on m_work!
         void execute_work(unsigned long tidx)
         {
             VAR(m_work) += 1;
             unsigned long local_tidx = VAR(m_tidx);
             if (local_tidx != tidx)
             {
                 g_stats_foreign_work += 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fwiw, this is an out of bound stat that I am using. If I were keeping
stats in a real C++ impl, I would use split counters.
[...]
Actually, I should refer to the stats as out-of-band state that is
outside of the Relacy unit test.
Chris M. Thomasson
2023-12-01 04:31:32 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based
lifo work queue thing. Not sure how to classify it quite yet. [...]
        static void process(work* cur, unsigned long tidx)
        {
            // must not be pending!
            RL_ASSERT(cur != CT_PENDING);
            // process work nodes...
            while (cur)
            {
                // do real work _before_ the pending logic
                // this is highly beneficial... Big time!
                cur->execute_work(tidx);
                // get the next work node.
                cur = cur->get_next();
            }
        }
[...]
There are several ways to allow the next work to be passed to another
thread if the real work is compute intensive, so to speak.
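One such way, sketched with my own minimal standard-C++ types rather than the Relacy unit's API: the consumer flushes, keeps only the head node for itself, and pushes the remainder back so other threads can grab it while the expensive work runs. Still exchange-only:

```cpp
#include <atomic>
#include <cstdint>

struct ct_work
{
    std::atomic<ct_work*> m_next{nullptr};
};

// sentinel meaning "link not published yet"
inline ct_work* const CT_PENDING_SENTINEL =
    reinterpret_cast<ct_work*>(std::uintptr_t(0xDEADBEEF));

struct ct_xchg_lifo
{
    std::atomic<ct_work*> m_head{nullptr};

    void push(ct_work* w)
    {
        w->m_next.store(CT_PENDING_SENTINEL, std::memory_order_relaxed);
        ct_work* head = m_head.exchange(w, std::memory_order_release);
        w->m_next.store(head, std::memory_order_release);
    }

    ct_work* flush()
    {
        return m_head.exchange(nullptr, std::memory_order_acquire);
    }

    static ct_work* get_next(ct_work* w)
    {
        ct_work* next = w->m_next.load(std::memory_order_acquire);
        while (next == CT_PENDING_SENTINEL)
            next = w->m_next.load(std::memory_order_acquire);
        return next;
    }
};

// take one node for local compute-heavy work, hand the rest back
ct_work* take_one(ct_xchg_lifo& q)
{
    ct_work* head = q.flush();
    if (!head) return nullptr;

    ct_work* rest = ct_xchg_lifo::get_next(head);
    while (rest)
    {
        // capture the link before push overwrites m_next
        ct_work* next = ct_xchg_lifo::get_next(rest);
        q.push(rest);
        rest = next;
    }

    return head;
}
```

Note the re-push reverses the remainder's order, which is fine for an unordered work "queue".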
Chris M. Thomasson
2023-12-02 21:26:11 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based
lifo work queue thing. Not sure how to classify it quite yet. [...]
         static void process(work* cur, unsigned long tidx)
         {
             // must not be pending!
             RL_ASSERT(cur != CT_PENDING);
             // process work nodes...
             while (cur)
             {
                 // do real work _before_ the pending logic
                 // this is highly beneficial... Big time!
                 cur->execute_work(tidx);
                 // get the next work node.
                 cur = cur->get_next();
             }
         }
[...]
There are several ways to allow the next work to be passed to another
thread if the real work is compute intensive, so to speak.
Another fun aspect of this is that it can be distributed in many
different ways. Keep in mind that it only uses atomic exchange...
Chris M. Thomasson
2023-12-02 21:34:41 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.
Speaking of pointer values, take note of CT_PENDING.
The fun part about this scheme is that it only uses atomic exchange for
    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
    }
    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }
https://github.com/dvyukov/relacy
_______________________________
#include <iostream>
#include <relacy/relacy.hpp>
#define CT_PRODUCERS_N 2
#define CT_CONSUMERS_N 3
#define CT_THREAD_N (CT_PRODUCERS_N + CT_CONSUMERS_N)
// Using spin backoff for waits as of now
// Need to slow-path it with with kernel waits
// Need to refine the per-node backoff
// humm...
#define CT_PENDING ((ct_xchg_lifo_ver_001::work*)(0xDEADBEEF))
struct ct_xchg_lifo_ver_001
{
    // simulate some work...
    struct work
    {
        rl::atomic<work*> m_next;
        VAR_T(unsigned long) m_work;
        work()
        :   m_next(CT_PENDING),
            m_work(0)
        {
        }
        ~work()
        {
            RL_ASSERT(VAR(m_work) == 1);
        }
        // no data races on m_work!
        void execute_work()
        {
            VAR(m_work) += 1;
        }
        // this is the pending node logic...
        work* get_next()
        {
            work* next = m_next.load(rl::mo_consume, $);
            while (next == CT_PENDING)
            {
                rl::backoff();
                next = m_next.load(rl::mo_consume, $);
            }
            return next;
        }
    };
    rl::atomic<work*> m_head;
    ct_xchg_lifo_ver_001()
    :   m_head(nullptr)
    {
    }
    ~ct_xchg_lifo_ver_001()
    {
        RL_ASSERT(! m_head.load(rl::mo_relaxed, $));
    }
    void push(work* w)
    {
        // this completes the pending logic...
        w->m_next.store(CT_PENDING, rl::mo_relaxed, $);
        work* head = m_head.exchange(w, rl::mo_release, $);
        w->m_next.store(head, rl::mo_release, $);
    }
    work* flush()
    {
        return m_head.exchange(nullptr, rl::mo_acquire, $);
    }
[...]

Another fun improvement is to allow push to insert a head and tail such
that it can insert many nodes at once. Still no CAS needed for this.
Also, flush can install a new list if it wants to. No CAS needed. ;^)

A goal of mine was to create a work system using atomic exchange only.
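That multi-node push is basically a two-line change: link the batch privately, then publish the whole chain with one exchange on its last node. A sketch with my own minimal types (the post's idea, my names):

```cpp
#include <atomic>
#include <cstdint>

struct ct_work
{
    std::atomic<ct_work*> m_next{nullptr};
};

// sentinel meaning "link not published yet"
inline ct_work* const CT_PENDING_SENTINEL =
    reinterpret_cast<ct_work*>(std::uintptr_t(0xDEADBEEF));

struct ct_xchg_lifo
{
    std::atomic<ct_work*> m_head{nullptr};

    // push a privately pre-linked chain first..last with a single
    // exchange; still no CAS, still loopless
    void push_list(ct_work* first, ct_work* last)
    {
        last->m_next.store(CT_PENDING_SENTINEL, std::memory_order_relaxed);
        ct_work* head = m_head.exchange(first, std::memory_order_release);
        last->m_next.store(head, std::memory_order_release);
    }

    ct_work* flush()
    {
        return m_head.exchange(nullptr, std::memory_order_acquire);
    }

    static ct_work* get_next(ct_work* w)
    {
        ct_work* next = w->m_next.load(std::memory_order_acquire);
        while (next == CT_PENDING_SENTINEL)
            next = w->m_next.load(std::memory_order_acquire);
        return next;
    }
};
```

Only the last node ever goes through the pending state; the interior links are published before the exchange makes the chain visible.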
Chris M. Thomasson
2023-12-03 03:09:35 UTC
Permalink
[...]

I need to model a version of some real work in relacy, just for fun.
Imvvvho, an interesting question is how to gain an equipotential in 3d
space when there are an infinite number of them... One of my thoughts is
to gain three points in a normal field line, then use the surface normal
of the triangle for an equipotential. Seems like it works okay...
Chris M. Thomasson
2023-12-03 03:10:53 UTC
Permalink
Post by Chris M. Thomasson
[...]
I need to model a version of some real work in relacy, just for fun.
Imvvvho, an interesting question is how to gain an equipotential in 3d
space when there are an infinite number of them... One of my thoughts is
to gain three points in a normal field line, then use the surface normal
of the triangle for an equipotential. Seems like it works okay...
So, real work on a fine grain level wrt that type of work can consist of
three points and an epsilon. The surface normal would be returned. This
is fairly fine grain here...
Chris M. Thomasson
2023-12-03 03:11:52 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
[...]
I need to model a version of some real work in relacy, just for fun.
Imvvvho, an interesting question is how to gain an equipotential in 3d
space when there are an infinite number of them... One of my thoughts
is to gain three points in a normal field line, then use the surface
normal of the triangle for an equipotential. Seems like it works okay...
So, real work on a fine grain level wrt that type of work can consist of
three points and an epsilon. The surface normal would be returned. This
is fairly fine grain here...
Speaking of granularity, what about the logic that derives the three
points via field line plotting? They can be in the same work node, so to
speak.
Chris M. Thomasson
2023-12-03 06:18:12 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by Chris M. Thomasson
[...]
I need to model a version of some real work in relacy, just for fun.
Imvvvho, an interesting question is how to gain an equipotential in
3d space when there are an infinite number of them... One of my
thoughts is to gain three points in a normal field line, then use the
surface normal of the triangle for an equipotential. Seems like it
works okay...
So, real work on a fine grain level wrt that type of work can consist
of three points and an epsilon. The surface normal would be returned.
This is fairly fine grain here...
Speaking of granularity, what bout the logic that derives the three
points via field line plotting? They can be in the same work node, so to
speak.
A thread checks the work system, gains some work and starts executing.
Humm... A real work item can tell it to compute n points of a field line
in a vector field. The work item might look like, just sketching this out:

<pseudo-code>
____________________
struct work_field_line_plot
{
    // a dedicated portion of a canvas for
    // this work item to plot on...
    ct_canvas_section& m_canvas;

    // say this work does not alter the
    // vector field... so, it's const
    ct_vector_field const& m_field;

    // line plotting data...
    glm::vec4 m_origin;
    glm::vec4 m_origin_radii;
    float m_origin_angle;
    float m_epsilon;
    unsigned long m_step_n;

    // get to work!
    void execute()
    {
        // plot field lines on m_canvas...

        float normal_step = 1.f / m_step_n;

        // [...] Plotting logic...
    }
};
____________________

Something along those lines. A work item has a constant reference to the
field and a non-constant reference to its portion of a canvas. Now,
should I give each work item its own independent canvas? I have done
that before and it's nice because each work item can have its own rendering.
Combining all of them together makes the final image...

Vs. each work item having its own place to plot in a single, shared
canvas? Humm... :^)
Chris M. Thomasson
2023-12-03 06:24:43 UTC
Permalink
Post by Chris M. Thomasson
[...]
I need to model a version of some real work in relacy, just for fun.
Imvvvho, an interesting question is how to gain an equipotential in 3d
space when there are an infinite number of them... One of my thoughts is
to gain three points in a normal field line, then use the surface normal
of the triangle for an equipotential. Seems like it works okay...
Afaict, it would be a very strange experiment to allow for the field
lines to be plotted using a carefully constructed race condition in the
sense that it's 100% legit in C++, but the field lines would become
corrupted in "interesting" ways, no longer isolated from one another...
The work items can race with each other while building their lines...
Something along the lines of:

https://groups.google.com/g/comp.lang.c++/c/7u_rLgQe86k/m/fYU9SnuAFQAJ

Just thinking out loud here... Sorry for flooding. ;^o
Chris M. Thomasson
2024-01-15 21:38:52 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
[...]

Got some nice test targets for the experimental xchg based "queue" of
mine to render:

[image links]
Chris M. Thomasson
2024-01-27 05:55:32 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
[...]

Humm, should have some free time to do this... Might as well use it for
n-ary fields. 3d here:

[image link]
Chris M. Thomasson
2024-02-01 21:16:27 UTC
Permalink
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based lifo
work queue thing. Not sure how to classify it quite yet. I am going to
use it as a work "queue" for my parts of my CPU based rendering engine,
no need for it wrt my shader work. However, I needed to do it. It's
passing Relacy tests. That's a good thing. Now I need to add in a slow
path so it can wait in the kernel during periods of no work. I think an
eventcount should work for it. Or perhaps I might integrate some state
in the pointer values for futexes, or whatever. I have not made up my
mind yet.
I will port my Relacy test units over to standard C++. It's not all that
hard. In fact, it's rather straightforward.
[...]

Humm... Might be able to use it to quickly generate very complex models.
Reduce the loading screen time... This one is fairly complex and I can
get 60fps...

[image link]
Chris M. Thomasson
2024-02-01 21:17:45 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Fwiw, here is a quick and dirty mock up of my atomic exchange based
lifo work queue thing. Not sure how to classify it quite yet. I am
going to use it as a work "queue" for my parts of my CPU based
rendering engine, no need for it wrt my shader work. However, I needed
to do it. It's passing Relacy tests. That's a good thing. Now I need
to add in a slow path so it can wait in the kernel during periods of
no work. I think an eventcount should work for it. Or perhaps I might
integrate some state in the pointer values for futexes, or whatever. I
have not made up my mind yet.
I will port my Relacy test units over to standard C++. It's not all
that hard. In fact, it's rather straightforward.
[...]
Humm... Might be able to use it to quickly generate very complex models.
Reduce the loading screen time... This one is fairly complex and I can
get 60fps...
https://i.ibb.co/CBJqT8h/image.png