Discussion:
thread_local initialization
jseigh
2024-11-22 12:42:54 UTC
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.

This made a lock()/unlock() operation in the C++ version of
smrproxy take about 3.2 nanoseconds, versus about 0.6
nanoseconds in the C version.
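To make the distinction concrete, here is a minimal sketch; the names `t_hits`, `Tracked`, `t_tracked`, `worker` and `run_two_workers` are made up for illustration. `t_hits` is a native type with a constant initializer, so it can be set up from the thread's TLS image with no runtime check. `t_tracked` has a user-provided constructor and destructor, so it needs dynamic initialization per thread. On GCC and Clang (Itanium C++ ABI), code that uses a namespace-scope variable like `t_tracked` typically goes through a compiler-generated wrapper that tests an initialization guard first, unless the compiler can prove the variable is already initialized; the exact code emitted is up to the implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical counters, used only to observe construction/destruction.
std::atomic<int> g_ctor_count{0};
std::atomic<int> g_dtor_count{0};

struct Tracked {
    long value = 0;
    Tracked() { ++g_ctor_count; }   // user-provided ctor: dynamic initialization
    ~Tracked() { ++g_dtor_count; }  // non-trivial dtor: registered per thread
};

// Native type, constant initializer: statically initialized, no guard.
thread_local long t_hits = 0;

// Class type with non-trivial ctor/dtor: dynamically initialized per thread.
thread_local Tracked t_tracked;

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        ++t_hits;            // plain TLS access
        ++t_tracked.value;   // may involve an init-guard check
    }
    assert(t_hits == iterations);
    assert(t_tracked.value == iterations);
}

// Runs two threads that both touch t_tracked and returns how many
// per-thread Tracked objects were destroyed once both were joined.
int run_two_workers(int iterations) {
    int before = g_dtor_count.load();
    std::thread a(worker, iterations);
    std::thread b(worker, iterations);
    a.join();
    b.join();
    return g_dtor_count.load() - before;
}
```

Each worker thread gets its own `Tracked`, which is destroyed when that thread exits. Whether the guard check stays inside the loop or gets hoisted out depends on the optimizer, which is why inspecting the generated code is the reliable way to see it.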

Joe Seigh
Bonita Montero
2024-11-22 12:58:59 UTC
Post by jseigh
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.
That's true for local thread_local variables, but only partially.
The check is done when the code comes across the definition, not
on every access. That's similar to a static local variable; the
only difference is that the static variable is shared among all
threads.
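The static-versus-thread_local comparison can be sketched as follows (`shared_slot`, `per_thread_slot` and `same_address_across_threads` are hypothetical names). A `static` local gives one object shared by every thread; a `thread_local` local gives each thread its own object. When either one needs dynamic initialization, it happens the first time control passes through the declaration (per thread, for `thread_local`). The sketch uses `int` for simplicity, so it only shows the sharing difference, not the guard itself.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One object for the whole program.
int* shared_slot() {
    static int slot = 0;
    return &slot;
}

// One object per thread that calls this function.
int* per_thread_slot() {
    thread_local int slot = 0;
    return &slot;
}

// Returns true if two concurrently running threads get the same address.
bool same_address_across_threads(int* (*get)()) {
    std::atomic<int> arrived{0};
    int* seen[2] = {nullptr, nullptr};
    auto body = [&](int idx) {
        seen[idx] = get();
        ++arrived;
        // Keep both threads alive until both have recorded an address,
        // so one thread's TLS block cannot be reused by the other.
        while (arrived.load() < 2) std::this_thread::yield();
    };
    std::thread a(body, 0);
    std::thread b(body, 1);
    a.join();
    b.join();
    assert(seen[0] != nullptr && seen[1] != nullptr);
    return seen[0] == seen[1];
}
```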
I often use thread_local for temporary buffers in my own utility
libraries. E.g., if I have a temporary vector, I never shrink its
capacity, to keep the number of allocations as low as possible.
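A minimal sketch of that buffer-reuse idea, assuming a hypothetical helper `join_ids()` that formats integers as a comma-separated string. The `thread_local` scratch string is created on a thread's first call; `clear()` empties it but keeps its capacity, so once a thread has warmed up, building similar-sized results needs no new allocation inside the loop. The returned copy still allocates; handing out a view into the buffer would avoid that, but the view would be invalidated by the next call on the same thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats ids as "1,2,3" using a per-thread scratch buffer.
// If capacity_out is non-null, it receives the buffer's capacity after the call.
std::string join_ids(const std::vector<int>& ids, std::size_t* capacity_out = nullptr) {
    thread_local std::string scratch;  // one buffer per thread, never shrunk
    scratch.clear();                   // empties the string, keeps the capacity
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i != 0) scratch += ',';
        scratch += std::to_string(ids[i]);
    }
    if (capacity_out) *capacity_out = scratch.capacity();
    return scratch;                    // copy out; scratch stays warm
}
```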
jseigh
2024-11-22 17:10:50 UTC
Post by jseigh
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.
This made a lock()/unlock() operation in the C++ version of
smrproxy take about 3.2 nanoseconds, versus about 0.6
nanoseconds in the C version.
And if the thread_local isn't accessed, no object is created and
thus no dtor is run, in case anyone wonders why dtors don't seem
to always run on thread_locals. Confirmed via a test case.
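For block-scope `thread_local` variables this "not accessed, never constructed" behaviour is guaranteed by the standard: the object is initialized the first time a given thread passes through its declaration. For namespace-scope ones, whether initialization is deferred until first use is implementation-defined (GCC and Clang defer it). A sketch with hypothetical names, in which only one of two threads ever touches the variable:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

std::atomic<int> g_built{0};
std::atomic<int> g_destroyed{0};

struct PerThreadState {
    PerThreadState() { ++g_built; }
    ~PerThreadState() { ++g_destroyed; }
};

PerThreadState& state() {
    thread_local PerThreadState s;  // constructed on this thread's first call only
    return s;
}

// Starts one thread that calls state() and one that never does.
// Returns {constructions, destructions} observed after both are joined.
std::pair<int, int> one_user_one_bystander() {
    int built0 = g_built.load();
    int destroyed0 = g_destroyed.load();
    std::thread user([] { (void)state(); });
    std::thread bystander([] { /* never calls state() */ });
    user.join();
    bystander.join();
    return {g_built.load() - built0, g_destroyed.load() - destroyed0};
}
```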

I verified the checks with a timing loop of 100,000,000 iterations.
The object may only be created once, but the check happens on every
single access. The generated code confirms that as well.
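A timing loop along those lines might look like the sketch below. It is only an illustration: the variable names and iteration count are assumptions, the numbers depend heavily on compiler, flags and CPU, and an optimizer may hoist the guard or fold each loop into a single add, so reading the generated assembly remains the more reliable confirmation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

struct Counter {
    long n = 0;
    Counter() {}   // user-provided ctor forces dynamic initialization
    ~Counter() {}  // non-trivial dtor
};

thread_local long t_plain = 0;    // constant-initialized: no guard
thread_local Counter t_guarded;   // dynamically initialized: may be guarded

// Times `iterations` increments of each variable on the calling thread,
// prints nanoseconds per iteration, and returns the guarded counter's value.
long time_accesses(long iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    for (long i = 0; i < iterations; ++i) ++t_plain;
    auto t1 = clock::now();
    for (long i = 0; i < iterations; ++i) ++t_guarded.n;
    auto t2 = clock::now();

    auto ns = [](clock::duration d) {
        return std::chrono::duration<double, std::nano>(d).count();
    };
    std::printf("plain:   %.3f ns/iter\n", ns(t1 - t0) / iterations);
    std::printf("guarded: %.3f ns/iter\n", ns(t2 - t1) / iterations);
    return t_guarded.n;
}
```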

Joe Seigh
Bonita Montero
2024-11-22 19:01:32 UTC
I verified the checks with a timing loop of 100,000,000 iterations.
The object may only be created once, but the check happens on every
single access. The generated code confirms that as well.
No, the object is only created when the code comes across its definition,
not on every access.
Chris M. Thomasson
2024-11-22 21:05:21 UTC
Post by jseigh
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.
This made a lock()/unlock() operation in the C++ version of
smrproxy take about 3.2 nanoseconds, versus about 0.6
nanoseconds in the C version.
Shit. Humm...


thread_local struct ct_per_thread* ct_g_per_thread = nullptr;

ct_per_thread*
proxy_register_thread()
{
    if (! ct_g_per_thread)
    {
        thread_local ct_per_thread l_ct_per_thread(_whatever_);
        ct_g_per_thread = &l_ct_per_thread;
    }

    return ct_g_per_thread;
}

void proxy_lock()
{
    ct_per_thread* per_thread = ct_g_per_thread;
    assert(per_thread);
}

void proxy_unlock()
{
    ct_per_thread* per_thread = ct_g_per_thread;
    assert(per_thread);
}


If those asserts trip then it means that proxy_register_thread was not
called before them. Humm... Change the API to accept a pointer to a
ct_per_thread... ;^)

void proxy_lock(ct_per_thread* per_thread)
{
    // use per_thread directly; no thread_local lookup
}

void proxy_unlock(ct_per_thread* per_thread)
{
    // use per_thread directly; no thread_local lookup
}

void test()
{
    ct_per_thread* per_thread = proxy_register_thread();

    for (unsigned long i = 0; i < 1000000; ++i)
    {
        proxy_lock(per_thread);
        //... do your thing!
        proxy_unlock(per_thread);
    }
}

The dtor of ct_per_thread would set ct_g_per_thread to 0?

For any registered thread, ct_g_per_thread is valid and can be accessed
in any function it calls.

Is that crap, or kind of crap?

It should work okay.
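A self-contained version of that registration pattern, assuming a made-up `ct_per_thread` with a single `lock_count` field (the real per-thread data from the proxy code is not shown here), and with the placeholder constructor arguments dropped. The fast path reads a plain pointer `thread_local`, which has a constant initializer and so needs no initialization guard; the block-scope `thread_local` object only provides the lifetime. To answer the question above, this sketch has the destructor reset `ct_g_per_thread`, so the cached pointer does not dangle during thread exit. `g_per_thread_destroyed` and `test_loop` are hypothetical helpers for observation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_per_thread_destroyed{0};  // only for observing the dtor

struct ct_per_thread;
// Plain pointer with a constant initializer: no init guard on access.
thread_local ct_per_thread* ct_g_per_thread = nullptr;

struct ct_per_thread {
    unsigned long lock_count = 0;    // hypothetical per-thread payload
    ~ct_per_thread() {
        ct_g_per_thread = nullptr;   // don't leave a dangling cached pointer
        ++g_per_thread_destroyed;
    }
};

ct_per_thread* proxy_register_thread() {
    if (!ct_g_per_thread) {
        // Constructed the first time this thread gets here; destroyed at thread exit.
        thread_local ct_per_thread l_ct_per_thread;
        ct_g_per_thread = &l_ct_per_thread;
    }
    return ct_g_per_thread;
}

void proxy_lock(ct_per_thread* per_thread) { ++per_thread->lock_count; }
void proxy_unlock(ct_per_thread* per_thread) { --per_thread->lock_count; }

// Registers the calling thread, runs n lock/unlock pairs, returns the final count.
unsigned long test_loop(unsigned long n) {
    ct_per_thread* per_thread = proxy_register_thread();
    assert(per_thread == ct_g_per_thread);
    for (unsigned long i = 0; i < n; ++i) {
        proxy_lock(per_thread);
        // ... do your thing!
        proxy_unlock(per_thread);
    }
    return per_thread->lock_count;
}
```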
jseigh
2024-11-23 00:21:59 UTC
Post by Chris M. Thomasson
Post by jseigh
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.
This made a lock()/unlock() operation in the C++ version of
smrproxy take about 3.2 nanoseconds, versus about 0.6
nanoseconds in the C version.
Shit. Humm...
thread_local struct ct_per_thread* ct_g_per_thread = nullptr;
ct_per_thread*
proxy_register_thread()
{
    if (! ct_g_per_thread)
    {
        thread_local ct_per_thread l_ct_per_thread(_whatever_);
        ct_g_per_thread = &l_ct_per_thread;
    }
    return ct_g_per_thread;
}
void proxy_lock()
{
    ct_per_thread* per_thread = ct_g_per_thread;
    assert(per_thread);
}
void proxy_unlock()
{
    ct_per_thread* per_thread = ct_g_per_thread;
    assert(per_thread);
}
If those asserts trip then it means that proxy_register_thread was not
called before them. Humm... Change the API to accept a pointer to a
ct_per_thread... ;^)
void proxy_lock(ct_per_thread* per_thread)
{
}
void proxy_unlock(ct_per_thread* per_thread)
{
}
void test()
{
    ct_per_thread* per_thread = proxy_register_thread();
    for (unsigned long i = 0; i < 1000000; ++i)
    {
        proxy_lock(per_thread);
            //... do your thing!
        proxy_unlock(per_thread);
    }
}
The dtor of ct_per_thread would set ct_g_per_thread to 0?
For any registered thread, ct_g_per_thread is valid and can be accessed
in any function it calls.
Is that crap, or kind of crap?
It should work okay.
An object would have a dtor; a pointer to an object doesn't.
I was using std::unique_ptr so that a dtor would run on the contained
object pointer. Unfortunately, the runtime check on every access
is too expensive. So now I have two thread_locals:
  std::unique_ptr<ref> to run the delete on the ref pointer, and
  ref* to access the ref pointer without the runtime check.
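The two-variable arrangement can be sketched like this, assuming a hypothetical `ref` type and accessor `get_ref()`. The `std::unique_ptr` exists only so that its destructor deletes the object at thread exit and is touched only on the slow path; hot paths read the raw pointer, which is constant-initialized and so should not need a guard check. In this sketch `~ref()` also clears the raw pointer, so nothing dangles while the thread is exiting.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

std::atomic<int> g_refs_deleted{0};  // only for observing deletion

struct ref;
thread_local ref* t_ref = nullptr;            // fast path: no init guard

struct ref {
    long uses = 0;
    ~ref() {
        t_ref = nullptr;                      // owner is going away; clear the alias
        ++g_refs_deleted;
    }
};

thread_local std::unique_ptr<ref> t_owner;    // slow path only: owns the object

ref* get_ref() {
    if (ref* r = t_ref) return r;             // common case: one TLS load
    t_owner = std::make_unique<ref>();        // first use on this thread
    t_ref = t_owner.get();
    assert(t_ref != nullptr);
    return t_ref;
}
```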

Joe Seigh
Chris M. Thomasson
2024-11-23 00:58:25 UTC
Post by Chris M. Thomasson
Post by jseigh
Apparently class-type thread_local variables are initialized
dynamically, not at load time. This means that every time the
thread_local variable is accessed, the code checks whether
the variable needs initialization. This doesn't appear to be
the case for native types.
This made a lock()/unlock() operation in the C++ version of
smrproxy take about 3.2 nanoseconds, versus about 0.6
nanoseconds in the C version.
Shit. Humm...
thread_local struct ct_per_thread* ct_g_per_thread = nullptr;
ct_per_thread*
proxy_register_thread()
{
     if (! ct_g_per_thread)
     {
         thread_local ct_per_thread l_ct_per_thread(_whatever_);
         ct_g_per_thread = &l_ct_per_thread;
The thread_local ct_per_thread created here will have its dtor
called after the thread shuts down and before it is successfully joined.
Post by Chris M. Thomasson
     }
     return ct_g_per_thread;
}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
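That ordering can be checked with a small sketch; `Probe`, `g_probe_gone` and `probe_lifetime` are hypothetical names. The worker creates a block-scope `thread_local` object and then waits, so the caller can confirm the destructor has not run while the thread is alive. On mainstream implementations, a thread's `thread_local` destructors run as the thread exits, and thread completion synchronizes with the return from `join()`, so the flag is visible once `join()` returns.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

std::atomic<bool> g_probe_gone{false};

struct Probe {
    ~Probe() { g_probe_gone.store(true); }
};

// Returns {destroyed while the thread was still running, destroyed after join()}.
std::pair<bool, bool> probe_lifetime() {
    g_probe_gone.store(false);
    std::atomic<bool> ready{false};
    std::atomic<bool> go{false};
    std::thread t([&] {
        thread_local Probe p;  // constructed on this thread
        (void)p;
        ready.store(true);
        while (!go.load()) std::this_thread::yield();
    });                        // p is destroyed as the thread exits
    while (!ready.load()) std::this_thread::yield();
    bool during = g_probe_gone.load();
    go.store(true);
    t.join();
    bool after = g_probe_gone.load();
    assert(!during);
    return {during, after};
}
```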


[...]
Post by Chris M. Thomasson
It should work okay.
An object would have a dtor; a pointer to an object doesn't.
Take a closer look at proxy_register_thread(). The ct_per_thread objects
will get their dtors called; IIRC, during an atexit handler or something.
I was using std::unique_ptr so that a dtor would run on the contained
object pointer. Unfortunately, the runtime check on every access
is too expensive. So now I have two thread_locals:
  std::unique_ptr<ref> to run the delete on the ref pointer, and
  ref* to access the ref pointer without the runtime check.
Bonita Montero
2024-11-23 09:40:29 UTC
Post by Chris M. Thomasson
thread_local struct ct_per_thread* ct_g_per_thread = nullptr;
He was talking about local variables which are thread_local.
Your thread_local variable is global and therefore initialized
before a thread starts.
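For a namespace-scope `thread_local` such as `ct_g_per_thread`, a pointer with a constant initializer gets constant initialization, so each new thread simply starts out holding that value and no guard is needed. Since C++20, `constinit` lets you state that requirement so the compiler rejects the declaration if it would need dynamic initialization; the sketch below falls back to a plain declaration on older standards. The `CT_CONSTINIT` macro, the `id` field and `new_thread_sees_null` are assumptions for illustration.

```cpp
#include <cassert>
#include <thread>

#if defined(__cpp_constinit)
#define CT_CONSTINIT constinit  // C++20: compile error if not constant-initialized
#else
#define CT_CONSTINIT            // older standards: same code, no compile-time check
#endif

struct ct_per_thread { int id = 0; };  // hypothetical payload

// Constant-initialized: every thread starts with nullptr, no runtime check.
CT_CONSTINIT thread_local ct_per_thread* ct_g_per_thread = nullptr;

// Sets the calling thread's pointer, then reports whether a freshly started
// thread still sees nullptr in its own copy.
bool new_thread_sees_null() {
    static ct_per_thread main_state{1};
    ct_g_per_thread = &main_state;            // affects the calling thread only
    bool seen_null = false;
    std::thread t([&] { seen_null = (ct_g_per_thread == nullptr); });
    t.join();
    assert(ct_g_per_thread == &main_state);   // caller's copy is unchanged
    return seen_null;
}
```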
Chris M. Thomasson
2024-11-23 10:13:37 UTC
Post by Bonita Montero
Post by Chris M. Thomasson
thread_local struct ct_per_thread* ct_g_per_thread = nullptr;
He was talking about local variables which are thread_local.
Your thread_local variable is global and therefore initialized
before a thread starts.
I do have a local variable that is thread_local. Did you see the
following function?
_______________________
ct_per_thread*
proxy_register_thread()
{
    if (! ct_g_per_thread)
    {
        thread_local ct_per_thread l_ct_per_thread(_whatever_);
        ct_g_per_thread = &l_ct_per_thread;
    }

    return ct_g_per_thread;
}
_______________________

I posted full code that tests it out in the "base code for a proxy
experiment..." thread.
