Discussion:
Self registering classes safe?
Marcel Mueller
2024-07-17 11:59:26 UTC
static vector<const ET_File_Description*> ETFileDescription;

struct ET_File_Description
{
    ET_File_Description()
    {   ETFileDescription.push_back(this);
    }

    // several function pointers with file type specific handlers...
};


Spread over different compilation units, the constructors are called like this:


const struct Asf_Description : ET_File_Description
{   Asf_Description()
    {   // assign the function pointers...
    }
}
ASF_Description();


Is this thread-safe? May the different static initializers run in parallel?

May a compiler elide the global variables since they are not used directly?


Marcel
Paavo Helde
2024-07-18 21:56:48 UTC
Post by Marcel Mueller
static vector<const ET_File_Description*> ETFileDescription;
struct ET_File_Description
{
    ET_File_Description()
    {    ETFileDescription.push_back(this);
    }
    // several function pointers with file type specific handlers...
};
const struct Asf_Description : ET_File_Description
{   Asf_Description()
    {   // assign the function pointers...
    }
}
ASF_Description();
This declares a function, probably not what you wanted. Assuming you
wanted to define a static variable instead.
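This is the "most vexing parse": an empty pair of parentheses after the name declares a function that returns the type, so no object is created and no constructor runs. A reduced sketch with a hypothetical `Description` type standing in for the real one:

```cpp
#include <cassert>

// Counts how many Description objects have been constructed.
int g_constructed = 0;

struct Description
{
    Description() { ++g_constructed; }
};

// Declares a function named not_an_object that returns a Description.
// No object exists and the constructor is never called.
Description not_an_object();

// Both of these define namespace-scope objects, so the constructor runs
// during static initialization.
Description real_object_1;    // no parentheses
Description real_object_2{};  // C++11 braces are never parsed as a function
```

Either dropping the parentheses or using braces gives an object; after static initialization `g_constructed` is 2, not 3.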
Post by Marcel Mueller
Is this thread-safe?
You publish the pointer in the global ETFileDescription before the most
derived object is constructed. Not a good idea in a multithreading
environment, the object should be published to other threads only after
it has been properly initialized. Also, if registration of objects can
be multithreaded, the push_back operation would need a mutex lock.
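A minimal sketch of both points, assuming a virtual-function interface instead of the original function-pointer table (the names `FileDescription`, `AsfDescription` and `Registry` are hypothetical). The base constructor no longer touches the registry; the caller registers an object only after its most derived constructor has finished, and a mutex serializes access to the container.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical base class. Its constructor deliberately does NOT register `this`.
struct FileDescription
{
    virtual ~FileDescription() = default;
    virtual std::string extension() const = 0;
};

// Hypothetical derived handler.
struct AsfDescription : FileDescription
{
    std::string extension() const override { return "asf"; }
};

// Registry whose operations are serialized by a mutex.
class Registry
{
public:
    // Call only with a fully constructed object.
    void add(const FileDescription& d)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(&d);
    }

    const FileDescription* find(const std::string& ext) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const FileDescription* d : items_)
            if (d->extension() == ext)
                return d;
        return nullptr;
    }

private:
    mutable std::mutex mutex_;
    std::vector<const FileDescription*> items_;
};
```

Usage is `AsfDescription asf; registry.add(asf);`, i.e. registration is a separate step that can only happen once the object is complete.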

At least in the past, global static initialization used to be
single-threaded and happened before main(), but nowadays I'm not so
sure; the standard speaks about deferred initialization, which can happen
after main() and in different threads.

Thread-safety is not the only one of your worries. A more serious
problem is that the global statics suffer from the static initialization
order fiasco, meaning that the global ETFileDescription in the first TU
is not guaranteed to be constructed before a static Asf_Description is
constructed in another TU.

Curiously, in case the implementation uses deferred initialization, it
is obliged to get the initialization order correct.

So, with some implementations you may suffer from initialization order
fiasco, and with other implementations you may suffer from
multithreading issues.

I suggest wrapping ETFileDescription in a function as a local static.
This makes the behavior much more deterministic: it will be initialized
by the first call of the function.
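A sketch of that suggestion, keeping the names from the original post and assuming static initialization itself is single-threaded. The accessor name `ETFileDescriptions()` is hypothetical. The vector becomes a local static inside a function, so whichever translation unit registers first also constructs it. This addresses only the ordering problem; the earlier point about publishing `this` from the base constructor still applies.

```cpp
#include <cassert>
#include <vector>

struct ET_File_Description;

// Hypothetical accessor replacing the global vector. The function-local static
// is constructed on the first call (thread-safe since C++11), so it exists
// before the first push_back no matter which TU is initialized first.
std::vector<const ET_File_Description*>& ETFileDescriptions()
{
    static std::vector<const ET_File_Description*> instances;
    return instances;
}

struct ET_File_Description
{
    ET_File_Description()
    {   ETFileDescriptions().push_back(this);
    }
    // several function pointers with file type specific handlers...
};

const struct Asf_Description : ET_File_Description
{   Asf_Description()
    {   // assign the function pointers...
    }
}
ASF_Description;  // no parentheses: this defines an object
```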

Alternatively, I gather one can use C++20 modules and import
declarations to enforce proper order of translation units.
Post by Marcel Mueller
May the different static initializers run in parallel?
Looks like yes, at least if deferred initialization is used by the
implementation.
Post by Marcel Mueller
May a compiler elide the global variables since they are not used directly?
If deferred initialization is used, they might indeed never be
constructed, and I gather that in this case the compiler may elide them.

Even without deferred initialization, if the TU does not define any
symbols with external linkage, the whole TU can be discarded AFAIK,
together with the "unused" global variable.

I do not know if any current C++ implementation actually supports this
deferred initialization for non-local statics. Seems like a pretty
disruptive change.

In short, it looks like this is yet another case study of why global
statics are bad, and it looks like they have gotten worse recently. Better
to avoid them, or at least make the registration functions explicit and
call them at the beginning of main().
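A sketch of the explicit alternative, with hypothetical function names: each handler module exports a plain registration function, and a single function that `main()` calls first lists them all. In a real project the `register_*` functions would live in separate TUs and be declared in a header; only `register_all_handlers()` knows about them, which keeps the dependency in one place and leaves nothing for the linker to discard.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical handler description; stands in for the function-pointer table.
struct FileDescription
{
    std::string extension;
};

// Each handler module exposes a plain function instead of a global object.
void register_asf(std::vector<FileDescription>& out) { out.push_back({"asf"}); }
void register_wma(std::vector<FileDescription>& out) { out.push_back({"wma"}); }

// Intended as the first statement of main(): the order is explicit and
// every module is referenced, so none can be silently dropped.
std::vector<FileDescription> register_all_handlers()
{
    std::vector<FileDescription> handlers;
    register_asf(handlers);
    register_wma(handlers);
    return handlers;
}
```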
Paavo Helde
2024-07-19 05:58:39 UTC
Post by Paavo Helde
At least in the past, global static initialization used to be
single-threaded and happened before main(), but nowadays I'm not so
sure; the standard speaks about deferred initialization, which can happen
after main() and in different threads.
I think the deferred initialization is mostly there to support dynamic
loading of shared libraries, and indeed these can be loaded in different
threads, but the standard seems very vague about this area. Especially
the example about A a and B b in [basic.start.dynamic] becomes more
confusing each time I reread it.
Marcel Mueller
2024-07-19 09:18:15 UTC
Post by Paavo Helde
Post by Marcel Mueller
const struct Asf_Description : ET_File_Description
{   Asf_Description()
     {   // assign the function pointers...
     }
}
ASF_Description();
This declares a function, probably not what you wanted. Assuming you
wanted to define a static variable instead.
Argh. The original code had arguments; I removed them for simplicity.
Post by Paavo Helde
Post by Marcel Mueller
Is this thread-safe?
You publish the pointer in the global ETFileDescription before the most
derived object is constructed. Not a good idea in a multithreading
environment, the object should be published to other threads only after
it has been properly initialized. Also, if registration of objects can
be multithreaded, the push_back operation would need a mutex lock.
In fact only static instances are permitted. And they are not used
before main().

However, there is another pitfall: the vector itself might not be
initialized yet. I already fixed this by using the objects themselves as a
linked list, omitting the vector.
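A sketch of the linked-list variant, assuming each description stores a `next` pointer (the member names `first`, `next` and `extension` are hypothetical). It works because a static pointer initialized with `nullptr` is constant-initialized, and constant initialization completes before any dynamic initialization, so the list head is valid no matter which TU's constructor runs first. The order of the list across TUs remains unspecified, and static initialization is assumed to be single-threaded.

```cpp
#include <cassert>
#include <string>

struct ET_File_Description
{
    static const ET_File_Description* first;  // head of the intrusive list
    const ET_File_Description* next;
    const char* extension;  // stand-in for the handler function pointers

    explicit ET_File_Description(const char* ext)
    :   next(first), extension(ext)
    {   first = this;  // prepend; list order across TUs is unspecified
    }
};

// Initialized with a constant expression: this is constant initialization,
// complete before any dynamic initialization in any TU runs.
const ET_File_Description* ET_File_Description::first = nullptr;

const ET_File_Description asf_description("asf");
const ET_File_Description wma_description("wma");
```

Within one TU the definitions are initialized in order, so `wma_description` ends up at the head here.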
Post by Paavo Helde
At least in the past, global static initialization used to be
single-threaded and happened before main(), but nowadays I'm not so
sure; the standard speaks about deferred initialization, which can happen
after main() and in different threads.
AFAIK this applies to static variables inside functions.
Post by Paavo Helde
Thread-safety is not the only one of your worries. A more serious
problem is that the global statics suffer from the static initialization
order fiasco, meaning that the global ETFileDescription in the first TU
is not guaranteed to be constructed before a static Asf_Description is
constructed in another TU.
The sequence of the instances does not matter in my case.
Post by Paavo Helde
Curiously, in case the implementation uses deferred initialization, it
is obliged to get the initialization order correct.
What should trigger the deferred initialization?
AFAIK all global variables MUST be initialized before main().
Post by Paavo Helde
Suggesting to wrap ETFileDescription in a function as a local static,
this makes the behavior much more determined, it will be initialized by
the first call of the function.
Indeed. This would be another option (to the linked list).
Post by Paavo Helde
Alternatively, I gather one can use C++20 modules and import
declarations to enforce proper order of translation units.
Unfortunately C++20 is not yet supported on all platforms.
Post by Paavo Helde
Post by Marcel Mueller
May the different static initializers run in parallel?
Looks like yes, at least if deferred initialization is used by the
implementation.
Do you have more details about deferred initialization?
I only found information about local statics.
Post by Paavo Helde
Post by Marcel Mueller
May a compiler elide the global variables since they are not used directly?
If deferred initialization is used, they might be never constructed
indeed, and I gather in this case the compiler may elide them indeed.
AFAIK this is forbidden by the standard.
I am just not sure whether the "const" may introduce a scope that allows
this.
Post by Paavo Helde
Even without deferred initialization, if the TU does not define any
symbols with external linkage, the whole TU can be discarded AFAIK,
together with the "unused" global variable.
Even if it is not a static library?
Post by Paavo Helde
In short, it looks like this is yet another case study of why global
statics are bad, and it looks like they have gone worse recently. Better
to avoid them, or at least make the registration functions explicit and
placed in the beginning of main().
Basically it is a kind of DI (dependency injection) problem. The main
application does not depend on the file type handlers.

Once the application is transformed into a plug-in architecture the
problem will be gone. Then there is a defined initialization time: when
loading the plug-in.
Meanwhile, I wanted to use this as an intermediate solution. It worked
for me for a while, but a user on another platform has problems which we
have not been able to track down yet.


Marcel
Paavo Helde
2024-07-19 14:46:13 UTC
Post by Marcel Mueller
What should trigger the deferred initialization?
AFAIK all global variables MUST be initialized before main().
cppreference.com has a bit clearer verbiage than the standard, but not much.

https://en.cppreference.com/w/cpp/language/initialization

"All non-local variables with static storage duration are initialized as
part of program startup, before the execution of the main function
begins (unless deferred, see below)."

"It is implementation-defined whether dynamic initialization
happens-before the first statement of the main function (for statics)
[...], or deferred to happen after.

If the initialization of a non-inline variable (since C++17) is deferred
to happen after the first statement of main/thread function, it happens
before the first ODR-use of any variable with static/thread storage
duration defined in the same translation unit as the variable to be
initialized. If no variable or function is ODR-used from a given
translation unit, the non-local variables defined in that translation
unit may never be initialized (this models the behavior of an on-demand
dynamic library). However, as long as anything from a translation unit
is ODR-used, all non-local variables whose initialization or destruction
has side effects will be initialized even if they are not used in the
program."

The more I read, the more confusing it becomes. A dynamic library
typically consists of many TUs; from this I gather that if some of those
TUs contain only a global static, it may still remain uninitialized
even when the library is loaded on demand.
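One common workaround, sketched here with hypothetical names, is to give each handler TU an "anchor" function with external linkage and call it early in `main()`. Per the quoted rule (the standard's [basic.start.dynamic] wording covers ODR-use of functions as well as variables), calling it forces any deferred initialization of that TU's variables to happen first; in practice it also makes the linker pull the object file in from a static library. The sections marked below would normally be separate files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Registry accessor as a function-local static (hypothetical name).
std::vector<std::string>& registered_handlers()
{
    static std::vector<std::string> names;
    return names;
}

// ---- would normally live in asf_description.cpp ----
namespace {
struct AsfRegistrar
{
    AsfRegistrar() { registered_handlers().push_back("asf"); }
} asf_registrar;  // self-registering object; its constructor has side effects
}

// Hypothetical anchor with external linkage, defined in the same TU.
void asf_description_anchor() {}

// ---- would normally live in main.cpp (declaration via a header) ----
void asf_description_anchor();

// Call at the start of main(): ODR-using the anchor ties the handler TU
// into the program, so its static objects are initialized before use.
void link_handler_units()
{
    asf_description_anchor();
}
```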
Post by Marcel Mueller
Basically it is a kind of DI problem. The main application does not
depend on the file type handlers.
Once the application is transformed into a plug-in architecture the
problem will be gone. Then there is a defined initialization time: when
loading the plug-in.
If there are multiple threads running when loading the plugins, you
might need to add MT protection for the registration and lookup.
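If registration and lookup can overlap, a reader/writer lock is one option for a read-mostly table. A minimal sketch with hypothetical names, requiring C++17 for `std::shared_mutex`. It assumes plug-ins are never unloaded, because `find()` returns a raw pointer that is used after the lock is released.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical handler type.
struct Handler
{
    std::string name;
};

class PluginRegistry
{
public:
    // Writers take an exclusive lock; returns false if the key is already taken.
    bool add(const std::string& ext, const Handler* h)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return handlers_.emplace(ext, h).second;
    }

    // Readers take a shared lock, so concurrent lookups do not block each other.
    const Handler* find(const std::string& ext) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = handlers_.find(ext);
        return it == handlers_.end() ? nullptr : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, const Handler*> handlers_;
};
```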

If you want to unload the plugins while still in a multithreaded regime,
this becomes even trickier.
Post by Marcel Mueller
Meanwhile, I wanted to use this as an intermediate solution. It worked
for me for a while, but a user on another platform has problems which we
have not been able to track down yet.
That's the problem with static initialization order fiasco, sometimes it
can be accidentally correct, sometimes not.

I have burned myself with such things a while ago, and as a result now
I'm trying to avoid global variables as much as possible.
Marcel Mueller
2024-07-19 19:40:59 UTC
Post by Paavo Helde
The more I read, the more confusing it becomes. A dynamic library
typically consists of many TUs; from this I gather that if some of those
TUs contain only a global static, it may still remain uninitialized
even when the library is loaded on demand.
In fact it defeats any kind of self-registering objects.
Post by Paavo Helde
Post by Marcel Mueller
Basically it is a kind of DI problem. The main application does not
depend on the file type handlers.
Once the application is transformed into a plug-in architecture the
problem will be gone. Then there is a defined initialization time:
when loading the plug-in.
If there are multiple threads running when loading the plugins, you
might need to add MT protection for the registration and lookup.
Of course. But typically plug-ins are loaded at application start only.
Post by Paavo Helde
If you want to unload the plugins while still in multithreading regime,
then this would become even much more tricky.
Indeed. This is likely never to happen for this application.


Marcel
Chris M. Thomasson
2024-07-20 02:59:27 UTC
Post by Marcel Mueller
Post by Paavo Helde
The more I read, the more confusing it becomes. A dynamic library
typically consists of many TUs; from this I gather that if some of those
TUs contain only a global static, it may still remain uninitialized
even when the library is loaded on demand.
In fact it defeats any kind of self registering objects.
Post by Paavo Helde
Post by Marcel Mueller
Basically it is a kind of DI problem. The main application does not
depend on the file type handlers.
Once the application is transformed into a plug-in architecture the
problem will be gone. Then there is a defined initialization time:
when loading the plug-in.
If there are multiple threads running when loading the plugins, you
might need to add MT protection for the registration and lookup.
Of course. But typically plug-ins are loaded at application start only.
It can be dynamic. Step 1 create the plug_in and all of its parts, 100%
ready and 100% initialized. Then, you can add it into your system as a
100% visible object, ready to roll, so to speak.

Not just at startup. A plugin can be loaded up at any time just as long
as it's 100% initialized and ready to go... Then, and only then, can it
be exposed to your running system.
Post by Marcel Mueller
Post by Paavo Helde
If you want to unload the plugins while still in multithreading
regime, then this would become even much more tricky.
Indeed. This is likely never to happen for this application.
Marcel
Marcel Mueller
2024-07-20 11:16:42 UTC
Post by Chris M. Thomasson
Post by Marcel Mueller
Of course. But typically plug-ins are loaded at application start only.
It can be dynamic. Step 1 create the plug_in and all of its parts, 100%
ready and 100% initialized. Then, you can add it into your system as a
100% visible object, ready to roll, so to speak.
Not just at startup. A plugin can be loaded up at any time just as long
as it's 100% initialized and ready to go...
... and if the UI allows this.
Post by Chris M. Thomasson
Then, and only then, can it
be exposed to your running system.
It's not that easy. A plug-in may change the behavior of already running
actions. This could trigger race-conditions and of course semantic
problems too.


Marcel
Chris M. Thomasson
2024-07-20 19:00:39 UTC
Post by Marcel Mueller
Post by Chris M. Thomasson
Post by Marcel Mueller
Of course. But typically plug-ins are loaded at application start only.
It can be dynamic. Step 1 create the plug_in and all of its parts,
100% ready and 100% initialized. Then, you can add it into your system
as a 100% visible object, ready to roll, so to speak.
Not just at startup. A plugin can be loaded up at any time just as
long as it's 100% initialized and ready to go...
... and if the UI allows this.
Post by Chris M. Thomasson
Then, and only then, can it be exposed to your running system.
It's not that easy. A plug-in may change the behavior of already running
actions. This could trigger race-conditions and of course semantic
problems too.
A plug-in that is 100% initialized and introduced into the system
ideally would not do anything until it is used.
Marcel Mueller
2024-07-21 12:51:10 UTC
Post by Chris M. Thomasson
A plug in 100% initialized and introduced into the system, ideally would
not do anything until it is used.
Of course, but the switch from not yet using the plug-in to using the
plug-in could be the critical part.


Marcel
Chris M. Thomasson
2024-07-21 20:20:48 UTC
Post by Marcel Mueller
Post by Chris M. Thomasson
A plug in 100% initialized and introduced into the system, ideally
would not do anything until it is used.
Of course, but the switch from not yet using the plug-in to using the
plug-in could be the critical part.
Hopefully it would be atomic. Think of a list (perhaps even in a GUI) of
plug-ins. You add one. Internally it loads up the plug-in, initializes
it... If all goes well, it then gets atomically introduced into the
system. You get a new entry in the "list", 100% ready to roll. Fair
enough? Also, are you familiar with strong atomic reference counting?
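One way to get the atomic introduction described here is a copy-on-write snapshot: the plug-in is fully built first, a new list is assembled privately, and the pointer to the list is swapped under a short lock. Readers hold a `shared_ptr` to an immutable snapshot, so they see either the old or the complete new list, and reference counting keeps the plug-ins in a snapshot alive while it is in use. A sketch with hypothetical names; where supported, C++20's `std::atomic<std::shared_ptr<T>>` could replace the mutex around the pointer.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plug-in type; it is fully initialized before add() is called.
struct Plugin
{
    std::string name;
};

class PluginList
{
public:
    using Snapshot = std::vector<std::shared_ptr<const Plugin>>;

    // Readers get an immutable snapshot and can iterate it without a lock.
    std::shared_ptr<const Snapshot> snapshot() const
    {
        std::lock_guard<std::mutex> lock(ptr_mutex_);
        return current_;
    }

    // Builds the new list privately, then swaps it in: readers observe either
    // the old list or the complete new one, never a half-updated list.
    void add(std::shared_ptr<const Plugin> plugin)
    {
        std::lock_guard<std::mutex> writer(write_mutex_);  // one writer at a time
        auto next = std::make_shared<Snapshot>(*snapshot());
        next->push_back(std::move(plugin));
        std::lock_guard<std::mutex> lock(ptr_mutex_);
        current_ = std::move(next);
    }

private:
    mutable std::mutex ptr_mutex_;  // guards only the current_ pointer
    std::mutex write_mutex_;        // serializes writers
    std::shared_ptr<const Snapshot> current_ = std::make_shared<Snapshot>();
};
```

A reader that took a snapshot before `add()` keeps seeing the old, unchanged list.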
Paavo Helde
2024-07-22 05:59:00 UTC
Post by Chris M. Thomasson
Post by Marcel Mueller
Post by Chris M. Thomasson
A plug in 100% initialized and introduced into the system, ideally
would not do anything until it is used.
Of course, but the switch from not yet using the plug-in to using the
plug-in could be the critical part.
Hopefully it would be atomic. Think of a list (perhaps even in a GUI) of
plug-ins. You add one. Internally it loads up the plug-in, initializes
it... If all goes well it then gets atomically introduced into the
system. You get a new entry in the "list", 100% ready to roll. Fair
enough? Also, are you familiar with strong atomic reference counting?
I think Marcel is talking about behavior changing. If some things were
done one way before loading the plugin, and another way after loading
the plugin, this may cause contradictions or inconsistencies in the
program state, regardless of whether the transition is atomic or not.

In my experience with plugins, some things would just fail to work
before loading the plugin and succeed after it, which is much easier to
cope with, but I reckon there might be other usage scenarios which are
not so clear-cut.
Chris M. Thomasson
2024-07-22 19:15:44 UTC
Post by Paavo Helde
Post by Chris M. Thomasson
Post by Marcel Mueller
Post by Chris M. Thomasson
A plug in 100% initialized and introduced into the system, ideally
would not do anything until it is used.
Of course, but the switch from not yet using the plug-in to using the
plug-in could be the critical part.
Hopefully it would be atomic. Think of a list (perhaps even in a GUI)
of plug-ins. You add one. Internally it loads up the plug-in,
initializes it... If all goes well it then gets atomically introduced
into the system. You get a new entry in the "list", 100% ready to
roll. Fair enough? Also, are you familiar with strong atomic reference
counting?
I think Marcel is talking about behavior changing. If some things were
done one way before loading the plugin, and another way after loading
the plugin, this may cause contradictions or inconsistencies in the
program state, regardless of whether the transition is atomic or not.
In my experience with plugins, some things would just fail to work
before loading the plugin and succeed after it, which is much easier to
cope with, but I reckon there might be other usage scenarios which are
not so clear-cut.
I see. One possible scenario: trying to load a plug-in that was created
for a more recent version of a program, which perhaps uses features that
are simply not present in older versions. Ideally, this plug-in should
fail at the initialization phase, so that it would not get added to
the system in any way, shape or form?
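A sketch of that idea, assuming a hypothetical plug-in descriptor that states the host API version it needs (`kHostApiVersion`, `PluginInfo` and `try_register` are illustrative names, not an existing API). The check runs during initialization, and only compatible plug-ins are ever added to the list, so an incompatible one never becomes visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical API version implemented by the host application.
constexpr int kHostApiVersion = 3;

// Hypothetical descriptor a plug-in exposes before anything else is done with it.
struct PluginInfo
{
    std::string name;
    int required_api_version;
};

// Runs during the initialization phase. An incompatible plug-in is rejected
// here and never added to `loaded`, so the rest of the system never sees it.
bool try_register(const PluginInfo& info, std::vector<PluginInfo>& loaded)
{
    if (info.required_api_version > kHostApiVersion)
        return false;  // built for a newer host: fail before publication
    loaded.push_back(info);
    return true;
}
```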

Chris M. Thomasson
2024-07-19 18:43:09 UTC
On 7/19/2024 2:18 AM, Marcel Mueller wrote:
[...]
Post by Marcel Mueller
Once the application is transformed into a plug-in architecture the
problem will be gone. Then there is a defined initialization time: when
loading the plug-in.
Meanwhile, I wanted to use this as an intermediate solution. It worked
for me for a while, but a user on another platform has problems which we
have not been able to track down yet.
Usually, after a plug-in is fully loaded and 100% initialized, it
can be introduced into the system.
Chris M. Thomasson
2024-07-19 18:40:30 UTC
Post by Paavo Helde
Post by Marcel Mueller
static vector<const ET_File_Description*> ETFileDescription;
struct ET_File_Description
{
     ET_File_Description()
     {    ETFileDescription.push_back(this);
     }
     // several function pointers with file type specific handlers...
};
const struct Asf_Description : ET_File_Description
{   Asf_Description()
     {   // assign the function pointers...
     }
}
ASF_Description();
This declares a function, probably not what you wanted. Assuming you
wanted to define a static variable instead.
Post by Marcel Mueller
Is this thread-safe?
You publish the pointer in the global ETFileDescription before the most
derived object is constructed. Not a good idea in a multithreading
environment, the object should be published to other threads only after
it has been properly initialized. Also, if registration of objects can
be multithreaded, the push_back operation would need a mutex lock.
[...]

For some reason it reminds me of a nasty race condition that was rather
common in base thread classes that I have had to debug. The base thread
would create a thread that worked on it, calling into the derived
object. This could occur _before_ the derived object was fully
constructed. It was a damn nightmare to debug.
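The usual way out of that race is to not start the thread from the base constructor at all. A sketch with hypothetical class names: the thread is started by an explicit `start()` once the most derived object is fully constructed, and the derived destructor joins it before its own members are destroyed. A derived class that forgets to join would hit `std::terminate`, because a joinable `std::thread` is destroyed in `~Task`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical base class. Unlike the problematic design, the constructor does
// not start the thread; start() must be called on a fully constructed object.
class Task
{
public:
    virtual ~Task() = default;
    void start() { thread_ = std::thread([this] { run(); }); }
    void join()
    {
        if (thread_.joinable())
            thread_.join();
    }

protected:
    virtual void run() = 0;

private:
    std::thread thread_;
};

// Hypothetical derived task.
class Counter : public Task
{
public:
    ~Counter() override { join(); }  // join before Counter's members go away
    int value() const { return value_.load(); }

protected:
    void run() override { value_.store(42); }

private:
    std::atomic<int> value_{0};
};
```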