Discussion:
Immutable array with header
(too old to reply)
Marcel Mueller
2024-06-23 12:37:00 UTC
Permalink
Quite often I have the following requirement:

I need a data structure that holds a single header H together with N
elements E. N is constant but not constexpr so just using std::array is
not an option.

Use cases:
- A reference counted store for (small) images. The reference count, the
data size and maybe some meta information is the header while the
picture data is just raw storage.
- A reference counted string with refcount and size as header and an
array of the char type as data.

Normally this always requires two allocations for each object, one for
the header and one for the data. When holding millions of small objects
this can be a considerable overhead.

In good old C this was usually handled by

struct Storage
{ H Header;
E Data[];
};

But this does not work for C++ types with constructors/destructors.
new Storage() is not valid.

There is a dirty hack by writing a custom operator new that takes an
additional argument for N.

class Storage
{public:
H Header;
size_t n;
E Data[];
void* operator new(std::size_t sz, std::size_t N);
void operator delete(void* ptr);
};

The drawback is that ~Storage() must know N to call ~E(). In order to
work the operator new needs to store N somewhere in Storage which is not
the idea of an allocation operator.

Furthermore the size calculation in operator new needs offsetof(Storage,
Data), which is likely to be UB when Storage and H are no trivial types.
The size passed to operator new is not sufficient because the offset
might be different due to tail padding or tail optimization (e.g if E is
just char) or alignment requirements of E.


Any other ideas?


Marcel
Paavo Helde
2024-06-24 18:01:22 UTC
Permalink
Post by Marcel Mueller
I need a data structure that holds a single header H together with N
elements E. N is constant but not constexpr so just using std::array is
not an option.
- A reference counted store for (small) images. The reference count, the
data size and maybe some meta information is the header while the
picture data is just raw storage.
- A reference counted string with refcount and size as header and an
array of the char type as data.
Normally this always requires two allocations for each object, one for
the header and one for the data. When holding millions of small objects
this can be a considerable overhead.
If you allocate this dynamically, then you will access it by some kind
of smart pointer. Make the destruction logic part of the smart pointer
destructor, and you don't need to hack around with operator new/delete.

See the implementation of std::shared_ptr. Here, std::make_shared
allocates only one piece of memory and puts both the reference count and
the object there, using placement new. The std::shared_ptr destructor
will decrement the refcount and will eventually both destroy the object
and release the memory block. The only difference in your case is that
you need to allocate multiple objects instead of one. If the header only
consists of reference count, you should actually be fine with

auto shared = std::make_shared<E[]>(N);

(supported since C++20)

hth
Alf P. Steinbach
2024-06-24 19:49:41 UTC
Permalink
Post by Marcel Mueller
I need a data structure that holds a single header H together with N
elements E. N is constant but not constexpr so just using std::array is
not an option.
- A reference counted store for (small) images. The reference count, the
data size and maybe some meta information is the header while the
picture data is just raw storage.
- A reference counted string with refcount and size as header and an
array of the char type as data.
Normally this always requires two allocations for each object, one for
the header and one for the data. When holding millions of small objects
this can be a considerable overhead.
In good old C this was usually handled by
struct Storage
{ H Header;
  E Data[];
};
But this does not work for C++ types with constructors/destructors.
new Storage() is not valid.
There is a dirty hack by writing a custom operator new that takes an
additional argument for N.
class Storage
  H Header;
  size_t n;
  E Data[];
  void* operator new(std::size_t sz, std::size_t N);
  void operator delete(void* ptr);
};
The drawback is that ~Storage() must know N to call ~E(). In order to
work the operator new needs to store N somewhere in Storage which is not
the idea of an allocation operator.
Furthermore the size calculation in operator new needs offsetof(Storage,
Data), which is likely to be UB when Storage and H are no trivial types.
The size passed to operator new is not sufficient because the offset
might be different due to tail padding or tail optimization (e.g if E is
just char) or alignment requirements of E.
Any other ideas?
I'd make it non-copyable via construct & assign, and only creatable via
a factory function.

Which will pass the size both to the allocator and to the Storage
constructor. Which will need to placement-construct each Data item. And
ditto for destruction.

If copying is needed supply a cloning function.

Abstracting out (n, Data) as as a resuable thing appears to have some
additional complications.

But maybe that will be more clear later, and anyway doesn't need doing.


- Alf

Loading...