Discussion:
Returning small strings as std::array
Marcel Mueller
2024-03-09 10:09:12 UTC
Is it reasonable to return small strings as a std::array to avoid copying?

std::string might require allocation if it has no small string
optimization built in. Furthermore, it cannot be initialized from old
C-style APIs that require char* buffer and size_t buffer_size.

The idea is to return std::array<char,10> or something like that. This
causes no allocation, and the compiler should be able to optimize the
return value to emplace the result into the caller's storage.
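
A minimal sketch of that idea, for illustration only. The function legacy_get_name is a hypothetical stand-in for an old C-style API (it is not from any real library), and the name "sensor-7" and the size 10 are arbitrary assumptions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for an old C-style API: fills a caller-supplied
// buffer and always zero-terminates it (truncating if necessary).
inline void legacy_get_name(char* buffer, std::size_t buffer_size)
{
    assert(buffer != nullptr && buffer_size > 0);
    std::snprintf(buffer, buffer_size, "%s", "sensor-7");
}

// Returns the small string by value. The C API writes straight into the
// array, so no heap allocation is involved.
inline std::array<char, 10> get_name()
{
    std::array<char, 10> result{};   // zero-initialized
    legacy_get_name(result.data(), result.size());
    return result;                   // eligible for copy elision (NRVO)
}
```

A caller would write something like `auto name = get_name();` and read the text through `name.data()`. The trade-off is that the length is not stored anywhere, so the caller has to rely on the zero terminator.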

Any other idea?


Marcel
Bonita Montero
2024-03-09 15:23:07 UTC
Post by Marcel Mueller
Is it reasonable to return small strings a std::array to avoid copying?
std::string might require allocation if it has no small string
optimization build in. Furthermore it cannot be initialized from old C
style APIs that require char* buffer and size_t buffer_size.
The idea is to return std::array<char,10> or something like that. This
causes no allocation and the compiler should be able to optimize the
return value to emplace the result into the callers storage.
Any other idea?
Marcel
C++ strings can store small strings internally. On common 64-bit
builds, MSVC and g++ / libstdc++ can store up to 15 characters this
way, and clang / libc++ up to 22.
The following program prints the shortest length that your
implementation stores externally; the internal capacity is one less:

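#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>

using namespace std;

int main()
{
    size_t length = 1;
    string from, to;
    char const *src, *dst;
    do
    {
        // build a string of the current length, then advance the length
        from.resize( 0 );
        fill_n( back_inserter( from ), length++, '*' );
        src = from.data();
        to = move( from );
        dst = to.data();
        // internal storage must be copied on move, so the address changes;
        // an external buffer is typically handed over and keeps its address
    } while( src != dst );
    // length overshot by one; print the first externally stored length
    cout << --length << endl;
}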

Iterators to basic_string objects are allowed to change when you move
them (other containers don't allow this), and so are the addresses of
their characters. I simply construct an increasingly long string and
check whether the starting address changes on moving. If it stays equal,
you've got an externally stored string, and the size which can be stored
internally is one less.
Paavo Helde
2024-03-10 14:35:57 UTC
Post by Marcel Mueller
Is it reasonable to return small strings a std::array to avoid copying?
No, I believe it's not reasonable as it would complicate the code and
make it less readable, probably without any measurable benefit whatsoever.
Post by Marcel Mueller
std::string might require allocation if it has no small string
optimization build in.
All mainstream C++ implementations are using small string optimization
nowadays.
Post by Marcel Mueller
Furthermore it cannot be initialized from old C
style APIs that require char* buffer and size_t buffer_size.
Cannot really understand this statement. Are you worrying about the
overhead of initializing 10 bytes in a std::string before calling a C
style API? Can you actually measure this overhead?
Post by Marcel Mueller
The idea is to return std::array<char,10> or something like that. This
causes no allocation and the compiler should be able to optimize the
return value to emplace the result into the callers storage.
Seems like a perfect example of premature optimization. If your
application really requires to win these hypothetical nanoseconds, then
you should probably write your own string class tuned to maximum
performance for that particular application.
Marcel Mueller
2024-03-10 16:17:35 UTC
Post by Paavo Helde
Post by Marcel Mueller
std::string might require allocation if it has no small string
optimization build in.
All mainstream C++ implementations are using small string optimization
nowadays.
Yes, it seems so.
Several years ago it was not implemented on my platform.
Post by Paavo Helde
Post by Marcel Mueller
Furthermore it cannot be initialized from old C style APIs that
require char* buffer and size_t buffer_size.
Cannot really understand this statement. Are you worrying about the
overhead of initializing 10 bytes in a std::string before calling a C
style API? Can you actually measure this overhead?
The other way around. The C-API cannot write to a std::string because
std::string once created with sufficient size can no longer be converted
to char*. So I always need a temporary char array to create std::string.
AFAIR using &str.front() to write the string is not allowed.


Marcel
Paavo Helde
2024-03-10 16:43:01 UTC
Post by Marcel Mueller
Post by Paavo Helde
Post by Marcel Mueller
Furthermore it cannot be initialized from old C style APIs that
require char* buffer and size_t buffer_size.
Cannot really understand this statement. Are you worrying about the
overhead of initializing 10 bytes in a std::string before calling a C
style API? Can you actually measure this overhead?
The other way around. The C-API cannot write to a std::string because
std::string once created with sufficient size can no longer be converted
to char*. So I always need a temporary char array to create std::string.
AFAIR using &str.front() to write the string is not allowed.
Your information has been out of date for 13 years formally, and more
than that in practice. The C++11 standard added a guarantee that the
string's internal buffer is contiguous, so you can write freely into the
string via &str[0] (C++17 also added a non-const overload of data()).
One can even write the terminating zero at index str.length() (but no
other value), meaning one can even use strcpy() for writing.
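
A small sketch of what this looks like in C++11-compatible code. The helper name from_c_string is made up for illustration, and the example is deliberately trivial (the std::string constructor would do the same job); it only shows that strcpy() can write into a pre-sized string:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies a C string into a std::string without a temporary char array.
inline std::string from_c_string(const char* text)
{
    assert(text != nullptr);
    std::string result(std::strlen(text), '\0');   // exact final length

    // &result[0] points to contiguous, writable storage since C++11.
    // strcpy() also writes the terminating zero at index result.length(),
    // the one position past the end where a zero may be stored.
    if (!result.empty())
        std::strcpy(&result[0], text);
    return result;
}
```

For example, `from_c_string("hello")` yields a std::string of length 5 without any intermediate buffer.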
Andrey Tarasevich
2024-03-12 05:57:41 UTC
Post by Marcel Mueller
std::string might require allocation if it has no small string
optimization build in.
True. SSO is not required, but virtually all competent implementations
do it.
Post by Marcel Mueller
Furthermore it cannot be initialized from old C
style APIs that require char* buffer and size_t buffer_size.
That is not true.

A non-const version of `std::string::data()` was added in C++17
specifically to support this usage model. (But even before that you
could gain non-const access to its internal buffer through `&str[0]`).

You can pre-`resize()` an `std::string`, pass its `data()` (and
`size()`) to a C function, determine the resultant length based on zero
terminator's location, and then `resize()` it to the new length.

Done.
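
The recipe above might look roughly like this. It is only a sketch: `legacy_format_id` is a hypothetical C-style function invented for the example, and the buffer size of 15 is an arbitrary assumption. It uses `&result[0]` so that it also compiles as C++11; from C++17 on, `result.data()` works the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical C-style API: writes a zero-terminated text into the
// caller's buffer, truncating if the buffer is too small.
inline void legacy_format_id(char* buffer, std::size_t buffer_size, int id)
{
    assert(buffer != nullptr && buffer_size > 0);
    std::snprintf(buffer, buffer_size, "id-%d", id);
}

inline std::string format_id(int id)
{
    std::string result;
    result.resize(15);                                  // 1. pre-size the string
    legacy_format_id(&result[0], result.size(), id);    // 2. let the C API fill it
    result.resize(std::strlen(result.c_str()));         // 3. trim to the terminator
    return result;
}
```

With these assumptions, `format_id(42)` returns the string `id-42` with `size() == 5`, and no separate char array is needed.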
Post by Marcel Mueller
The idea is to return std::array<char,10> or something like that. This
causes no allocation and the compiler should be able to optimize the
return value to emplace the result into the callers storage.
Is it reasonable to return small strings a std::array to avoid copying?
If you are sure that they will always fit into the pre-determined (at
the compile time) size, then perhaps it might be reasonable. Basically,
the main benefit I can see here is that it allows one to extend SSO-like
behavior beyond the boundary used by the existing `std::string`
implementation, i.e., to avoid involving dynamic memory for a greater
range of string lengths.
--
Best regards,
Andrey
Marcel Mueller
2024-03-14 19:42:54 UTC
Post by Andrey Tarasevich
Post by Marcel Mueller
Furthermore it cannot be initialized from old C style APIs that
require char* buffer and size_t buffer_size.
That is not true.
A non-const version of `std::string::data()` was added in C++17
specifically to support this usage model. (But even before that you
could gain non-const access to its internal buffer through `&str[0]`).
Thanks, I was not aware of the latter!

The application is restricted to C++11. (I forgot to mention.)
Post by Andrey Tarasevich
Post by Marcel Mueller
Is it reasonable to return small strings a std::array to avoid copying?
If you are sure that they will always fit into the pre-determined (at
the compile time) size, then perhaps it might be reasonable.
Yes, if the C-API has a size parameter, I am quite sure. ;-)


Marcel
