Discussion:
Slow std integer-to-string conversion (to_string/sprintf)
JiiPee
2014-11-01 22:02:34 UTC
I wonder why the C++ standard library has never provided a really fast
integer-to-string conversion function. It has sprintf, snprintf,
stringstream and to_string.

From this site:

http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html

we can see that sprintf is the fastest of those.
But then I googled and found, on this site:

http://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion

a very fast version written by one of the answerers. I ran tests: I
converted every integer from 0 to 1294967295 in steps of 60 to a string.
I also read one character of each result (cU += str2[6];) inside the
loop and printed it at the end, so that the compiler would not optimize
the conversions away.

Results: the user-made conversion function from that site was about 130
times faster than sprintf (and thus would be about 200 times faster than
to_string).

How is it possible that there is no fast std version? This is sometimes
an important function, for example when generating a very large text
full of integers.

I am not really complaining, but I want to start a discussion about
this: why does the standard library sometimes not provide fast
functions/classes?

For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
Luca Risolia
2014-11-02 00:55:30 UTC
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
And how can we reproduce your tests? Please post the full, compilable
test program together with all the information about the compiler,
compiler version and compilation flags that you used.
JiiPee
2014-11-02 01:19:26 UTC
Post by Luca Risolia
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
And how can we reproduce your tests? Please post the full, compilable
test program together with all the informations about the compiler,
compiler version and compilation flags that you used.
I also tested that intToStr converts correctly: the first 20 million
values are correct. Testing all the integers takes about 2 hours... I am
going to do that sometime.

Compiler: GCC 4.8.1, C++11, -O3 -s

Here you go:

#include <iostream>
#include <ctime>
#include <cstdio>   // needed for sprintf/snprintf

inline void intToStr(unsigned val, char* c)
{
    int size;
    if (val >= 10000)
    {
        if (val >= 10000000)
        {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        }
        else
        {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    }
    else
    {
        if (val >= 100)
        {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        }
        else
        {
            if (val >= 10)
                size = 2;
            else if (val == 0)
            {
                c[0] = '0';
                c[1] = '\0';
                return;
            }
            else
                size = 1;
        }
    }

    c += size - 1;
    while (val >= 100)
    {
        int pos = val % 100;
        val /= 100;
        *(short*)(c - 1) = *(short*)(digit_pairs + 2 * pos);
        c -= 2;
    }
    while (val > 0)
    {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    c[size + 1] = '\0';
}

// speed test function:

// intToStr is ~132x faster than snprintf
// loop to 1294967295, step 60 (i += 60): intToStr 0.41 s, snprintf 54.4 s (x132), sprintf 54.2 s (x131)
// (max uint = 4294967295)
void benchmark()
{
    clock_t begin = clock();

    char str2[16];
    char cU = 0; // initialised, so reading it below is well-defined
    for (unsigned int i = 0; i < 1294967295; i += 60)
    {
        // HERE COMMENT OUT THE FUNCTIONS YOU DO NOT WANT TO TEST:
        //intToStr(i, str2);
        //snprintf(str2, 15, "%u", i);
        sprintf(str2, "%u", i);
        //itoa(i, str2, 10);  // non-standard

        cU += str2[6]; // use the string so that the compiler will not optimize things out
    }

    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

    std::cout << "Time: " << elapsed_secs << cU << std::endl;
}
JiiPee
2014-11-02 01:23:28 UTC
Obviously, only the conversion function being tested should be
uncommented; the others should be commented out. For example, when
testing snprintf:

//intToStr(i, str2);
snprintf(str2, 15, "%u", i);
//sprintf(str2, "%u", i);
//itoa(i, str2, 10);
Paavo Helde
2014-11-02 08:08:18 UTC
Post by JiiPee
Post by Luca Risolia
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody
wants my fast function please ask, I can give it then. Also, if
somebody knows faster way to convert, am ready to test it against
this! I have a test platform ready.
And how can we reproduce your tests? Please post the full, compilable
test program together with all the informations about the compiler,
compiler version and compilation flags that you used.
I also tested intToStr that it converts correctly: first 20 million is
correct. Needs to test all integers, takes 2 hours.... gonna do sometimes.
compiler: GCC 4.8.1, use C++11, -O3, -s
It appears that the crux of the fast algorithm is a static lookup buffer
(digit_pairs) which encodes the ASCII representations of the numbers
0..99. (You forgot to include that; I copied it from the stackoverflow
page.) So, it trades some memory for speed. So far, so good.

For fast operation it is essential that the digit_pairs array is in the
cache. In your test this is most probably the case, but in real code the
int-to-string conversions probably happen much less often, so the
performance would not be as great. Standard library implementations
probably operate only on data in registers, so cache access is not an
issue for them.

Another point is that in C++ you will want to wrap the result in a
std::string or something, and that will also slow things down. In my
test runs the wrapping time was in the same range as the conversion
itself, and if std::string happens not to use SSO then I guess it may be
even much slower.

So, it is not really clear how big the win would be in real programs.
Anyway, here are my test results from running your test program
(MSVC2012, Release x64 mode):

Using _snprintf, no wrapping: 4.622 s
Using intToStr, no wrapping: 1.143 s (4.0 times faster)

Using _snprintf, wrapping to a std::string: 6.622 s
Using boost::lexical_cast<std::string>: 8.75 s
Using intToStr, wrapping to a std::string: 2.996 s (2.2 times faster than
_snprintf and 2.9 times faster than boost::lexical_cast).

So, here the differences are around 2-3x, and probably smaller in real
code where the lookup array is out of the cache. I am not sure this
justifies optimizing something which is already quite fast anyway.

OTOH, if there are indeed differences like 130 times on some platform,
then I guess the standard library implementation must be quite bad
indeed.

Cheers
Paavo
Post by JiiPee
<snip test program>
JiiPee
2014-11-02 10:09:44 UTC
Hmmm, interesting. How is it possible that you get only a 3x speedup
while I consistently get 130x here? Any idea? Did you also try with GCC?
Paavo Helde
2014-11-02 13:11:39 UTC
Post by JiiPee
hmmm, interesting. How is it possible you have only 3 times faster and
I have here constantly 130 times faster? Any idea? Did you try also
with GCC?
Gcc/Linux results:

snprintf: 2.85 s
intToStr: 0.4 s (7.1 times faster)

snprintf + std::string wrap: 4.37 s
boost::lexical_cast to std::string: 2.44 s
intToStr + std::string wrap: 1.67 s

Command-line options: g++ -O3 -Wall
Post by JiiPee
g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.6/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --
mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --
enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-
checking=release --with-gxx-include-dir=/usr/include/c++/4.6 --enable-ssp
--disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/
--with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --
with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-
libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-
specific-runtime-libs --program-suffix=-4.6 --enable-linux-futex --
without-system-libunwind --with-arch-32=i586 --with-tune=generic --
build=x86_64-suse-linux
Thread model: posix
gcc version 4.6.2 (SUSE Linux)
JiiPee
2014-11-02 14:25:41 UTC
You get the same intToStr time as I do, but I get something like 40
seconds for snprintf. I have a 32-bit computer, though, so maybe that's
the reason.

I ran it again; it took more than 50 seconds with snprintf. I don't know...
Post by Paavo Helde
Post by JiiPee
hmmm, interesting. How is it possible you have only 3 times faster and
I have here constantly 130 times faster? Any idea? Did you try also
with GCC?
snprintf: 2.85 s
intToStr: 0.4 s (7.1 times faster)
snprintf + std::string wrap: 4.37 s
boost::lexical_cast to std::string: 2.44 s
intToStr + std::string wrap: 1.67 s
Command-line options: g++ -O3 -Wall
Post by JiiPee
g++ -v
<snip g++ -v output>
Öö Tiib
2014-11-02 15:03:13 UTC
Post by JiiPee
you have the same intToStr time as I have, but I have like 40 secs for
snprintf. But I have 32 bit computer, so maybe thats the reason.
No. You likely did something wrong and have possibly already discovered
what. Why else would you avoid posting your command lines, compiler
version and system configuration? Test carefully; otherwise you are just
spreading groundless FUD.
Paavo Helde
2014-11-03 06:57:32 UTC
Post by JiiPee
you have the same intToStr time as I have, but I have like 40 secs
for snprintf. But I have 32 bit computer, so maybe thats the reason.
I ran it again, it took more than 50 secs with snprintf. Dont know....
One possible cause of a 10x slowdown might be that you have some weird
libc debugging options or some weird locale settings set up. Check your
environment variables. And of course you can profile the program with
e.g. valgrind and see where it spends its time.

Cheers
Paavo
JiiPee
2014-11-03 07:28:53 UTC
Post by Paavo Helde
Post by JiiPee
you have the same intToStr time as I have, but I have like 40 secs
for snprintf. But I have 32 bit computer, so maybe thats the reason.
I ran it again, it took more than 50 secs with snprintf. Dont know....
One possibility for 10-time slowdown might be that you have some weird libc
debugging options set up, or some weird locale settings. Check your
environment variables. And of course you can try to profile the program
with e.g. valgrind and see where it spends time.
Cheers
Paavo
It's a release build, with only the -O3 and -s flags on.
I guess I need to test it next in Visual Studio to see if I get the
same result there...
Ben Bacarisse
2014-11-02 14:48:18 UTC
Paavo Helde <***@osa.pri.ee> writes:
<snip>
Post by Paavo Helde
Post by JiiPee
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody
wants my fast function please ask, I can give it then. Also, if
somebody knows faster way to convert, am ready to test it against
this! I have a test platform ready.
<snip>
Post by Paavo Helde
It appears that the crux of the fast algorithm is to have some static
lookup buffer (digit_pairs) which encodes the ASCII representation of
numbers 0..99.
Worth pointing out that the code that copies the pairs is not portable
for a number of reasons and, in those cases where it works, the
performance may depend on the buffer alignment. I changed the code to
copy the two bytes portably, and saw no significant penalty, at least in
optimised code.
Post by Paavo Helde
Post by JiiPee
int pos = val % 100;
val /= 100;
*(short*)(c-1)=*(short*)(digit_pairs+2*pos);
c-=2;
<snip>
--
Ben.
JiiPee
2014-11-02 16:03:00 UTC
Post by Ben Bacarisse
<snip>
Post by Paavo Helde
Post by JiiPee
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody
wants my fast function please ask, I can give it then. Also, if
somebody knows faster way to convert, am ready to test it against
this! I have a test platform ready.
<snip>
Post by Paavo Helde
It appears that the crux of the fast algorithm is to have some static
lookup buffer (digit_pairs) which encodes the ASCII representation of
numbers 0..99.
Worth pointing out that the code that copies the pairs is not portable
for a number of reasons and, in those cases where it works, the
performance may depend on the buffer alignment. I changed the code to
copy the two bytes portably, and saw no significant penalty, at least in
optimised code.
Post by Paavo Helde
Post by JiiPee
int pos = val % 100;
val /= 100;
*(short*)(c-1)=*(short*)(digit_pairs+2*pos);
c-=2;
<snip>
Can you show me exactly what you changed, so I can test that version here?
Paavo Helde
2014-11-02 16:21:19 UTC
Post by JiiPee
Post by Ben Bacarisse
Post by JiiPee
int pos = val % 100;
val /= 100;
*(short*)(c-1)=*(short*)(digit_pairs+2*pos);
c-=2;
<snip>
Can you show me exatcly what you changed so I can test that version here?
Probably something like
// *(short*)(c-1)=*(short*)(digit_pairs+2*pos);

*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
JiiPee
2014-11-02 16:34:21 UTC
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
Paavo Helde
2014-11-02 17:09:19 UTC
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
Yes, that's the point. There was no need to mess around with short*
casting.

Cheers
Paavo
JiiPee
2014-11-13 18:05:19 UTC
Post by Paavo Helde
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
Yes, that's the point. There was no need to mess around with short*
casting.
Cheers
Paavo
Thanks, I tested the uint version, and no errors.
Chris Vine
2014-11-03 00:36:05 UTC
On Sun, 02 Nov 2014 16:34:21 +0000
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
And it is also correct. Yours gave rise to undefined behaviour (and I
am surprised your compiler did not warn about it). Did you have -Wall
set?

Chris
JiiPee
2014-11-03 00:48:19 UTC
Post by Chris Vine
On Sun, 02 Nov 2014 16:34:21 +0000
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
And it is also correct. Yours gave rise to undefined behaviour (and I
am surprised your compiler did not warn about it). Did you have -Wall
set?
Chris
I'm too lazy to think... can you tell me what the undefined behaviour is
here? And yes, I used -Wall, and did not get a warning on that line.
Chris Vine
2014-11-03 01:30:01 UTC
On Mon, 03 Nov 2014 00:48:19 +0000
Post by JiiPee
Post by Chris Vine
On Sun, 02 Nov 2014 16:34:21 +0000
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
And it is also correct. Yours gave rise to undefined behaviour
(and I am surprised your compiler did not warn about it). Did you
have -Wall set?
Chris
Am lazy thinking... can you inform me what is the undefined behaviour
here? And yes I used Wall, and did not get warning on that line.
Your version gave undefined behaviour because you cannot alias an object
whose dynamic type is char by dereferencing a pointer to short, in any
version of C or C++. (You can, however, do the reverse.) So first, the
compiler can simply elide code doing this, and gcc-4.8 and 4.9 are
reputed to do so when provoked with a sufficiently high optimisation
level. Secondly, you may find that object alignments do not match
(though you should be fine on x86).

Chris
JiiPee
2014-11-03 01:40:07 UTC
Post by Chris Vine
On Mon, 03 Nov 2014 00:48:19 +0000
Post by JiiPee
Post by Chris Vine
On Sun, 02 Nov 2014 16:34:21 +0000
Post by JiiPee
Post by Paavo Helde
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
that does not make intToStr any slower
And it is also correct. Yours gave rise to undefined behaviour
(and I am surprised your compiler did not warn about it). Did you
have -Wall set?
Chris
Am lazy thinking... can you inform me what is the undefined behaviour
here? And yes I used Wall, and did not get warning on that line.
Your version gave undefined behaviour because you cannot alias an object
whose dynamic type is a char by dereferencing a pointer to short, in
any version of C or C++. (You can however do the reverse.) So first,
the compiler can simply elide code doing this, and gcc-4.8 and 4.9 is
reputed to do that when provoked with a sufficiently high optimisation
level. Secondly, you may find that object alignments do not match (but
you should be fine on x86).
Chris
Yes, I'm starting to understand now... I just checked the code and I see
what is happening there. Yes, maybe it needs to be changed here... but
otherwise the code seems to be OK.

Good observation... I'll have to change the code.

Also, I can see that we could make the code even faster by extending the
lookup table to cover the numbers 0-9999, which would take 40000 chars
of memory. That could maybe go behind a compile-time flag, to get
super-fast code at the cost of a bigger exe file.
Paavo Helde
2014-11-03 06:04:36 UTC
Post by JiiPee
Also I can see that we can make the code even faster by doubling the
lookup table say to see 0-9999 numbers, which would take up memory 40000
char s. That could be a compiling flag maybe to get super fast code with
bigger exe file.
Don't be so sure it will be faster, memory access is often the bottleneck
nowadays. It might even happen that the whole program slows down because
IntToStr() is pushing some useful data out of the caches.

To be honest, this whole thread seems like an ultimate exercise in
premature optimization. I am pretty sure there are not many programs
where int-to-string conversion would show up in the profiler. Trading
40000 bytes of process size for a (potential) speedup seems dubious; I
know there are people who already whine about 100 bytes.

Cheers
Paavo
JiiPee
2014-11-03 07:15:10 UTC
Post by Paavo Helde
Post by JiiPee
Also I can see that we can make the code even faster by doubling the
lookup table say to see 0-9999 numbers, which would take up memory 40000
char s. That could be a compiling flag maybe to get super fast code with
bigger exe file.
Don't be so sure it will be faster, memory access is often the bottleneck
nowadays. It might even happen that the whole program slows down because
IntToStr() is pushing some useful data out of the caches.
To be honest, all this thread seems like an ultimate exercise in premature
optimization. I am pretty sure there will not be many programs where int-
to-string conversion would show up in the profiler. Trading 40000 bytes of
process size to (potentially) speed it up seems dubious, I know there are
people who are whining already about 100 bytes.
Cheers
Paavo
I know, but that's why I said it should be under a compile flag: only
for extreme situations where a big loop is used and nothing else is
running in the program.
Öö Tiib
2014-11-03 07:48:00 UTC
Post by JiiPee
Post by Paavo Helde
Post by JiiPee
Also I can see that we can make the code even faster by doubling the
lookup table say to see 0-9999 numbers, which would take up memory
40000 char s. That could be a compiling flag maybe to get super fast
code with bigger exe file.
Don't be so sure it will be faster, memory access is often the
bottleneck nowadays. It might even happen that the whole program
slows down because IntToStr() is pushing some useful data out of
the caches.
To be honest, all this thread seems like an ultimate exercise in
premature optimization. I am pretty sure there will not be many
programs where int-to-string conversion would show up in the
profiler. Trading 40000 bytes of process size to (potentially)
speed it up seems dubious, I know there are people who are whining
already about 100 bytes.
I know, but thats why I said it should be under a compile flag, so
only in extreme situations where a big loop is used and there is
nothing else running in the program.
Can you picture such a real "extreme" situation? I can. Every time such
text processing shows up in a profiler, there is some sort of needless
text communication going on... for example a JSON layer between two
intensely communicating modules. Even there the bottleneck is usually
not the serialization of ints but the dynamic memory management of the
little strings. Removing the whole pointless text-exchange layer will
always achieve far better performance gains than optimizing the text
composers and parsers. Using up most of the cache, however (40000 bytes
is most of a 64k L1), will certainly slow things down even more.
Paavo Helde
2014-11-03 08:16:06 UTC
Post by JiiPee
Post by Paavo Helde
To be honest, all this thread seems like an ultimate exercise in
premature optimization. I am pretty sure there will not be many
programs where int-
to-string conversion would show up in the profiler. Trading 40000
bytes of process size to (potentially) speed it up seems dubious, I
know there are people who are whining already about 100 bytes.
I know, but thats why I said it should be under a compile flag, so
only in extreme situations where a big loop is used and there is
nothing else running in the program.
This has not much to do with the compiler; sprintf is a C standard
library function. The C standard library is meant as a general-purpose
library making a decent choice of tradeoffs (speed vs memory usage vs
development and maintenance effort, etc.) so that it is good enough for
most use cases. As a result of those tradeoffs it cannot be the best
choice in all situations.

In particular, sprintf() is defined as a locale-dependent function, with
all the legacy and drawbacks associated with that. It is also defined as
a function driven by a separate minilanguage which it interprets at
runtime. These two facts already basically exclude the possibility that
it could be the fastest tool around, so if a program requires maximum
performance in this area, it cannot use sprintf anyway, regardless of
whether it can be fine-tuned by compiler switches or not.

In special corner cases like your hypothetical program doing only
zillions of int-string conversions one may need to use special custom
functions or libraries. You know, there are lots of special purpose
libraries in C and C++. This special IntToStr() function looks like a
good candidate for such a library.

All this does not mean that the standard C library version of sprintf()
could not be made better. With glibc everybody can actually do that in
principle, it's an open-source collaborative project. The toughest part
would probably be reaching an agreement about what is "better".

Cheers
Paavo
JiiPee
2014-11-13 18:12:29 UTC
Post by Paavo Helde
Post by JiiPee
Post by Paavo Helde
To be honest, all this thread seems like an ultimate exercise in
premature optimization. I am pretty sure there will not be many
programs where int-
to-string conversion would show up in the profiler. Trading 40000
bytes of process size to (potentially) speed it up seems dubious, I
know there are people who are whining already about 100 bytes.
I know, but thats why I said it should be under a compile flag, so
only in extreme situations where a big loop is used and there is
nothing else running in the program.
This has not much to do with the compiler, sprintf is a C standard
library function. The C standard library is meant as a general-purpose
library making a decent choice of different tradeoffs (like speed vs
memory usage vs development and maintenance efforts, etc) so that it is
good enough for most usage cases. As a result of tradeoffs it cannot be
the best choice in all situations.
In particular, sprintf() is defined as a locale-dependent function, with
all the legacy and drawbacks associated with it.
OK, but I am not talking about sprintf here, I am talking about our
IntToStr function. If I were to make this a library (which I am actually
doing, I think... I have my own "ctools" library where I collect all
kinds of useful functions/tools for my projects, and just add that file
to each project... so I am going to add this one there as well), then I
would use a compile-time flag, or maybe a template parameter, to choose
which version I want: the super-fast one or the fast one.
JiiPee
2014-11-13 18:21:14 UTC
Post by Paavo Helde
Post by JiiPee
Also I can see that we can make the code even faster by doubling the
lookup table say to see 0-9999 numbers, which would take up memory 40000
char s. That could be a compiling flag maybe to get super fast code with
bigger exe file.
Don't be so sure it will be faster, memory access is often the bottleneck
nowadays. It might even happen that the whole program slows down because
IntToStr() is pushing some useful data out of the caches.
It would be interesting to test your "theory"... I would like to do it,
but I would need example code from you where you think it would be
slower. We could test this.

As for the 40000-byte trade-off: it is only allocated once per program,
because it is a static array. And for me 40000 bytes is not much wasted
memory, as I mostly write normal PC programs; it is about 0.001% of RAM
if you have 3 GB of memory.
Öö Tiib
2014-11-14 01:02:46 UTC
Post by JiiPee
But the 40000 trade off - its only allocated once in a program because
its a static array. and for me 40000 is not much memory waste, as I do
normal pc program mostly. its like 0.001 % of the RAM memory if having 3
GIG memory.
Memory costs nothing; a gigabyte here or there. It is a full L1 cache
that costs something, and L1 is 64k. Platform architects say that 64k is
the optimal middle ground between too many transistors and too little L1
cache. Your tests (which exercise an algorithm that alone nearly fills
the 64k L1, since 40k is most of it) will lie to you! In a real
application no algorithm can use all of L1 as its private storage of
nonsense substrings.
Dombo
2014-11-16 15:04:51 UTC
Post by Öö Tiib
Post by JiiPee
But the 40000 trade off - its only allocated once in a program because
its a static array. and for me 40000 is not much memory waste, as I do
normal pc program mostly. its like 0.001 % of the RAM memory if having 3
GIG memory.
Memory costs nothing. Gigabyte here or there. It is full L1 cache that
costs something and L1 is 64k. Platform architects say that 64k is
optimal middle ground between too lot of transistors and too few L1
cache.
The problem is not so much the number of transistors needed for the
cache but the trade-off between cache size and cache latency: the larger
the cache, the higher the latency. That is why CPUs have multi-level
caches: a small but low-latency L1 cache, a larger but slower L2 cache,
and nowadays typically an even larger and slower L3 cache. If it were
just a matter of transistor count, there would be little point in
integrating multiple cache levels on the same die.

Considering the physics, I have little hope that the size of L1 caches
will grow significantly in the near future. For example, the AMD Athlon
(K7) introduced back in 1999 had an L1 cache of 64 kB + 64 kB (data +
instructions), whereas the Intel Core i7 has an L1 cache of 32 kB +
32 kB (data + instructions). Since memory accesses can be very expensive
performance-wise, it may make more sense to reduce the amount of memory
that needs to be accessed than to reduce the number of instructions that
need to be executed to accomplish a certain task.
Ben Bacarisse
2014-11-02 19:24:13 UTC
Post by Paavo Helde
Post by JiiPee
Post by Ben Bacarisse
Post by JiiPee
int pos = val % 100;
val /= 100;
*(short*)(c-1)=*(short*)(digit_pairs+2*pos);
c-=2;
<snip>
Can you show me exatcly what you changed so I can test that version here?
Probably something like
// *(short*)(c-1)=*(short*)(digit_pairs+2*pos);
*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];
Yes, something like or maybe even exactly like that!
--
Ben.
JiiPee
2014-11-02 14:22:41 UTC
And it needs this one as well:

const char digit_pairs[201] = {
"00010203040506070809"
"10111213141516171819"
"20212223242526272829"
"30313233343536373839"
"40414243444546474849"
"50515253545556575859"
"60616263646566676869"
"70717273747576777879"
"80818283848586878889"
"90919293949596979899"
};
Post by JiiPee
Post by Luca Risolia
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
And how can we reproduce your tests? Please post the full, compilable
test program together with all the informations about the compiler,
compiler version and compilation flags that you used.
I also tested intToStr that it converts correctly: first 20 million is
correct. Needs to test all integers, takes 2 hours.... gonna do sometimes.
compiler: GCC 4.8.1, use C++11, -O3, -s
#include <iostream>
#include <ctime>
inline void intToStr(unsigned val, char* c)
{
int size;
if(val>=10000)
{
if(val>=10000000)
{
if(val>=1000000000)
size=10;
else if(val>=100000000)
size=9;
else
size=8;
}
else
{
if(val>=1000000)
size=7;
else if(val>=100000)
size=6;
else
size=5;
}
}
else
{
if(val>=100)
{
if(val>=1000)
size=4;
else
size=3;
}
else
{
if(val>=10)
size=2;
else if (val==0) {
c[0]='0';
c[1] = '\0';
return;
}
else
size=1;
}
}
c += size-1;
while(val>=100)
{
int pos = val % 100;
val /= 100;
*(short*)(c-1)=*(short*)(digit_pairs+2*pos);
c-=2;
}
while(val>0)
{
*c--='0' + (val % 10);
val /= 10;
}
c[size+1] = '\0';
}
// intToStr is ~132 times faster than snprintf.
// Loop to 1294967295, step 60 (i+=60): intToStr 0.41 s,
// snprintf 54.4 s (x132), sprintf 54.2 s (x131).
// (max uint = 4294967295)
void benchmark() {
clock_t begin = clock();
char str2[16];
char cU = 0; // initialized, so the += below is well-defined
for(unsigned int i = 0; i < 1294967295; i+=60) {
//intToStr(i, str2);
//snprintf (str2, 15, "%d",i);
sprintf(str2,"%d",i);
//itoa (i,str2,10);
cU += str2[6]; // use the string so that the compiler
// will not optimize things out
}
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
std::cout<<"Time: "<<elapsed_secs<<cU<<std::endl;
}
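[Editor's note: for checking correctness without the two-hour exhaustive run, a spot-check at the digit-count boundaries is usually enough, since that is where a ladder-style size computation is most likely to be off by one. A standalone sketch; the helper name is mine, and any converter with the same signature can be plugged in:]

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Sketch (helper name is mine): spot-check any unsigned -> char*
// converter against snprintf at the digit-count boundaries.
template <class Conv>
bool check_boundaries(Conv conv) {
    const unsigned vals[] = {0u, 9u, 10u, 99u, 100u, 999u, 1000u, 9999u,
                             10000u, 99999u, 100000u, 999999u, 1000000u,
                             9999999u, 10000000u, 99999999u, 100000000u,
                             999999999u, 1000000000u, 4294967295u};
    for (unsigned v : vals) {
        char got[16], ref[16];
        conv(v, got);
        std::snprintf(ref, sizeof ref, "%u", v);
        if (std::strcmp(got, ref) != 0)
            return false;
    }
    return true;
}
```

Plugging in the intToStr from the post above, `check_boundaries(intToStr)` exercises exactly the size transitions of its if/else ladder.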
Luca Risolia
2014-11-03 16:19:07 UTC
Permalink
Post by JiiPee
Post by Luca Risolia
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
And how can we reproduce your tests? Please post the full, compilable
test program together with all the information about the compiler,
compiler version and compilation flags that you used.
I also tested that intToStr converts correctly: the first 20 million values
are correct. Testing all integers takes 2 hours... going to do that sometime.
compiler: GCC 4.8.1, use C++11, -O3, -s
#include <iostream>
#include <ctime>
inline void intToStr(unsigned val, char* c)
This should be ~4x faster, just for fun...

#include <cstdint>
#include <limits>

template <class T>
constexpr T pow(T const& x, std::size_t n) {
return n > 0 ? x * pow(x, n - 1) : 1;
}

constexpr auto dgts = std::numeric_limits<unsigned int>::digits10;
constexpr auto max = pow(10u, dgts);

template<int> struct S;

template<>
struct S<0> {
template<int>
static inline void f(unsigned int, char*) noexcept {}
};

template<int P>
struct S {
template<int Q>
static inline void f(unsigned int i, char* c) noexcept {
constexpr auto a = pow(10u, Q + 1 - P), b = a / 10;
S < P - 1 > ::template f<Q>(i, c);
c[P - 1] = '0' + (i % a) / b;
}
};

template<int> void g(unsigned int, char*) noexcept;

template<> inline void g<dgts + 1>(unsigned int i, char* c) noexcept {
if (i < max) {
S<dgts>::f<dgts>(i, c);
c[dgts] = '\0';
} else {
c[0] = '0' + (i / max);
S<dgts >::f < dgts > (i, c + 1);
c[dgts + 1] = '\0';
}
}

template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}

inline void uint_to_str(unsigned int i, char* c) noexcept {
g<1>(i, c);
}

int main() {
// example
char str[16];
uint_to_str(100, str);
}
Martijn Lievaart
2014-11-03 16:47:16 UTC
Permalink
Post by Luca Risolia
template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}
What if i is slightly smaller than std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit an unsigned, resulting in an
infinite loop.

M4
Luca Risolia
2014-11-04 00:27:22 UTC
Permalink
Post by Martijn Lievaart
Post by Luca Risolia
template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}
What if i is slightly smaller than std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit an unsigned, resulting in an
infinite loop.
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10, where
std::numeric_limits<unsigned>::digits10 is one less than the number of
decimal digits needed to represent the maximum unsigned value
std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Martijn Lievaart
2014-11-04 07:59:46 UTC
Permalink
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}
What if i is slightly smaller than
std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit an unsigned, resulting in an
infinite loop.
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10, where
std::numeric_limits<unsigned>::digits10 is one less than the number of
decimal digits needed to represent the maximum unsigned value
std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Buzz. Think again. I see at least two errors in this explanation.

M4
Luca Risolia
2014-11-04 09:44:33 UTC
Permalink
Post by Martijn Lievaart
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}
What if i is slightly smaller than
std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit an unsigned, resulting in an
infinite loop.
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10, where
std::numeric_limits<unsigned>::digits10 is one less than the number of
decimal digits needed to represent the maximum unsigned value
std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Buzz. Think again. I see at least two errors in this explanation.
Please be clear, define a way to reproduce the case where the algorithm
would "eventually" result in an infinite loop in your opinion.
Unfortunately, I cannot spend much time on this topic which was initially
supposed to take me no more than 5 minutes.
Martijn Lievaart
2014-11-04 10:51:10 UTC
Permalink
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
template<int X> inline void g(unsigned int i, char* c) noexcept {
constexpr unsigned int Y = pow(10u, X);
if (i < Y) {
S<X>::template f< X > (i, c);
c[X] = '\0';
} else
g < X + 1 > (i, c);
}
What if i is slightly smaller than
std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit an unsigned, resulting in an
infinite loop.
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10,
where std::numeric_limits<unsigned>::digits10 is one less than the
number of decimal digits needed to represent the maximum unsigned
value std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Buzz. Think again. I see at least two errors in this explanation.
Please be clear, define a way to reproduce the case where the algorithm
would "eventually" result in an infinite loop in your opinion.
Unfortunately, I cannot spend much time on this topic which was
initially supposed to take me no more than 5 minutes.
If X equals std::numeric_limits<unsigned>::digits10, Y cannot be
represented in an unsigned, so the result is implementation defined, but
at most std::numeric_limits<unsigned>::max. So if i is
std::numeric_limits<unsigned>::max, it will never be smaller than Y,
resulting in unbounded recursion. For any i >= pow(10u, Y) where Y ==
std::numeric_limits<unsigned>::digits10, either the result is wrong, or
unbounded recursion occurs.

To make it simple, assume 8 bits. std::numeric_limits<unsigned>::digits10
== 3 => Y should be 1000, but that cannot be represented in 8 bits. (I
know an unsigned cannot be 8 bits, that is not the point).

M4
Luca Risolia
2014-11-04 11:11:41 UTC
Permalink
Post by Martijn Lievaart
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10,
where std::numeric_limits<unsigned>::digits10 is one less than the
number of decimal digits needed to represent the maximum unsigned
value std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Buzz. Think again. I see at least two errors in this explanation.
Please be clear, define a way to reproduce the case where the algorithm
would "eventually" result in an infinite loop in your opinion.
Unfortunately, I cannot spend much time on this topic which was
initially supposed to take me no more than 5 minutes.
To make it simple, assume 8 bits. std::numeric_limits<unsigned>::digits10
== 3 => Y should be 1000, but that cannot be represented in 8 bits. (I
know an unsigned cannot be 8 bits, that is not the point).
No. Again, this case is not possible. By the standard an 8-bit unsigned type
cannot have std::numeric_limits<unsigned>::digits10 == 3. A concrete example
on my C++ implementation is unsigned char, for which CHAR_BIT is 8 and
std::numeric_limits<unsigned char>::digits10 is 2, as it should be.
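[Editor's note: the property being argued here can be checked at compile time. For unsigned integer types, digits10 is the largest N such that every N-digit decimal number is representable, so 10^digits10 itself always fits in the type. A standalone sketch; ipow10 is my own helper:]

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Standalone sketch: 10^digits10 never overflows an unsigned type,
// which is what makes the g<1>() recursion above terminate.
constexpr unsigned long long ipow10(int n) {
    return n > 0 ? 10ull * ipow10(n - 1) : 1ull;
}

// 8-bit: max 255 -> digits10 == 2, and 10^2 == 100 fits.
static_assert(std::numeric_limits<std::uint8_t>::digits10 == 2,
              "8-bit unsigned holds all 2-digit numbers");
static_assert(ipow10(2) <= std::numeric_limits<std::uint8_t>::max(), "");

// 32-bit: max 4294967295 -> digits10 == 9, and 10^9 fits.
static_assert(std::numeric_limits<std::uint32_t>::digits10 == 9,
              "32-bit unsigned holds all 9-digit numbers");
static_assert(ipow10(9) <= std::numeric_limits<std::uint32_t>::max(), "");
```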
Martijn Lievaart
2014-11-05 22:33:40 UTC
Permalink
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
Post by Martijn Lievaart
Post by Luca Risolia
I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (included), which implies
that the greatest Y is Y0 =
10^std::numeric_limits<unsigned>::digits10, where
std::numeric_limits<unsigned>::digits10 is one less than the number
of decimal digits needed to represent the maximum unsigned value
std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.
Buzz. Think again. I see at least two errors in this explanation.
Please be clear, define a way to reproduce the case where the
algorithm would "eventually" result in an infinite loop in your
opinion. Unfortunately, I cannot spend much time on this topic which
was initially supposed to take me no more than 5 minutes.
To make it simple, assume 8 bits.
std::numeric_limits<unsigned>::digits10 == 3 => Y should be 1000, but
that cannot be represented in 8 bits. (I know an unsigned cannot be 8
bits, that is not the point).
No. Again, this case is not possible. By standard an 8-bit unsigned type
cannot have std::numeric_limits<unsigned>::digits10 == 3. A concrete
example on my C++ implementation is unsigned char, for which CHAR_BIT is
8 and std::numeric_limits<unsigned char> is 2, as it should be.
Brainfart on my part. You are right.

M4
Jorgen Grahn
2014-11-02 02:10:24 UTC
Permalink
Post by JiiPee
I wonder how C++ standard library has not implemented a very fast
integer-to-string conversion function. They have sprintf, snprintf,
stringstream and to_string.
http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html
we can see that sprintf is the fastest one of those.
http://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion
a very fast version, done by somebody. I run tests (using a integers
from 0 to 1294967295 by step 60 and converted all of them to strings. I
also used the sixth item from the result: cU += str2[6]; inside the loop
and then printed it at the end so that the compiler would not optimize
things out).
That's not a bad setup, but writing good benchmarks is even harder than
that. E.g. how does this implementation affect the data and code
caches? Unlike in real code, the caches are probably warm when you
enter the function you're benchmarking.
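[Editor's note: one way to reduce, though not eliminate, such effects is to time with std::chrono::steady_clock and route every result through a volatile sink the optimizer cannot discard. A sketch with names of my own; it still measures with warm caches, since the converter and its lookup table stay resident for the whole loop:]

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Sketch (names are mine): time a conversion loop. The volatile sink
// forces the compiler to keep every conversion; steady_clock is
// monotonic, unlike clock() which measures CPU time per process.
template <class Conv>
double time_loop(Conv conv, unsigned step) {
    volatile unsigned char sink = 0;
    char buf[16];
    auto t0 = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < 1294967295u; i += step) {
        conv(i, buf);
        sink = static_cast<unsigned char>(sink + buf[0]);
    }
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

With step 60 this matches the loop in the original benchmark; a larger step gives a quick smoke test.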
Post by JiiPee
Results: the user made conversion function from the site was about 130
times faster than sprintf (and thus would be about 200 times faster than
to_string ).
How is it possible there is no fast std version? This is sometimes an
important function, for example when parsing a very large text and
checking integers from it.
Not really complaining but want to start a discussion about this: why
std sometimes does not make fast functions/classes?
The C++ standard has not so much to do with it; it's a problem with
whatever implementation of the standard library you use.

But /that/ is an interesting question: are we losing performance due
to suboptimal standard library implementations, and is it feasible to
do something about it?

I know, after earlier discussions here, that std::ostream
implementations tend to be slower than C stdio. And I don't see that
they /have/ to be.
Post by JiiPee
For me, I already added that new fast conversion function to my
toolset... I rather use it than library versions. btw if somebody wants
my fast function please ask, I can give it then. Also, if somebody knows
faster way to convert, am ready to test it against this! I have a test
platform ready.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Christian Gollwitzer
2014-11-02 08:35:52 UTC
Permalink
Post by Jorgen Grahn
I know, after earlier discussions here, that std::ostream
implementations tend to be slower than C stdio. And I don't see that
they /have/ to be.
For sprintf vs. intToStr (or itoa), I'm also not surprised - sprintf has
to interpret the format string at runtime, unless the compiler has a
special optimization to inline sprintf.

Christian
Jorgen Grahn
2014-11-02 12:06:09 UTC
Permalink
Post by Christian Gollwitzer
Post by Jorgen Grahn
I know, after earlier discussions here, that std::ostream
implementations tend to be slower than C stdio. And I don't see that
they /have/ to be.
For sprintf vs. intToStr (or itoa), I'm also not surprised - sprintf has
to interpret the format string at runtime,
Oh, I'm not surprised by /that/ -- I was only talking about iostreams
above.
Post by Christian Gollwitzer
unless the compiler has a
special optimization to inline sprintf.
I was going to say "sprintf() seems unlikely to be one of those
functions", but gcc apparently has it as a builtin. No idea why
though -- the documentation doesn't say.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Christian Gollwitzer
2014-11-02 17:21:53 UTC
Permalink
Post by Jorgen Grahn
Post by Christian Gollwitzer
Post by Jorgen Grahn
I know, after earlier discussions here, that std::ostream
implementations tend to be slower than C stdio. And I don't see that
they /have/ to be.
For sprintf vs. intToStr (or itoa), I'm also not surprised - sprintf has
to interpret the format string at runtime,
Oh, I'm not surprised by /that/ -- I was only talking about iostreams
above.
Post by Christian Gollwitzer
unless the compiler has a
special optimization to inline sprintf.
I was going to say "sprintf() seems unlikely to be one of those
functions", but gcc apparently has it as a builtin. No idea why
though -- the documentation doesn't say.
GCC can issue warnings when the format string and the actual parameters
mismatch. I don't know if this only issues the warning or also generates
static calls (which it could). Too lazy to check the assembly right now.

Christian
Jouko Koski
2014-11-02 12:21:01 UTC
Permalink
Post by JiiPee
I wonder how C++ standard library has not implemented a very fast
integer-to-string conversion function. They have sprintf, snprintf,
stringstream and to_string.
Considering what these functions do, they may perform pretty well, actually.
It is so easy to implement a function which is simple and fast, but produces
incorrect results. It is so obvious that 1234 should yield "1234", but what
if the result, say, "१२३४" would be preferred?
--
Jouko
JiiPee
2014-11-02 16:41:07 UTC
Permalink
Post by Jouko Koski
Post by JiiPee
I wonder how C++ standard library has not implemented a very fast
integer-to-string conversion function. They have sprintf, snprintf,
stringstream and to_string.
Considering what these functions do, they may perform pretty well,
actually. It is so easy to implement a function which is simple and
fast, but produces incorrect results. It is so obvious that 1234 should
yield "1234", but what if the result, say, "१२३४" would be preferred?
If I want to convert 1234 to a string, why would I want it to be "१२३४"?
:) I guess you are talking about other languages.. but even then, we
could limit this problem to the European representation.
Jouko Koski
2014-11-02 17:48:09 UTC
Permalink
Post by JiiPee
If I want to convert 1234 to a string, why would I want it to be "१२३४"?
:) I guess you are talking about other languages.. but even then, we could
limit this problem to the European representation.
Yes, I mean locales, or even different character encodings used by terminal
devices. Standard library functions generally take these into account while
most "fast" functions do not.

Limiting this to Western number representation sounds as myopic as an
insular Yankee insisting that everybody should be fine with 7-bit ascii
encoding.
--
Jouko
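[Editor's note: the locale point is easy to demonstrate without any OS-installed locales. A stream formats numbers through the imbued locale's numpunct facet, which a fixed digit-pair table can never honor. A self-contained sketch; the facet and helper names are mine:]

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Self-contained sketch: a custom numpunct facet with thousands grouping.
// An ostream picks this up automatically; a fast fixed-table converter
// would have to reimplement it by hand for every locale.
struct grouped_digits : std::numpunct<char> {
    char do_thousands_sep() const override { return ' '; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3
};

std::string format_grouped(unsigned v) {
    std::ostringstream os;
    // the locale object takes ownership of the facet (reference counted)
    os.imbue(std::locale(os.getloc(), new grouped_digits));
    os << v;
    return os.str();
}
```

Here `format_grouped(1234567u)` yields "1 234 567", which no locale-unaware fast path produces.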
JiiPee
2014-11-02 21:51:57 UTC
Permalink
Post by Jouko Koski
Post by JiiPee
If I want to convert 1234 to a string, why would I want it to be
"१२३४"? :) I guess you are talking about other languages.. but even
then, we could limit this problem to the European representation.
Yes, I mean locales, or even different character encodings used by
terminal devices. Standard library functions generally take these into
account while most "fast" functions do not.
Limiting this to Western number representation sounds as myopic as an
insular Yankee insisting that everybody should be fine with 7-bit
ascii encoding.
We can put both in the code: when plain ASCII digits are wanted, use the
fast one; otherwise fall back to the std one.
Öö Tiib
2014-11-02 15:20:18 UTC
Permalink
Post by JiiPee
I wonder how C++ standard library has not implemented a very fast
integer-to-string conversion function. They have sprintf, snprintf,
stringstream and to_string.
In my tests all of the conversions ('snprintf', 'boost::lexical_cast',
'std::to_string' etc.) are already blindingly fast. 'boost::format' for
example compares as the slowest, but is still tolerable, since nothing
needs to turn hundreds of millions of 'int's into readable text in under
a second.

If the string conversions affect performance, then I can bet that the
whole algorithm can be made faster by a few orders of magnitude, since
something is wrong with the whole nature of it. What do you do with
*gigabytes* of *human-readable* ints on your 32-bit computer?